Related
mustache templating processor // Bodacious Blog
Code for mustache-transform
http://github.com/mullikine/mustache-transform

Summary

I would like to place lengthy strings into mermaid diagram nodes, but I don’t want to worry about escaping characters or formatting the text.

The problem I’m trying to solve is that a simple string substitution is not enough for many formats, including mermaid files.

There may be some additional transformation that needs to be applied to the string before it goes into the file.

I design a simple templating pipeline that automatically escapes text and inserts it into mermaid diagrams, and that also works with mermaid graphs inside Emacs' babel.

This makes it trivial for me to build large flowcharts containing long strings and special characters.

I also try to keep the process generic enough to be adaptable and useful for templating other file formats.

YAML turns out to be an excellent format for storing configuration data and program input, thanks to its heredoc-like block scalar syntax.
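
A tiny demonstration, using the jq-wrapper yq that shows up later in this post (the key name is just an example):

# block scalars are yaml's heredocs: the multi-line text stays readable in
# the file and parses to a single plain string
yq -r .parsemetadata <<'EOF'
parsemetadata: |-
  Parse metadata.csv and for each row, extract
  the title, abstract, date published etc.
EOF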

Code

template

This script is a makeshift template preprocessor, which I use to do the substitution initially.

I did it this way first just to get an idea of how I was going to approach the task. But the final code will use mustache, a much more robust templating tool.

#!/bin/bash
export TTY

stdin_exists() {
    ! [ -t 0 ]
}

if stdin_exists; then
    fp="$(tf txt)"
else
    echo "requires stdin" 1>&2
    exit 1
fi

while [ $# -gt 0 ]; do opt="$1"; case "$opt" in
    -[a-z0-9]*) {
        varname="$(p "$opt" | mcut -d- -f2)"
        contents="$2"
        shift
        shift
        varname="$(p "$varname" | sed 's/\s\+//g')"                                        # no spaces
        varname="$(p "$varname" | sed -e 's/\(.*\)/\L\1/')"                                # lowercase
        varname="$(p "$varname" | fuzzify-regex -s)"                                       # let params match fields with spaces

        cat "$fp" | ptw replace-substring -i -m "<$varname>" -r "$contents" | sponge "$fp" # perform variable replacement
        cat "$fp" | sed 's/\(\b1 [a-zA-Z]\+\)s\b/\1/g' | sponge "$fp"                      # fix singular plural
        cat "$fp" | sed 's/""/"/g' | sponge "$fp"                                          # fix CSV double quote
    }
    ;;

    *) break;
esac; done

cat "$fp"

Using the makeshift template preprocessor

Demonstration

This mermaid diagram template needs <parsemetadata> to be replaced by a syntactically correct string.

The string must be a single-line, double-quoted string, and wherever a newline should appear, the HTML tag <br /> must be used instead.

I don’t want to remember those rules. I just want to automate that templating process.

Parse metadata.csv and for each row, extract
the title, abstract, date published etc.
(contained directly in the csv) and bodytext of
the article (from the article json file)

sed -z 's=\n=<br />=g'
Parse metadata.csv and for each row, extract<br />the title, abstract, date published etc.<br />(contained directly in the csv) and bodytext of<br />the article (from the article json file)

graph TD
    A[<parsemetadata>] --> B{Is it?};
    B -- Yes --> C[OK];
    C --> D[Rethink];
    D --> B;
    B -- No ----> E[End];

aqf-nice just double-quotes the string.
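
The real aqf-nice is one of my own helpers; a hypothetical stand-in, assuming double-quoting is all it needs to do here, could be as small as this:

# hypothetical stand-in for aqf-nice: wrap the argument in double quotes so
# it can be dropped straight into a mermaid node like A["..."]
aqf-nice() { printf '"%s"' "$1"; }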

template -parsemetadata "$(aqf-nice "Parse metadata.csv and for each row, extract<br />the title, abstract, date published etc.<br />(contained directly in the csv) and bodytext of<br />the article (from the article json file)")"
graph TD
    A["Parse metadata.csv and for each row, extract<br />the title, abstract, date published etc.<br />(contained directly in the csv) and bodytext of<br />the article (from the article json file)"] --> B{Is it?};
    B -- Yes --> C[OK];
    C --> D[Rethink];
    D --> B;
    B -- No ----> E[End];

Using mustache

Mustache needs a data file.

parsemetadata: |-
  Parse metadata.csv and for each row, extract
  the title, abstract, date published etc.
  (contained directly in the csv) and bodytext of
  the article (from the article json file)  
applytextprocessing: |-
  Apply text processing (see Text Processing
  for details) and split text block into
  sentences using nltk tokenizer  

mermaiddata-raw.yaml

In mustache, {{{ means that the variable's text will not be escaped into HTML entities.

If I use {{ instead, text such as <br /> will come out as entities.
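
A quick way to see the difference (a throwaway sketch; demo.yaml and demo.mustache are made-up file names, and the mustache CLI is the same one used below):

cat > demo.yaml <<'EOF'
x: a<br />b
EOF
printf 'escaped: {{x}}\nraw: {{{x}}}\n' > demo.mustache
mustache demo.yaml demo.mustache
# the escaped line renders with &lt; and &gt; entities,
# the raw line keeps <br /> verbatim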

graph TD
    A[{{{parsemetadata}}}] --> B{Is it?};
    B[{{{applytextprocessing}}}] -- Yes --> C[OK];
    C --> D[Rethink];
    D --> B;
    B -- No ----> E[End];

irparse.mermaid

mustache mermaiddata-raw.yaml irparse.mermaid
graph TD
  A[Parse metadata.csv and for each row, extract
the title, abstract, date published etc.
(contained directly in the csv) and bodytext of
the article (from the article json file)] --> B{Is it?};
  B -- Yes --> C[OK];
  C --> D[Rethink];
  D --> B;
  B -- No ----> E[End];

But how can I then include the sed command in the pipeline?

I need something new: a second file that sits alongside the data file and specifies the transformations.

The idea: list all the keys, pipe each key's value through its associated filter, and reconstruct the YAML. A shell sketch of this follows the yq example below.

parsemetadata: |-
    sed -z 's=\n=<br />=g' | q -f
applytextprocessing: |-
    sed -z 's=\n=<br />=g' | q -f

transformations.yaml
cat mermaiddata-raw.yaml | yq -r ". | keys[]"
applytextprocessing
parsemetadata
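
Before writing a proper tool, the whole idea fits in a few lines of shell. A rough sketch, assuming the jq-wrapper yq used above, simple key names, and that every entry in transformations.yaml is a shell pipeline:

#!/bin/bash
# for every key, run the data value through the filter registered for that
# key, then print the key/value pairs back out as (flat) yaml
data=mermaiddata-raw.yaml
trans=transformations.yaml

for key in $(yq -r 'keys[]' "$data"); do
    filter="$(yq -r ".$key" "$trans")"
    value="$(yq -r ".$key" "$data" | sh -c "$filter")"
    printf '%s: %s\n' "$key" "$value"
done

Reconstructing the YAML with printf is too naive for arbitrary values, which is why mustache-transform below goes through a real YAML library instead.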

mustache-transform

#!/bin/bash
export TTY

( hs "$(basename "$0")" "$@" "#" "<==" "$(ps -o comm= $PPID)" 0</dev/null ) &>/dev/null

data="$1"
test -f "$data" || exit 1
trans="$2"
test -f "$trans" || exit 1

# lein uberjar
java -jar $MYGIT/mullikine/mustache-transform/target/uberjar/mustache-transform-0.1.0-SNAPSHOT-standalone.jar "$@" | yq . | pavs

mullikine/mustache-transform

(ns mustache-transform.core
  (:gen-class)
  (:require [clj-yaml.core :as yaml]
            [clojure.pprint :as pp]
            [clojure.set :as set]))

(use '[clojure.java.shell :only [sh]])

;; For each [key filter-command] entry in fm, run the corresponding value
;; from m through the filter (executed with sh -c, value fed on stdin) and
;; keep the command's stdout.
(defn transform-map [m fm]
  (into {}
        (map
         (fn [[k v]]
           [k (:out (sh "sh" "-c" v :in (k m)))])
         fm)))

(defn -main
  "Read a yaml data file and a yaml transformations file, pipe each data
  value through its shell filter and print the transformed yaml."
  ([data-fp transform-fp & args]
   (let [data (yaml/parse-string (slurp data-fp))
         transform (yaml/parse-string (slurp transform-fp))
         newdata (transform-map data transform)]

     ;; running from uberjar appears to need println, not print
     (println (yaml/generate-string newdata :dumper-options {:flow-style :block}))
     (shutdown-agents))))
mustache-transform mermaiddata-raw.yaml transformations.yaml
parsemetadata: Parse metadata.csv and for each row, extract<br />the title, abstract, date published etc.<br />(contained directly in the csv) and bodytext of<br />the article (from the article json file)
applytextprocessing: Apply text processing (see Text Processing<br />for details) and split text block into<br />sentences using nltk tokenizer
mustache mermaiddata-raw.yaml irparse.mermaid
graph TD
    A[Parse metadata.csv and for each row, extract
the title, abstract, date published etc.
(contained directly in the csv) and bodytext of
the article (from the article json file)] --> B{Is it?};
    B[Apply text processing (see Text Processing
for details) and split text block into
sentences using nltk tokenizer] -- Yes --> C[OK];
    C --> D[Rethink];
    D --> B;
    B -- No ----> E[End];
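
The final command ties everything together: zsh's =(command) process substitution writes the output of mustache-transform to a temporary file, and mustache reads that file as its data file.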
mustache =(mustache-transform mermaiddata-raw.yaml transformations.yaml) irparse.mermaid
graph TD
    A["Parse metadata.csv and for each row, extract<br />the title, abstract, date published etc.<br />(contained directly in the csv) and bodytext of<br />the article (from the article json file)"] --> B{Is it?};
    B["Apply text processing (see Text Processing<br />for details) and split text block into<br />sentences using nltk tokenizer"] -- Yes --> C[OK];
    C --> D[Rethink];
    D --> B;
    B -- No ----> E[End];