Build system for TikZ and pgfplots

Problem

LaTeX compilation is quite slow, especially when a reference changes and the document has to be compiled three times in order to get all the page numbers sorted out. This is bearable, as the output of LaTeX is far better than that of any other text processing system I have tried.

Lately I have made a lot of figures with TikZ and my plots with pgfplots. Their output looks just like the document itself: the same text, a matching line width, and therefore a similar gray value. The figures just blend in perfectly.

This comes at a price: compiling a complicated pgfplots figure takes well over 10 seconds. With three runs, you wait half a minute for each plot in your document. A moderately complicated document like a lab report easily has over ten plots, which means a lot of waiting. In most cases only one compilation run is needed, but that still wastes a lot of time. And since memory in the pdflatex engine is limited, it cannot build arbitrarily complex plots, either.

The time is wasted because the plot itself did not change at all; it is just not cached when you have a plain \begin{tikzpicture} environment in your document. TikZ offers the externalize feature for this. It works by extracting all the tikzpicture environments into separate files and generating a Makefile in that directory. You run make, or even make -j $(nproc), to compile the figures in parallel. On the next run of pdflatex, the compiled PDF images are included in the main document.
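For reference, externalization is enabled in the preamble of the main document; the list and make mode is what generates the Makefile. (The options shown here are the standard ones from the TikZ manual, not necessarily the exact configuration I used.)

```latex
% Preamble: enable externalization. "list and make" mode writes a
% Makefile that rebuilds each tikzpicture as a standalone PDF.
\usetikzlibrary{external}
\tikzexternalize[mode=list and make, prefix=figures/]
```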

This is good, and it does save time, mostly. On the very first run you have no figures; they are all replaced by the text [[ discarded due to externalize ]] (or similar). The height used in the document is therefore less than that of the actual figure, so LaTeX positions the figure floats differently, screwing up all the references. The tool latexmk will then run three times to get the references sorted out. Then you run make for the figures and latexmk again. In total that makes six runs of pdflatex.

If you change a figure, the first run will export the new LaTeX code and discard the figure. You have to run make and then pdflatex again. This back-and-forth screws up the change monitoring of latexmk, so one ends up compiling four to six times for every little change to a figure.

To me, this is a waste of my time. It actually wastes so much time that building a better system quickly pays for itself.

Idea

While I was in Jülich for the JSC GSP, we made our reports and presentations with a lot of snippets in directories: a Figures directory, a Plots directory, and a Listings directory. In, say, Plots/scaling.tex one then had the following:

\begin{tikzpicture}\end{tikzpicture}

In the main document, one would do this:

\begin{figure}
    \centering
    \input{Plots/scaling}
    \caption{}
    \label{}
\end{figure}

The TikZ externalize feature would then externalize the code snippet again to build it with a generated Makefile.

So why use \input and externalize at the same time? Why not compile the figures directly into PDFs and include those PDFs, without any magic, in the main document? The dependencies can also be modeled in the main Makefile of the project.

Implementation

First I needed a little helper to wrap such a snippet into a full document: a Python program called tikzpicture_wrap.py. It uses the Jinja2 template engine to wrap the snippet with \documentclass{scrartcl} and the like. I use the same document class as in the main document in order to get the exact same fonts and an identical \linewidth. The standalone document class would automatically crop the resulting image, but it does not allow me to generate figures for beamer and scrartcl in an easy way. The wrapped LaTeX file is then put into build/page.
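The actual tikzpicture_wrap.py uses Jinja2; a dependency-free sketch of the same idea might look like the following. The template, file paths, and default preamble are illustrative, not the real ones.

```python
#!/usr/bin/env python3
# Sketch of wrapping a TikZ snippet into a standalone LaTeX document.
# The real tikzpicture_wrap.py uses Jinja2; plain string substitution
# is enough to show the idea. The template here is illustrative only.
import string
import sys

TEMPLATE = string.Template(r'''\documentclass{$documentclass}
\usepackage{tikz}
\usepackage{pgfplots}
\pagestyle{empty}
\begin{document}
$snippet
\end{document}
''')


def wrap(snippet, documentclass='scrartcl'):
    '''Return a full LaTeX document containing the given snippet.'''
    return TEMPLATE.substitute(documentclass=documentclass, snippet=snippet)


if __name__ == '__main__' and len(sys.argv) == 3:
    # Usage: tikzpicture_wrap.py INPUT OUTPUT
    with open(sys.argv[1]) as f:
        snippet = f.read()
    with open(sys.argv[2], 'w') as f:
        f.write(wrap(snippet))
```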

The wrapped document is then compiled with pdflatex or lualatex inside build/page. I use pdfcrop to crop the resulting PDF tightly; one has to use \pagestyle{empty} to remove the page number. The cropped image is put directly into build.

After all images are compiled, I can compile the main document. There I use \includegraphics to insert the pre-compiled PDF into the document.
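A figure in the main document then reduces to plain graphics inclusion (the file name here is an example):

```latex
\begin{figure}
    \centering
    % Pre-compiled and cropped by the Makefile; no TikZ needed here.
    \includegraphics{build/scaling.pdf}
    \caption{}
    \label{}
\end{figure}
```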

The main Makefile consists of the following blocks. First we need to get hold of all the figures that need to be built. I use string substitution to derive the prepared PDF files in the output directory:

document_tex := $(wildcard physics*.tex)
document_pdf := $(document_tex:%.tex=%.pdf)

figures_tex := $(wildcard Figures/*.tex)
figures_pdf := $(figures_tex:Figures/%.tex=build/%.pdf)

Then the main document needs to depend on the figures:

$(document_pdf): $(figures_pdf)

Last, one needs the rules to build the parts as described above:

build/page/%.tex: Figures/%.tex
        ../build-system/tikzpicture_wrap.py $< $@

%.pdf: %.tex
        cd $$(dirname $@) && lualatex --halt-on-error $$(basename $<)

build/%.pdf: build/page/%.pdf
        pdfcrop $< $@

Then a single make -j $(nproc) will compile all the images in parallel and then build the main document. As we often have a lot of plots and figures in our lab reports, this is a great time-saver on a multi-core machine. Changing a single figure will only lead to compilation of that figure and the main document.

Since the output from lualatex is so long, it makes sense to pipe it through | tail -n 30 to only see the trailing warning and error messages. This keeps the console output a bit cleaner, and multiple parallel instances of lualatex will not mangle their output together as much. See below for latexrun, which solves that problem in a nicer way, albeit with other drawbacks.

This whole thing is implemented in our latest round of the lab course:

GitHub Page

git clone git://github.com/martin-ueding/physics601-reports.git

See the common.mak there, it contains all the logic and some high-level comments.

Plots

That works well for figures. Plots, however, usually read additional files, and the Makefile needs to take those dependencies into account. Our analysis is done using a Python program with SciPy. We read data from the Data folder and process it. Values destined for the document are written to template.js, from which they are inserted into the document (see Computation Results). Therefore we have the following dependency:

$(build)/template.js: crunch.py $(wildcard Data/*.*) $(wildcard *.py) | $(build)/xy $(build)/to_crop
        ./$<

The last two (order-only) dependencies make sure that the build folders are created. crunch.py will create a few files in $(build)/xy which contain the data points for the plots. Each plot then depends on all those points:

$(plots_page_pdf): $(build)/template.js $(wildcard $(build)/xy/*.?sv)

Although that sounds like all the plots are rebuilt whenever something changes, they are not. make will indeed want to recompile those figures, but latexmk is smart enough to figure out that the contents of the needed files have not changed. Therefore this does not take any time, and it keeps the dependencies in the Makefile less complicated.
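To make the data flow concrete, the output stage of a script like crunch.py could emit its data points as below. The file names, directory layout, and data are made up for illustration; the real crunch.py works differently.

```python
#!/usr/bin/env python3
# Hypothetical sketch of the analysis script's output stage: write
# (x, y) pairs as tab-separated files into build/xy, where the pgfplots
# figures pick them up. Data and paths are illustrative only.
import csv
import os


def write_xy(points, path):
    '''Write an iterable of (x, y) pairs as a TSV file for pgfplots.'''
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t')
        writer.writerow(['x', 'y'])  # column headers for pgfplots
        writer.writerows(points)


# Example: a made-up "scaling" data set.
write_xy([(1, 1.0), (2, 1.9), (4, 3.7)], 'build/xy/scaling.tsv')
```

In the plot, pgfplots would then read the file with something like \addplot table {build/xy/scaling.tsv}.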

We have also split the TikZ graphics into the directories Figures and Plots. The former do not depend on the output of the computations, the latter do. This way one can run the computation while the figures compile. Once the (lengthy) computation is done, the compilation of the plots starts. After all that, the main document is compiled.

This scheme also allows us to use pdflatex for the main document and lualatex for the figures. This was handy as we needed to run everything on Ubuntu 14.04, which ships a rather old version of lualatex. That way one could at least compile the document, leaving out the figures.

latexrun

latexmk is great, but latexrun is superior for my use cases. The output of pdflatex or lualatex is extremely verbose and, frankly, mostly useless. The interactive error handling drives me crazy (I always use --halt-on-error with lualatex), and the actual warnings drown in noise. Compared to the output of a modern C++ compiler (g++ 4.9 or greater, clang 3.6 or greater), it is just nonsense. Luckily there is latexrun, which parses the pages of output and displays the warnings just like a C++ compiler would: filename, line number, and the warning.

There is no big adjustment to the build system; just replace the latexmk rules with the following:

%.pdf: %.tex | $(build)
        cd $$(dirname $@) \
            && latexrun -O $$(basename $< .tex).latexrun --latex-cmd lualatex --bibtex-cmd biber $$(basename $< .tex)

It is crucial to add the -O $$(basename $< .tex).latexrun option; otherwise all the intermediate files would go into latexrun.out. At some point latexrun added a locking mechanism for this directory in order to prevent race conditions. This is nice, but it effectively serializes all the lualatex calls again, even with make -j. To avoid the lock contention completely, each document to be compiled should get its own output directory.

Just like latexmk, latexrun will run lualatex the appropriate number of times to get everything sorted out.

Caution

There seems to be some issue with biber not being invoked correctly every time. A simple cd FOO.latexrun && biber FOO and then calling latexrun again will fix this.

Also makeindex is not called automatically. One can fix it the same way by calling it manually.

All in all this makes latexrun run more often than needed, since the auxiliary files change. Using latexmk is therefore faster than latexrun; its output is just not as clean.