Build system for TikZ and pgfplots
LaTeX compilation is quite slow, especially when a reference changes and the document has to be compiled three times to get all the page numbers sorted out. This is bearable, as the output of LaTeX is far better than that of any other text processing system I have tried.
Lately I have made a lot of figures with TikZ and my plots with pgfplots. Their output looks just like the document itself: you have the same text, a matching line width and therefore a similar gray value. The figures just blend in perfectly.
This comes at a price: compiling a complicated pgfplots figure easily takes well over ten seconds. With three compilation runs, you wait half a minute for each plot in your document. A moderately complicated document like a lab report easily has over ten plots, which means a lot of waiting. In most cases only one compilation run is needed, but that still wastes a lot of time. Since the memory of the pdflatex engine is limited, it cannot build arbitrarily complex plots either.
That time is wasted even though the plot itself did not change at all; it is simply not cached when you have a plain \begin{tikzpicture} environment in your document.
TikZ comes with a feature to externalize. It works by extracting all the tikzpicture environments into separate files. A Makefile is also generated for you in that directory. You run make, or even make -j $(nproc), to compile the figures in parallel. On the next run of pdflatex, the compiled PDF images are included in the main document.
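A minimal preamble for this setup might look as follows; this is a sketch assuming the external TikZ library with its "list and make" mode (option names vary slightly between TikZ versions):

```latex
\usepackage{tikz}
\usetikzlibrary{external}
% Extract each tikzpicture into its own job and generate a Makefile:
\tikzexternalize[mode=list and make, prefix=tikz/]
```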
This is good; it does save time, mostly. On the very first run, you will have no figures. They are all replaced by the text [[ discarded due to externalize ]] (or similar). This means that the height used in the document is less than that of the actual figure. LaTeX will then position the figure floats differently, screwing up all the references. The tool latexmk will then run three times to get the references sorted out. Then you run make for the figures and latexmk again. In total that makes six runs of pdflatex.
If you change a figure, the first run will export the new LaTeX code and discard the figure. You have to run make and then run pdflatex again. This back-and-forth with the figures confuses the change monitoring of latexmk, and one ends up compiling four to six times for every little change in a figure.
To me, this is a waste of time. It actually wastes so much time that coming up with a better system amortizes quickly.
Idea
While I was in Jülich for the JSC GSP, we made our reports and presentations with a lot of snippets in directories. So there was a Figures directory, another one called Plots, and Listings. In, say, Plots/scaling.tex one then had the following:
\begin{tikzpicture}
    …
\end{tikzpicture}
In the main document, one would do this:
\begin{figure}
    \centering
    \input{Figures/scaling}
    \caption{…}
    \label{…}
\end{figure}
The TikZ externalize feature would then extract the code snippet again to build it with a generated Makefile.
So why use \input and externalize at the same time? Why not compile the figures directly into PDFs and include those PDFs, without any magic, in the main document? The dependencies can also be modeled in the main Makefile of the project.
Implementation
First I needed a little helper to wrap such a snippet into a full document. This is a Python program called tikzpicture_wrap.py. It uses the Jinja2 template engine to wrap the snippet with \documentclass{scrartcl} and the like. I use the same document class as in the main document in order to get the exact same fonts and an identical \linewidth. The standalone document class would automatically crop the resulting image, but it does not let me generate figures for both beamer and scrartcl in an easy way. The wrapped LaTeX file is then put into build/pages.
The wrapped document is then compiled with pdflatex or lualatex and stored in build/pages. I use pdfcrop to crop the resulting PDF tightly; one has to use \pagestyle{empty} to remove the page number. The cropped image is then put into build.
After all images are compiled, I can compile the main document. There I use \includegraphics to insert the pre-compiled PDFs into the document.
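The figure environment from above then becomes the following (the file name is illustrative):

```latex
\begin{figure}
    \centering
    \includegraphics{build/scaling}
    \caption{…}
    \label{…}
\end{figure}
```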
The main Makefile consists of the following blocks. First we need to get a handle on all the figures that need to be built. I use string substitution to derive the prepared PDF files in the output directory from the source files:
document_tex := $(wildcard physics*.tex)
document_pdf := $(document_tex:%.tex=%.pdf)

figures_tex := $(wildcard Figures/*.tex)
figures_pdf := $(figures_tex:Figures/%.tex=build/%.pdf)
Then the main document needs to depend on the figures:
$(document_pdf): $(figures_pdf)
Finally, one needs the rules to build the parts as described above:
build/page/%.tex: Figures/%.tex
	../build-system/tikzpicture_wrap.py $< $@

%.pdf: %.tex
	cd $$(dirname $@) && lualatex --halt-on-error $$(basename $<)

build/%.pdf: build/page/%.pdf
	pdfcrop $< $@
A single make -j $(nproc) will then compile all the images in parallel and afterwards build the main document. As we often have a lot of plots and figures in our lab reports, this is a great time-saver on a multi-core machine. Changing a single figure only leads to a recompilation of that figure and of the main document.
Since the output from lualatex is so long, it makes sense to pipe it through | tail -n 30 to only get the trailing warning and error messages. This keeps the console output a bit cleaner, and multiple instances of lualatex will not mangle their output together. See below for latexrun, which solves that problem in a nicer way, albeit with other drawbacks.
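A sketch of such a rule, following the conventions of the Makefile above (recipe lines must be indented with a tab; note that in plain sh the pipe hides the exit status of lualatex unless something like bash's pipefail is used):

```
build/page/%.pdf: build/page/%.tex
	cd $$(dirname $@) && lualatex --halt-on-error $$(basename $<) | tail -n 30
```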
This whole thing is implemented in our latest round of the lab course: https://github.com/martin-ueding/physics601-reports
See the common.mak there; it contains all the logic and some high-level comments.
Plots
That works well for figures. With plots, one usually reads additional files, and the Makefile also needs to take those dependencies into account. Our analysis is done using a Python program with SciPy. We read data from the Data folder and process it. Values that go into the template are written to template.js, to be inserted into the document. Therefore we have the following dependency:
$(build)/template.js: crunch.py $(wildcard Data/*.*) $(wildcard *.py) | $(build)/xy $(build)/to_crop
	./$<
The last two (order-only) dependencies make sure that the build folders are created. crunch.py will create a few files in $(build)/xy which contain the data points for the plots. Each plot then depends on all those points:
$(plots_page_pdf): $(build)/template.js $(wildcard $(build)/xy/*.?sv)
Although that sounds like all the plots are rebuilt whenever something changes, they are not. make will indeed want to recompile those figures, but latexmk is smart enough to figure out that the contents of the needed files have not changed. Therefore this does not take any time and keeps the dependencies in the Makefile less complicated.
We have also split the TikZ graphics into the directories Figures and Plots. The former do not depend on the output of the computations; the latter do. This way one can run the computation while the figures compile. Once the (lengthy) computation is done, the compilation of the plots starts, and after all that, the compilation of the main document.
This scheme also allows us to use pdflatex for the main document and lualatex for the figures. This was handy as we needed to run this on Ubuntu 14.04, which ships a rather old version of lualatex. That way one was at least able to compile the document while leaving out the figures.
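A hypothetical way to express that split in the Makefile, assuming the pattern rules from above (the target names are illustrative):

```
# Figures and plots are built with lualatex …
build/page/%.pdf: build/page/%.tex
	cd $$(dirname $@) && lualatex --halt-on-error $$(basename $<)

# … while the main document goes through latexmk with pdflatex.
physics%.pdf: physics%.tex
	latexmk -pdf $<
```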
latexrun
latexmk is great, but latexrun is superior for my use cases. The output of pdflatex or lualatex is extremely verbose and, frankly, completely useless. The interactive error handling drives me crazy (I always use --halt-on-error with lualatex) and all the warnings drown in useless information. Compared to the output of a modern C++ compiler (g++ 4.9 or greater, clang 3.6 or greater), it is just nonsense. Luckily, latexrun parses the pages of output and displays the warnings just like a C++ compiler would: file name, line number, and the warning.
No big adjustment to the build system is needed; just replace the latexmk rules with the following:
%.pdf: %.tex | $(build)
	cd $$(dirname $@) \
	&& latexrun -O $$(basename $< .tex).latexrun --latex-cmd lualatex --bibtex-cmd biber $$(basename $< .tex)
It is utterly important to add the -O $$(basename $< .tex).latexrun option. Otherwise all the intermediate files would go into latexrun.out. At some point latexrun added a locking mechanism for this directory to prevent race conditions. This is nice, but it effectively serializes all the lualatex calls again, even with make -j. To avoid the lock contention completely, one should use a separate output directory for every document to be compiled.
Just like latexmk, latexrun will run lualatex the appropriate number of times to get everything sorted out.
There seems to be some issue with biber not being invoked correctly every time. A simple cd FOO.latexrun && biber FOO and then calling latexrun again will fix this.
makeindex is not called automatically either; one can fix that the same way by calling it manually.
All in all this makes it run more often than needed, since the auxiliary files change. Using latexmk is faster than latexrun; the output is just not as clean.