Sphinx to Nikola

As you can likely tell, my website looks different now. For a very long time I have been using Sphinx with a custom theme for my personal website. It has served me rather well, but over time I have been pushing it in directions it was not really designed for. It is first and foremost a documentation generator. As such it has a hierarchical structure and does not support blog posts or RSS feeds. Some of my content is “timeless”, like the study material, but other things slowly become outdated and would fit better into a blog structure. There are extensions to Sphinx that try to add these features, but I decided to move to Nikola instead.

In this post I will describe how I have made the transition and what was needed to get the content moved over.

There is a nice overview of static website generators that lists many of them. There are many to choose from, but I want to stick with one that is fairly popular and won't have its support dropped soon. It should also be mature, as I want to keep going with it for a while. In the past I looked at Pelican and Tinkerer but found them rather cumbersome. I had also looked at Nikola before. What put me off was that it only had two levels of hierarchy for the pages. This seemed very limiting at the time, but now I am much more relaxed about it and just moved.

I am looking forward to writing blog posts more easily and to having a tool better suited to that task, so that I can concentrate solely on writing the content.

The transition took me three days. I had 194 posts and 34 pages that needed to be converted.

Structure and navigation

Sphinx has a tree of documents that can be arbitrarily deep. One defines an index page and uses the toctree directive to add child pages to it. That is what I did: I had a hidden toctree which included all the structural pages like “Programming”, “Computer” or “Studies”. Those pages then had additional toctree directives that included all the other files. This way Sphinx does not really have a site navigation but rather a document structure. For a manual to be rendered to HTML, PDF or EPUB, this is just great; for a personal website it is a bit clunky. The top navigation was a real hack in the template. I accessed the complete structure tree and let it render the top level. I had to adapt the CSS classes manually such that the current one would render as active.

Nikola approaches this by having posts and pages. The posts are the blog posts; each has one category and multiple tags, and they are ordered chronologically. The navigation is something I set manually to whatever I want. By just pointing to the URLs of the category indices I get lists of posts for each category, without any hacks.
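As a rough illustration, a manual navigation in conf.py could look like the following sketch. NAVIGATION_LINKS is the Nikola setting for this; the labels and URLs are assumptions made up for the example, and the exact category URLs depend on settings such as CATEGORY_PATH and CATEGORY_PREFIX.

```python
# Fragment for conf.py. All labels and URLs below are illustrative
# assumptions, not the actual navigation of this site.
DEFAULT_LANG = "en"

NAVIGATION_LINKS = {
    DEFAULT_LANG: (
        # Category indices: one list of posts per category.
        ("/categories/cat_science/", "Science"),
        ("/categories/cat_programming/", "Programming"),
        # A regular page that has to be linked by hand.
        ("/pages/studies/", "Studies"),
        ("/rss.xml", "RSS"),
    ),
}
```

Each entry is a pair of URL and label, so adding or reordering menu items is a one-line change.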

Sphinx did not care about the location of the source files; I just had to link them together manually for the overall tree structure. With Nikola I have the directories pages and posts. The posts all get thrown into the blog structure, and the pages are compiled but I have to link to them manually. The blog posts themselves have metadata which contains their category and tags. I use the YAML metadata and it looks like this:

---
title: Derivation of the Euler-Lagrange-Equation
date: 2013-06-12 11:27+0200
category: Science
tags: Physics
---
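With a couple of hundred posts, writing such a header by hand gets tedious. The following is a minimal sketch of a helper that produces a block in the same layout; the function name front_matter is hypothetical, and it assumes that none of the values contain characters that would need quoting in YAML (such as a colon followed by a space).

```python
def front_matter(title, date, category, tags):
    """Return a YAML metadata block in the layout shown above.

    Assumption: the values are plain strings that need no YAML quoting.
    """
    lines = [
        "---",
        f"title: {title}",
        f"date: {date}",
        f"category: {category}",
        # Several tags are written as one comma-separated line.
        f"tags: {', '.join(tags)}",
        "---",
    ]
    return "\n".join(lines) + "\n"


print(front_matter(
    "Derivation of the Euler-Lagrange-Equation",
    "2013-06-12 11:27+0200",
    "Science",
    ["Physics"],
))
```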

Sphinx uses the image, figure and download directives to attach files to the document. The images can reside anywhere, but I just kept them in the same directory as the text. My travel report about China was located at travel/2019-06-china/index.rst and the images were all in the same directory. This makes it easy for me to see all the files that belong to one post.

Nikola would rather have these in a separate top-level directory images. This can easily be changed by adding the posts directory to the image folders in conf.py:

IMAGE_FOLDERS = {'images': 'images', 'posts': 'posts'}

And as the posts can reside anywhere, I can just have the article file at posts/wuhan-beijing-china-2019/main.md and put all the images into that directory. This way I have all the pictures together with the article and do not need to worry about directory paths when including the images in the source with Markdown's ![]().

The same can be done with the files as well. One just has to disable the separate copying of sources, otherwise two rules copy the same original files and clash.

COPY_SOURCES = False
FILES_FOLDERS = {'files': '', 'pages': 'pages'}

This allows me to keep the PDF documents from my studies also close to their documents. With both of these in place I can keep the directory structure that I have from Sphinx and the URLs just change slightly. Also the posts can still be called index.md, such that the file name stays the same.

Theme

I did not quite like the default bootblog4 theme. There is the bootstrap4 theme, which is more suitable for a mixed blog and site like mine. But the colors and fonts are the same as on every other Bootstrap website, so I wanted it to look a bit different.

I was very happy to learn that there is the concept of bootswatch and that it allows me to just exchange the subtheme without having to change anything myself. So I was just able to install a different subtheme via this command:

nikola subtheme -s flatly

And in the configuration file I needed to switch the theme:

THEME = "custom"

Convert reStructuredText to Markdown

Nikola supports different markup languages. I have been using reStructuredText for my old website and I could have just continued using that. But I have been using Markdown for everything else (technical reports, R notebooks, code documentation, personal diary) and therefore wanted to make the jump at some point. Nikola uses Pandoc for the Markdown conversion, so a lot of non-original Markdown features are supported as well. The ones that are still missing are these:

  • Figure with caption
  • Citations

Even custom directives are supported with Markdown, which is quite nice.

To convert the reStructuredText files to Markdown I just used Pandoc:

pandoc --atx-headers --columns=79 index.rst -o index.md
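For many files the same call can be scripted. The following is a minimal sketch, assuming that pandoc is on the PATH and that each Markdown file should be written next to its reStructuredText source; the function names are hypothetical. It reuses the flags from the command above; newer Pandoc releases have renamed --atx-headers to --markdown-headings=atx.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: Path) -> list[str]:
    """Build the Pandoc call shown above for one reStructuredText file."""
    target = source.with_suffix(".md")
    return [
        "pandoc",
        "--atx-headers",
        "--columns=79",
        str(source),
        "-o",
        str(target),
    ]


def convert_tree(root: Path) -> None:
    """Convert every .rst file below root; stop on the first Pandoc error."""
    for source in sorted(root.rglob("*.rst")):
        subprocess.run(pandoc_command(source), check=True)


# Example call (not executed here): convert_tree(Path("posts"))
```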

A few things needed manual cleanup afterwards, for instance fixing the classes on the fenced code blocks or unescaping a few ' and " symbols.
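The unescaping part could be done with a small helper like the following sketch. It assumes that the escaped quotes show up as a backslash directly in front of the quote character, and the function name is hypothetical. It does not distinguish prose from code blocks, so the result should be reviewed before committing it.

```python
import re


def unescape_quotes(markdown: str) -> str:
    # Replace a backslash followed by ' or " with the bare quote character.
    # Caveat: this also touches code blocks, where the backslash may be wanted.
    return re.sub(r"\\(['\"])", r"\1", markdown)
```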

The R-Markdown articles come out as Markdown. Previously I needed to convert them to reStructuredText; now I can just leave them as they are. That makes it even more convenient as I can have a single R-Markdown file which contains the whole post.

For the captions I tried to enable the implicit_figures option, but that somehow did not work properly. I also tried to use the figureAltCaption extension but that did not do the trick either. So I just used regular expressions to transform the Markdown code into the HTML code that would come out anyway. In Vim I used the following barely readable snippet for the conversion:

:%s#\v\!\[([^]]+)\]\(([^)]+)\)#<figure>\r  <img src="\2" />\r  <figcaption>\1</figcaption>\r</figure>#
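The same substitution can be written more readably in Python. This is a sketch of an equivalent transformation and not the method actually used for the conversion; the function names and the example file name are hypothetical.

```python
import re

# Matches a Markdown image: ![caption](source)
IMAGE = re.compile(r"!\[([^\]]+)\]\(([^)]+)\)")


def _figure(match: re.Match) -> str:
    caption, source = match.group(1), match.group(2)
    return (
        "<figure>\n"
        f'  <img src="{source}" />\n'
        f"  <figcaption>{caption}</figcaption>\n"
        "</figure>"
    )


def images_to_figures(markdown: str) -> str:
    """Replace every Markdown image with an HTML figure with a caption."""
    return IMAGE.sub(_figure, markdown)


print(images_to_figures("![Great Wall](wall.jpg)"))
```

Like the Vim pattern, this only matches images with a non-empty caption, so images without alternative text are left untouched.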

References and URLs

All my internal references used the Sphinx :doc: role, which is not supported by Nikola. So I needed to go through all of them and update the links.
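To get a list of what needs updating, the references can be collected from the old sources first. The following is a minimal sketch, assuming the search runs over the original reStructuredText text and that the references use one of the two usual forms, with or without an explicit title; the function name is hypothetical.

```python
import re

# Matches :doc:`target` as well as :doc:`Title <target>`.
DOC_ROLE = re.compile(r":doc:`(?:[^`<]*<)?([^`>]+)>?`")


def doc_targets(rst: str) -> list[str]:
    """Return the targets of all :doc: references in the given text."""
    return [target.strip() for target in DOC_ROLE.findall(rst)]


sample = (
    "See :doc:`travel/2019-06-china/index` and "
    ":doc:`the China report <travel/2019-06-china/index>`."
)
print(doc_targets(sample))
```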

The URLs of all pages have changed as well. As I know where they were and where I moved them, I did not want to embarrass myself with broken bookmarks and external links. My .htaccess already contains a bunch of redirections, so I just added another set of them.
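Such redirections can be generated from a mapping of old to new URLs. This is a minimal sketch that emits Apache Redirect lines; the function name and the example paths are assumptions for illustration, not the actual URLs of this site, and paths containing spaces would need quoting.

```python
def redirect_lines(moved: dict[str, str]) -> list[str]:
    """Return one permanent Apache redirect per moved URL path."""
    return [
        f"Redirect 301 {old} {new}"
        for old, new in sorted(moved.items())
    ]


# Hypothetical example mapping from an old Sphinx URL to a new Nikola URL.
moved = {
    "/travel/2019-06-china/": "/posts/wuhan-beijing-china-2019/",
}
print("\n".join(redirect_lines(moved)))
```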