Rant: Notebook Interfaces

Date: 2016-12-17
Abstract: Cell-based notebook interfaces like Jupyter or Wolfram Mathematica store variables in the context of an (invisible) kernel. It is easy to create unresolvable dependencies by defining variables and then deleting the code that defined them. An example is shown.

There are a couple of programs that have cell-based notebook interfaces. In this example I will use the IPython Notebook (now called Jupyter Notebook). The exact same results can be obtained with Wolfram Mathematica.

In a normal programming session, you would type your program in your editor/IDE and then run it from top to bottom. If the program runs through once, it will likely run from top to bottom again just fine. Of course you can still shoot yourself in the foot here by reading and writing files on disk. In a cell-based notebook, shooting yourself in the foot is so much easier.

Example

The following screenshots show a Python notebook; each screenshot shows the full notebook. Let me start with a single cell and assign the value of the (undefined) var1 to var2:

../../_images/notebook1.png

This does not work because var1 is undefined. Let me define var1 after var2 now:

../../_images/notebook2.png

In a normal programming context, this would just not work at all. In this cell-based thing, I can now evaluate the first cell again. The kernel now has var1 defined, so the cell runs just fine:

../../_images/notebook3.png
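
For readers without a notebook at hand, the sequence so far can be imitated in plain Python. This is only a rough sketch: the dictionary kernel and the helper run_cell are hypothetical stand-ins for the real kernel, and the value 1 assigned to var1 is an assumption, since the actual value is only visible in the screenshots.

```python
# Hypothetical stand-in for the kernel: one dict that outlives every cell run.
kernel = {}


def run_cell(source, namespace):
    """Execute one cell in the namespace; return None or the NameError text."""
    try:
        exec(source, namespace)
        return None
    except NameError as err:
        return str(err)


cell_1 = "var2 = var1"
cell_2 = "var1 = 1"  # assumed value, for illustration only

first_try = run_cell(cell_1, kernel)   # fails: var1 is not defined yet
run_cell(cell_2, kernel)               # defines var1 inside the kernel
second_try = run_cell(cell_1, kernel)  # succeeds: the kernel remembers var1

print(first_try)
print(second_try)
print(kernel["var2"])
```

The first attempt reports the missing name, while the second attempt runs without complaint although the text of cell_1 has not changed at all. Only the hidden state differs.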

The only trace of this strange dependency is in the evaluation numbers in front of the cell. Since those variables are now globally defined, I can just evaluate the notebook from top to bottom again:

../../_images/notebook4.png

Even worse, I can now remove the line that defines var1. The remaining line will evaluate just fine:

../../_images/notebook5.png

At this point, one could consider sending this notebook to somebody else, or perhaps shutting down the computer (and with it the kernel), assuming that everything is well. However, the value of var1 is nowhere in the program. It exists only in the current instance of the kernel, because in the past I evaluated a cell that defined var1. So let me restart the kernel now:

../../_images/notebook6.png

Evaluating the one cell in the notebook again gives the same error as before:

../../_images/notebook7.png
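
The restart can be imitated in plain Python as well. In this sketch a fresh dictionary plays the role of the restarted kernel; the name kernel and the value 1 are assumptions made for illustration, not something taken from the screenshots.

```python
# Old kernel session: the defining cell was run once and later deleted.
kernel = {}
exec("var1 = 1", kernel)     # the cell that no longer exists in the notebook
exec("var2 = var1", kernel)  # the surviving cell still works here

# "Restart": all state is thrown away, only the surviving cell is left.
kernel = {}
try:
    exec("var2 = var1", kernel)
    outcome = "ok"
except NameError as err:
    outcome = f"NameError: {err}"

print(outcome)
```

The surviving cell is byte-for-byte the same in both sessions. Whether it runs depends entirely on state that the notebook file does not contain.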

Conclusions

This simple example shows how easy it is to shoot oneself in the foot with a cell-based notebook. Perhaps I am just so used to normal procedural programs that I find this behavior rather upsetting. I have seen people fall into this trap while working on their notebooks. At some point, the notebook would no longer run cleanly from top to bottom after the kernel had been reset. They had to re-implement a couple of things because the code that had set the needed variables was gone.
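
Whether a list of cells still runs cleanly from top to bottom can be checked mechanically by executing them in order in an empty namespace. The following is a minimal sketch under that assumption; first_failing_cell is a hypothetical helper working on plain strings, not part of Jupyter, and the cell contents are made up for illustration.

```python
def first_failing_cell(cells):
    """Run cells in order in a fresh namespace.

    Returns the index of the first cell that raises, or None if all cells run.
    """
    namespace = {}  # empty, like a kernel right after a restart
    for index, source in enumerate(cells):
        try:
            exec(source, namespace)
        except Exception:
            return index
    return None


complete = ["var1 = 1", "var2 = var1"]
broken = ["var2 = var1"]  # the defining cell has been deleted

print(first_failing_cell(complete))
print(first_failing_cell(broken))
```

For the complete list the helper returns None; for the broken list it returns 0, the index of the cell that depends on the deleted definition.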

A great advantage of notebooks is that one can re-run just a part of the program after a minor change. I do like this, and I find it a big waste of time to run my whole analysis program after each trivial change. Yet I fear this pitfall of cyclic or broken dependencies.

If you prefer notebook interfaces, how do you deal with that? Please send me an email. I’d like to hear about it!