Exploring the Brainfuck programming language

In this first article in a planned series about the Brainfuck programming language we will get to know the language and try to solve simple problems in it.

To be clear from the beginning: this is a nonsense project. The world doesn't need Brainfuck as a programming language, neither does it need me writing software for it. However, it itches me to write a rudimentary compiler and also a parser for a programming language. Brainfuck is simple enough to serve as an “assembly language” and compilation target. I plan to use a subset of C or Python as my higher level language that I then compile into Brainfuck. This also requires me to write an interpreter for Brainfuck such that I can evaluate the generated programs and verify that they produce the same result as the programs that I have compiled.

The choice of Brainfuck is also influenced by its relation to the Turing machine. One often reads that Turing machines can compute anything computable, so basically every program in every language must be reducible to a Turing machine. These Turing machines need to have a program specified, though. And to encode this state was abstract, until I realized that Brainfuck is pretty related to the Touring machine as stated in the Wikipedia article about Brainfuck.

From that same article we can take how the language works. It has a program and a memory tape with numbers stored in each section of the tape. We can think of an integer array. There is a program pointer and a data pointer. This corresponds to an internal state of the Touring machine and its position on the Turing tape.

The commands in the programming language are the following:

  • + increments the value at the current position of the tape, - decrements it.
  • > moves the tape forward by one position, < moves it backward.
  • When the program encounters a[, it will check the tape. If that is zero, jump to the matching ]. Otherwise go to the next instruction. When it encounters the ], it will check the tape again. It if nonzero, then jump to the matching [.
  • . prints the current tape value to the screen, , takes a keyboard input and puts it at the current tape position.

The Turing machine doesn't have input and output, it takes its inputs on the tape and leaves the output on the tape. We therefore don't need the input and output if we can manipulate the tape. Or the other way around, we don't need to look at the tape if we have this input and output.

Since existing Brainfuck interpreters seem to use input and output, so will we here. This will allow us to check the programs here against an existing interpreter without having to write one ourselves first.

Copying an input

One rather simple problem would be take a number from the input and just emit it again. We just read the input with , and emit it again with .:

,.

This didn't move the tape at all, also it did not have any conditionals in it.

Emitting many characters

The next more complicated program that I can think of reads a number and then emits as many characters as this number was.

,[.-]

The output here is 5, 4, 3, 2, 1.

Moving a number on the tape

I might want to use the tape as a stack when I implement functions later on in the higher level language. It is pretty typical to allocate space for the return value first and then have all the arguments and then the local variables within that function.

We first go one step to the right and increment that cell by three to set it up as input. Then we copy it using a loop. As long as the cell is not zero, decrement it, go left, increment that, go back right. If the number is still not zero, repeat. In the end go left to the result cell and emit as many characters are there are in that cell.

>
+++

[-<+>]

<
[.-]

Adding two numbers

We can also make it add two numbers in a similar fashion. I will take 2 + 3 as an example. After the initialization we have a tape with contents [0, 2, 3]. Then the second paragraph runs, which first adds the 3 onto the 2, yielding a tape of [0, 5, 0]. And then it will move to the left and do the same, resulting in [5, 0, 0]. The program then emits 5 characters when run.

>
++
>
+++

[-<+>]
<
[-<+>]

<
[.-]

As the tape is “right infinite”, we can always truncate the trailing zeros and just write it as [0, 5, 0] = [0, 5] and [5, 0, 0] = [5].

Copying a number on tape

It sounds pretty simple, we want to copy a number from one tape location T1 to tape location T0. But as we have seen in the above two examples, we have effectively moved the number, not copied it.

The only way that I can think of copying is to make two copies at a time and destroy the original while we're at it. So here we start with a tape of []. The tape is right-infinite, so I will omit zeros on the right. The tape expands as it needs to.

We start with [] and write a 2 into T1 (zero based index) and get [0, 2]. Then I copy the entry of T1 into T0 and T2 by subtracting from T1 and adding to both T0 and T2. We end up with [2, 0, 2]. And then I move T2 into T1 with the code we know, ending up with [2, 2]. To make sure that we got it right, I emit the states in T0 and T1.

> ++        Set T1 to 2
[-<+>>+<]   Copy T1 to T0 and T2
> [-<+>]    Move T2 to T1
<< [-.]     Emit T0 as characters
> [-.]      Emit T1 as characters

This seems to work well. Now we have another building block to work with.

Multiplying two numbers

Addition was okay, but what about multiplication? We need to use two loops here, one for addition and for for multiplication. If I want to calculate 2 Ă— 3 I can write this as 3 + 3. And this is how we are going to do it here as well.

We need the building blocks from before to pull this off. I need to duplicate the 3 such that I can accumulate it onto the result.

Let us go through the algorithm:

  • First I initialize two fields on the tape. After the first paragraph we have a tape with [0, 2, 3].
  • Then I loop as long as T1 is nonzero. This is how the first iteration looks like:
    • We go to T2 and then copy T2 to T3 and T4, having a tape of [0, 2, 0, 3, 3].
    • We go to T4 and move T4 to T2 again and then have [0, 2, 3, 3] in the tape.
    • We go back to T3 and accumulate T3 onto T0 such that we end up with [3, 2, 3].
    • We go back to T1 and decrement it. The tape will contain [3, 1, 3].
  • This loop is repeated again until we only have [6] there. The loop will stop.
  • We emit six characters at the end.
> ++            Write 2 in T1
> +++           Write 3 in T2

<               Goto T1
[               While T1 is nonzero
  >             Goto T2
  [->+>+<<]     Copy T2 to T3 and T4
  >>            Goto to T4
  [-<<+>>]      Move T4 to T2
  <             Goto to T3
  [-<<<+>>>]    Accumulate T3 onto T0
  <<            Goto to T1
  -             Decrement T1
]

< [.-]         Emit T0

It became pretty apparent that a simple operation like 2 Ă— 3 needs a lot of code to pull off. And we need a bit of scratch space on the tape. Managing the extra space with the four tape sections became a bit cumbersome and could quickly get out of control without careful tracking of their uses.

Conclusion

We have looked at the Brainfuck language and noted its relation to the Turing machine. We have also looked at a few programs of increasing complexity.

While working out the multiplication program we noted that there is a need for scratch space and that tracking the “variables” becomes a bit cumbersome. This should motivate why there are higher level languages.

In the next installment of this series we will create an interpreter for these programs such that we don't have to rely on online interpreters. And then in the third post we will develop a compiler from a higher level language into Brainfuck.