JSON vs. YAML vs. TOML

There are a bunch of data formats that can store structured data. The most popular seem to be JSON and YAML. A relatively new one is TOML, which is gaining traction in the Python ecosystem. For most of my projects I just use YAML files, but I have tried out TOML as well.

Here I will show the same data in the three formats and write a bit about the upsides and downsides of each. The data is the configuration for a smartphone synchronization script that copies images and downloaded files from the device to my laptop, and also copies planned routes and my password file to the device. Because I want to change the files and paths without having to change the script, I moved them into a configuration file.

TOML

With TOML, there are section headers like in the INI file format. Nested structures are specified by using periods in the headers. A header in double square brackets appends one element to a list (an “array of tables” in TOML terms). Within each section one can use key-value pairs or inline structures. The configuration file looks like this:

[device.martin-mia1]
path = "/sdcard/"
host = "martin-mia1"
user = "mu"

[[tasks.CopyToHost]]
source = "ClearScanner/share"

[[tasks.CopyToHost]]
source = "DCIM/Camera"
destination = "~/TODO/"

[[tasks.CopyToHost]]
source = "bluetooth"

[[tasks.CopyToHost]]
source = "Download"

[[tasks.CopyToHost]]
source = "Pictures"

[[tasks.CopyToHost]]
source = "TODO"

[[tasks.CopyToHost]]
source = "Android/data/net.osmand/files/tracks/rec"
destination = "~/Dokumente/Karten/Tracks/"

[[tasks.CopyToDevice]]
source = "~/Dokumente/Karten/Routen"
destination = "Android/data/net.osmand/files/tracks/Routen"
delete = true

[[tasks.CopyToDevice]]
source = "~/Dokumente/Hauptliste/Hauptliste.kdbx"
destination = ""

The “O” in TOML stands for “obvious”, but to me it is not completely obvious how this maps to the actual data structure. It takes a while, but I can figure it out eventually. Still, I don't really like it that much. The list syntax in particular looks cumbersome.

One thing that gives it flexibility is that the elements of one list don't have to be directly below each other. One could arrange the sections in almost any order, because the section header always gives the path within the data structure. Only the order of the elements within one list follows their order in the file.

If one has a relatively limited structure, then TOML can look great. One could structure the configuration so that it doesn't use many lists, but instead gives names to the list items and stores them as dict items.

JSON

The JSON format is quite classic, and I can read it effortlessly. Sometimes I dump TOML files as JSON in order to figure out what they actually mean. The format is clear to read, but it is very verbose and hard to write. I only use it for data that the computer both reads and writes.

{
  "device": {
    "martin-mia1": {
      "host": "martin-mia1",
      "path": "/sdcard/",
      "user": "mu"
    }
  },
  "tasks": {
    "CopyToDevice": [
      {
        "source": "~/Dokumente/Karten/Routen",
        "destination": "Android/data/net.osmand/files/tracks/Routen",
        "delete": true
      },
      {
        "source": "~/Dokumente/Hauptliste/Hauptliste.kdbx",
        "destination": ""
      }
    ],
    "CopyToHost": [
      {
        "source": "ClearScanner/share"
      },
      {
        "source": "DCIM/Camera",
        "destination": "~/TODO/"
      },
      {
        "source": "bluetooth"
      },
      {
        "source": "Download"
      },
      {
        "source": "Pictures"
      },
      {
        "source": "TODO"
      },
      {
        "source": "Android/data/net.osmand/files/tracks/rec",
        "destination": "~/Dokumente/Karten/Tracks/"
      }
    ]
  }
}

Another downside of JSON is that it doesn't support comments, whereas TOML and YAML do. However, it is hard for a program to rewrite a commented file and change values while keeping most of the comments intact. In practice this means that commented files should only be read by programs. If a program writes the file, one should not put comments into it.

YAML

The YAML format is my definite favorite. I find it easy to read and write. One negative about YAML is its potential complexity, as one can have references to other parts of the file, or even serialize custom types. This makes loading YAML files a non-trivial task. The pyyaml library has a safe_load() function for this reason: it only constructs plain data such as dicts, lists, strings and numbers, and refuses tags that would create arbitrary Python objects.
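Both features can be tried out with a short sketch. It assumes that the third-party PyYAML package is installed. The keys camera and also_camera as well as the unsafe document are made up for the illustration.

```python
import yaml  # third-party PyYAML package, assumed to be installed

# An anchor (&camera) and an alias (*camera) refer to the same mapping.
DOC = """
camera: &camera
  source: DCIM/Camera
  destination: ~/TODO/
also_camera: *camera
"""
data = yaml.safe_load(DOC)
print(data["also_camera"]["source"])

# A Python-specific tag that asks the loader to call a function.
UNSAFE = "task: !!python/object/apply:os.getcwd []"

# safe_load() does not know this tag and raises an error instead.
try:
    yaml.safe_load(UNSAFE)
    rejected = False
except yaml.YAMLError:
    rejected = True
print("rejected:", rejected)
```

For a configuration file that only contains plain data, safe_load() is all that is needed.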

device:
  martin-mia1:
    host: martin-mia1
    path: /sdcard/
    user: mu
tasks:
  CopyToDevice:
  - source: ~/Dokumente/Karten/Routen
    destination: Android/data/net.osmand/files/tracks/Routen
    delete: true
  - source: ~/Dokumente/Hauptliste/Hauptliste.kdbx
    destination: ''
  CopyToHost:
  - source: ClearScanner/share
  - source: DCIM/Camera
    destination: ~/TODO/
  - source: bluetooth
  - source: Download
  - source: Pictures
  - source: TODO
  - source: Android/data/net.osmand/files/tracks/rec
    destination: ~/Dokumente/Karten/Tracks/

As one can see with this configuration file example, it is the shortest and easiest to read format. I think that I will stick with YAML for current and future projects for a while to come.

I intentionally did not mention XML, because it is a language for documents and not for hierarchical data. One could represent all of this in XML, but XML is a special case with its mix of text and tags.