R Horror, aka “HorroR”

I have been using R for a while now, and there are a few things that seem outright horrifying to me. Don’t get me wrong, R is a great tool. I just find the language a bit bewildering.

Dot and dollar

In almost every other language, the dot accesses a member of an object or of a namespace. Say you have in C++:

Myclass instance(3);
instance.member();  // hypothetical member function, reached via the dot

Not so in R! There, the . is just what you use instead of the underscore. There are functions like install.packages and update.packages and read.table which in other languages would be called install_packages and so on.

But the S3 object system hooks into these dots. So when you have an object of class myclass and call plot(instance), that function will forward to plot.myclass(instance). So the dot does carry some sort of hierarchy after all.
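A minimal sketch of that forwarding. The generic describe and the class name myclass are made up for illustration and are not part of base R:

```r
# Hypothetical generic: UseMethod() looks for a function named describe.<class>.
describe <- function(x, ...) UseMethod("describe")

# Fallback that is used when no class-specific method exists.
describe.default <- function(x, ...) "something else"

# Method for objects whose class attribute is "myclass".
describe.myclass <- function(x, ...) paste("myclass with value", unclass(x))

instance <- structure(3, class = "myclass")

describe(instance)  # dispatches to describe.myclass
describe(42)        # dispatches to describe.default
```

The only thing that ties describe.myclass to the class is its name, which is why a dot in a function name is sometimes decoration and sometimes dispatch.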

And what is the dot in other languages is the dollar in R.
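A small sketch of the dollar as member access, with a made-up list called person:

```r
# A named list plays the role of a simple record.
person <- list(name = "Ada", age = 36)

person$name      # member access with the dollar
person[["age"]]  # the same kind of access, by string

# A dot inside a name is just another character of the identifier.
person.count <- 1
```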

This is really confusing at first, but after a while one does not think about it too much any more.

Abbreviating named function parameters

When you have a function, you can call it with named parameters. Take the following function:

f <- function (parameter = NA, argument = NA) {
    cat('parameter:', parameter, '\n')
    cat('argument:', argument, '\n')
}

It has two named arguments, both of which are set to NA by default. Let us call it with just the second one set to some value. We do not need to type out the whole name; we can just type arg:

f(arg = 1)

The output that we get is the following:

parameter: NA
argument: 1

As you can see, the 1 has been passed to the parameter named argument, although we just typed arg in the function call.

Sounds great? Let me change your mind. Say the author of the function adds a third parameter whose name is a prefix of an existing one, like so:

f <- function (parameter = NA, argument = NA, arg = NA) {
    cat('parameter:', parameter, '\n')
    cat('argument:', argument, '\n')
    cat('arg:', arg, '\n')
}

The call to the function has not changed: f(arg = 1). But the output has:

parameter: NA
argument: NA
arg: 1

There is no warning from the runtime that you have abbreviated a parameter name. Also the runtime has no way of knowing that you wanted to have the other argument.

There is some little solace, namely when two parameters share a common prefix but neither name is a prefix of the other. An example would be this:

f <- function (parameter = NA, argument = NA, argparse = NA) {
    cat('parameter:', parameter, '\n')
    cat('argument:', argument, '\n')
    cat('argparse:', argparse, '\n')
}

When you run f(arg = 1) as before, you finally get an error:

Error in f(arg = 1) : argument 1 matches multiple formal arguments
Execution halted

In my code there could be such time bombs, and by default there seems to be no way of getting any error for doing this.
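A partial remedy, as far as I can tell from the base R documentation: the warnPartialMatchArgs option makes R warn about abbreviated argument names, and parameters declared after ... have to be spelled out in full. A minimal sketch, where f and g are made-up functions:

```r
f <- function(parameter = NA, argument = NA) {
    argument
}

# Ask R to warn whenever an argument name is only partially matched.
old <- options(warnPartialMatchArgs = TRUE)

# Capture the warning so that its text can be inspected.
msg <- tryCatch(f(arg = 1), warning = function(w) conditionMessage(w))
print(msg)

# Parameters declared after `...` are never matched by abbreviation.
g <- function(..., argument = NA) {
    argument
}
g(arg = 1)       # NA: `arg` ends up in `...`, not in `argument`
g(argument = 1)  # 1

options(old)  # restore the previous setting
```

This only gives a warning, not an error. Setting options(warn = 2) in addition should turn warnings into errors, at the price of doing so for every other warning as well.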

Assignment operator

R has five assignment operators:

  • <-
  • <<-
  • ->
  • ->>
  • =

The first one is a normal assignment, x <- 3. The second one does not create a local variable; it looks for an existing variable of that name in the enclosing scopes and assigns to that one, falling back to the global environment if there is none. The third and fourth ones are just mirrored versions, such that you can write 3 -> x. At some point in the past, this must have seemed like a really great idea.
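A short sketch of the operators in action. The names counter and increment are made up for illustration:

```r
x <- 3  # plain assignment
4 -> y  # mirrored form

counter <- 0
increment <- function() {
    # `<<-` modifies the `counter` found in the enclosing scope.
    counter <<- counter + 1
    # `<-` creates a local variable that vanishes when the function returns.
    local_copy <- counter
    invisible(local_copy)
}

increment()
increment()
counter  # 2
```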

Since this is confusing for people coming from other programming languages where = is the assignment operator (C, C++, Python, Haskell, PHP, Java, JavaScript), modern versions of R also have this = assignment operator. Now there are ideological wars on whether <- or = is the correct one to use.

The thing is that <- is a token consisting of two characters, each of which is a valid token by itself! So we can write both x <- 5 and also x < -5. The first assigns 5 to x (\(x := 5\)), the second does the comparison \(x < -5\). This in itself is not such a big deal, as both cases are easy to read for a human.

But what happens if you have code by a person who does not put spaces around operators? Their code will feature x<-5. What does it do? It will be parsed as x <- 5. But can you be certain that the author did not mean x < -5? In C++ you can just write x<-5 and it will mean x < -5 because there is no <- operator.

In practice this might not be an actual problem. The values of the expressions are different. But perhaps the code with if (x<-5) does the apparently right thing today but not tomorrow.
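To see what the parser makes of the two spellings, one can inspect the unevaluated expressions. This is only a sketch, and the variable names are made up:

```r
# quote() returns the parsed but unevaluated expression.
no_spaces <- quote(x<-5)
comparison <- quote(x < -5)

as.character(no_spaces[[1]])   # "<-": parsed as an assignment
as.character(comparison[[1]])  # "<": parsed as a comparison

x <- -10
result <- "branch skipped"
if (x<-5) {
    # The condition assigned 5 to x, and a non-zero number counts as TRUE.
    result <- "branch taken"
}
x  # 5, not -10
```

Had the condition been the comparison, the branch would also have been taken for x = -10, which is exactly the "right today, wrong tomorrow" situation.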

Puns everywhere

R is riddled with puns. A lot of package names are some word with an “R” put into it. There is “knitr”, “tidyr”, “stringr”. The “knitr” package can generate beautiful reports from R Markdown documents. These reports are documents with headings and text interleaved with R code. This is my preferred format for things like experimental reports where you want to document something and show off your data.

The functions in this package are called knit and purl. I knew the first word, the activity of converting wool into garments. But “purl” I had to look up; it seems to be similar but in the opposite direction. One could argue that rmd_to_md and rmd_to_r would be better descriptions, but they would also be bourgeois and not funny. The problem with the punny names is that you cannot search for them, nor can you remember or guess them easily.

But I won’t complain about the “tidyverse” packages because they have fundamentally changed the way I work with data and “ggplot2” is hands down the best plotting system I have ever tried.

Missing arguments

In languages like Python and C++ there is a concept of optional arguments. In R, this also exists, but with a slight twist.

Define a function like this in Python:

def py_func(a, b):
    print(a)

The argument b is not used within the function, but bear with me.

If you try to call this as py_func(1), it will fail loud and clear:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: py_func() missing 1 required positional argument: 'b'

What you could do is to define the function with a default value for b, like this:

def py_func(a, b=2):
    print(a)

Now calling it as py_func(1) works just fine, because this gets called as py_func(1, 2).

In C++ it works exactly the same way: you need to specify all the arguments that have no default value.

In R, this is not the case. When you declare a function analogously to the first example, it would look like this:

r_func <- function(a, b) {
    print(a)
}

Now calling r_func(1) will work! It will print out the 1 and not complain. Only when we call r_func() without any arguments will it complain that the parameter a was not passed a value and that there is no default value.
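The reason is that R only evaluates an argument once the function body actually touches it. A minimal sketch with a made-up function lazy_func, in which b is only used in one branch:

```r
# Hypothetical function: `b` is only touched when `use_b` is TRUE.
lazy_func <- function(a, b, use_b = FALSE) {
    if (use_b) a + b else a
}

lazy_func(1)  # works, `b` is never evaluated

# The missing `b` only blows up once the branch that needs it runs.
msg <- tryCatch(lazy_func(1, use_b = TRUE),
                error = function(e) conditionMessage(e))
print(msg)
```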

This means that when you write a function, you cannot be certain that the caller passed something for each parameter. The function only crashes once you actually do some computation with a parameter that was not passed. Sometimes you even want to make a parameter optional without it having a default value. In languages like Python or C++ you would have to use some neutral sentinel value like None (Python) or nullptr (C++). However, there still is a difference between a parameter not being passed at all and it being explicitly passed the value NULL (R).

In R there is the function missing which will check whether the parameter was missing in the call. Therefore you see in R code things like this:

r_func <- function(a, b) {
    if (missing(a)) {
        stop("Parameter `a` is missing and needs to be passed!")
    }
    if (missing(b)) {
        stop("Parameter `b` is missing and needs to be passed!")
    }
    # ... the actual work with `a` and `b` goes here
}

There is no way to know from the function signature which parameters are mandatory and which are optional. You need to look into the documentation or even the function definition to figure that out.
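The difference between "not passed" and "passed as NULL" can be sketched like this. The function report_b is made up for illustration:

```r
# Hypothetical function: `b` is optional but has no default value.
report_b <- function(a, b) {
    if (missing(b)) {
        "b was not passed"
    } else if (is.null(b)) {
        "b was passed as NULL"
    } else {
        paste("b is", b)
    }
}

report_b(1)        # "b was not passed"
report_b(1, NULL)  # "b was passed as NULL"
report_b(1, 2)     # "b is 2"
```

Nothing in the signature function(a, b) tells the caller that b may be left out, which is the complaint above.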