# Lacking Uncertainty Estimations in Natural Language Processing Papers

My university background is physics, which is an empirical science. All measurements or derived quantities must be quoted with an error estimate. This should ideally include both a statistical and a systematic error. If you take a look at the paper from my thesis, you will find this table:

It contains the comparison of a certain quantity (pion scattering length) from various other publications and our result. There are uncertainties given, either as a combined value or as separate statistical and systematic ones. One result by the ETMC even has asymmetric systematic errors quotes. You can look at these numbers and figure out whether the result from that work lies within the error budget of the other works. The raw table isn't as meaningful as a plot, but you could create a plot from the table. And one can see that our result has a one standard deviation confidence interval of $[-0.0567, -0.0395]$, so it easily encompasses all previous work. Also it has the largest error of all results, so it is the least precise addition to the field. This is okay as it was just an auxiliary result and we weren't aiming for precision there.

In my new field, natural language processing (NLP), I cannot say the same thing. There are no error estimates whatsoever! And it really annoys me, you cannot really derive any conclusion from the data. Take for instance the paper on BERT. They show a table where they compare two variants of the *BERT* model with other language models in various tasks: