Default Standard Deviation Estimators in Python NumPy and R

Martin Ueding

2020-06-10

Code & Zahlen

I recently noticed by accident that the default standard deviation implementations in R and NumPy (Python) do not give the same results. In R we have this:

> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> sd(x)
[1] 3.02765

And in Python the following:

>>> import numpy as np
>>> x = np.arange(1, 11)
>>> x
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> np.std(x)
2.8722813232690143

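So why does one get 3.03 and the other 2.87? The difference is the divisor. R divides the sum of squared deviations by n - 1 (Bessel's correction), which makes the underlying variance estimate unbiased, whereas NumPy by default divides by n, which gives a biased estimate. See this Wikipedia article for the details.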

The biased one used in NumPy is the following: $$ \hat\sigma = \mathop{}\mathrm{sd}(X) = \sqrt{\frac{1}{n} \sum_{i = 1}^n (x_i - \bar X)^2} \,. $$

R instead uses the version with n - 1 in the denominator, often called the unbiased one: $$ \hat\sigma = \mathop{}\mathrm{sd}(X) = \sqrt{\frac{1}{n-1} \sum_{i = 1}^n (x_i - \bar X)^2} \,. $$

This is also documented in R:

Like ‘var’ this uses denominator n - 1.

The documentation for np.std shows that there is an additional argument:

ddof : int, optional

Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

So NumPy actually has this general form implemented: $$ \hat\sigma = \mathop{}\mathrm{sd}(X) = \sqrt{\frac{1}{n-\mathrm{ddof}} \sum_{i = 1}^n (x_i - \bar X)^2} \,. $$
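To reproduce the R result in NumPy, one can pass `ddof=1`. The following is a minimal sketch using the same data as above; the helper `sd_manual` is a hypothetical name introduced here only to spell out the general formula by hand, it is not part of NumPy.

```python
import numpy as np

x = np.arange(1, 11)

# NumPy default: divisor n (ddof=0)
sd_numpy_default = np.std(x)

# Divisor n - 1, the convention that R documents for sd()
sd_bessel = np.std(x, ddof=1)


def sd_manual(values, ddof=0):
    """Hypothetical helper: the general formula with divisor n - ddof."""
    values = np.asarray(values, dtype=float)
    n = values.size
    squared_deviations = (values - values.mean()) ** 2
    return np.sqrt(squared_deviations.sum() / (n - ddof))


print(sd_numpy_default)  # about 2.87228, as in the NumPy session above
print(sd_bessel)         # about 3.02765, as in the R session above
print(sd_manual(x, ddof=1))
```

With `ddof=1` the NumPy value agrees with what `sd(x)` printed in R.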

So why does NumPy let ddof default to 0? The documentation elaborates on this:

In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
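The statements in that quote can be checked with a small simulation. This is only a sketch under assumptions chosen here for illustration: standard normal data (true variance and standard deviation both 1), a sample size of 5, a fixed seed, and 200,000 repetitions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 5                           # small samples make the bias visible
repetitions = 200_000

# Each row is one sample of size n from a standard normal distribution
samples = rng.normal(loc=0.0, scale=1.0, size=(repetitions, n))

# Average each estimator over all repetitions
mean_var_ddof0 = np.var(samples, axis=1, ddof=0).mean()
mean_var_ddof1 = np.var(samples, axis=1, ddof=1).mean()
mean_sd_ddof1 = np.std(samples, axis=1, ddof=1).mean()

print("mean variance, ddof=0:", mean_var_ddof0)  # expected near (n - 1) / n
print("mean variance, ddof=1:", mean_var_ddof1)  # expected near 1
print("mean sd,       ddof=1:", mean_sd_ddof1)   # expected somewhat below 1
```

The averaged variance with `ddof=1` should land close to the true value, while the averaged standard deviation with `ddof=1` stays below 1, which is the point the documentation makes in its last sentence.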

Both environments apply the same convention to the variance: np.var also defaults to ddof=0, and var in R uses the denominator n - 1. So within each environment the variance is the squared standard deviation.
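On the NumPy side this consistency is easy to verify. A short sketch with the data from above; the dictionary `results` is just a container introduced for this example.

```python
import numpy as np

x = np.arange(1, 11)

results = {}
for ddof in (0, 1):
    variance = np.var(x, ddof=ddof)
    sd = np.std(x, ddof=ddof)
    results[ddof] = (variance, sd)
    # The squared standard deviation should equal the variance
    print(ddof, variance, sd ** 2)
```

For this data the sum of squared deviations is 82.5, so the variance is 8.25 with `ddof=0` and 82.5 / 9 with `ddof=1`.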

Both approaches are okay, one just has to keep in mind what the underlying assumptions or goals were. The two estimates differ only by the factor sqrt(n / (n - 1)), which approaches 1 as n grows, so with enough data there is not much of a difference anyway.
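How fast the two conventions converge can be seen by looking at their ratio for growing sample sizes. A small sketch, assuming normally distributed random data with an arbitrary seed; the ratio itself depends only on n, not on the data.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

ratios = {}
for n in (10, 100, 10_000):
    data = rng.normal(size=n)
    # Both estimators share the same sum of squares, only the divisor differs
    ratios[n] = np.std(data, ddof=1) / np.std(data, ddof=0)
    print(n, ratios[n], np.sqrt(n / (n - 1)))
```

For n = 10 the two values differ by about 5 percent, for n = 10,000 the difference is far below one percent.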