# Efficiency of the pow function

Someone said that using pow(x, 2) is always more inefficient than using x * x. Well, there are two things to remember:

1. Do not "optimize" without measurement.
2. Measure with full compiler optimization.

So this is exactly what I did then. This is a simple C++11 program that uses pow(x, 2) and x * x. I highlighted the lines in questions. The calculations with test are in place so that the compiler does not optimize away the code, which it would do otherwise.

#include <chrono>
#include <cmath>
#include <iostream>

int main() {
double test {0};
unsigned iter_max {100000000};

for (unsigned iter {0}; iter < iter_max; iter++) {
test += std::pow(iter, 2);
}
double time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
std::cout << "Pow: " << time_in_seconds << std::endl;

for (unsigned iter {0}; iter < iter_max; iter++) {
test += iter * iter;
}
time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
std::cout << "Multiplication: " << time_in_seconds << std::endl;

std::cout << test << std::endl;

return 0;
}


## Results

If you compile that without optimization, pow is clearly slower:

\$ clang++ -std=c++11 pow.cpp -o pow; and ./pow
Pow: 0.272
Multiplication: 0.042


Now I compiled this with clang++ using its -O3 optimization. When I did this on 2014-05-21, pow was significantly faster. I have revisited this on 2014-07-10, where pow just a tiny bit slower than the multiplication. Interesting.

I also tested with g++ and found that pow is significantly slower than the multiplication. To be fair, I ran each one a couple times since the overall time varies. With g++, the multiplication is actually faster. To get meaningful results, I ran each one 10 times and too mean and standard deviation with this Python script:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import subprocess

import numpy
import unitprint

def compile_cpp(compiler):
subprocess.check_call([compiler, '-std=c++11', '-O3', 'pow.cpp', '-o', 'pow'])

def get_results():
words = subprocess.check_output(['./pow']).decode().strip().split()
return float(words[1]), float(words[3])

def bootstrap(compiler, runs=10):
compile_cpp(compiler)
times_pow = []
times_mul = []
for i in range(runs):
time_pow, time_mul = get_results()
times_pow.append(time_pow)
times_mul.append(time_mul)

mean_pow = numpy.mean(times_pow)
mean_mul = numpy.mean(times_mul)

std_pow = numpy.std(times_pow)
std_mul = numpy.std(times_mul)

return unitprint.siunitx(mean_pow, std_pow), \
unitprint.siunitx(mean_mul, std_mul)

def main():
for compiler in ['g++', 'clang++']:
print(compiler, *bootstrap(compiler))

if __name__ == "__main__":
main()


These are the results:

Compiler pow / s x * x / s
g++ 2.90 ± 0.03 1.28 ± 0.01
clang++ 1.272 ± 0.008 1.268 ± 0.008

With clang++, there is a tiny difference between pow and multiplication, it is not really significant though, since it is just half a standard deviation. g++ takes more than twice as long for using pow. I can absolutely understand that people using g++ will try to avoid the pow function.

However, and that is my point, the statement that pow is always slower than multiplication does not really hold. I consider the results from clang++ to be on par. Please test your application on your compiler and see what is faster in reality, not in theory.

## Source of pow

And if you look at the source of pow(), you will see that they have thought about it:

112         /* First see whether y' is a natural number.  In this case we
113            can use a more precise algorithm.  */


Then it jumps to 9 where it says:

136 9:      /* OK, we have an integer value for y.  Unless very small
137            (we use < 8), use the algorithm for real exponent to avoid
138            accumulation of errors.  */
`