Efficiency of the `pow` function
Someone said that using `pow(x, 2)` is always less efficient than using
`x * x`. Well, there are two things to remember:
- Do not "optimize" without measurement.
- Measure with full compiler optimization.
So this is exactly what I did. This is a simple C++11 program that uses
`pow(x, 2)` and `x * x`. The lines in question are the two loop bodies. The
calculations with `test` are in place so that the compiler does not optimize
away the code, which it would do otherwise.

    #include <chrono>
    #include <cmath>
    #include <iostream>

    int main() {
        // Accumulator that is printed at the end, so the loops cannot be
        // optimized away.
        double test {0};
        unsigned iter_max {100000000};

        auto start_time = std::chrono::steady_clock::now();
        for (unsigned iter {0}; iter < iter_max; iter++) {
            test += std::pow(iter, 2);  // line in question: pow
        }
        double time_in_seconds =
            std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - start_time).count() / 1000.0;
        std::cout << "Pow: " << time_in_seconds << std::endl;

        start_time = std::chrono::steady_clock::now();
        for (unsigned iter {0}; iter < iter_max; iter++) {
            // line in question: multiplication. Note that iter * iter is
            // evaluated in unsigned integer arithmetic before being added.
            test += iter * iter;
        }
        time_in_seconds =
            std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - start_time).count() / 1000.0;
        std::cout << "Multiplication: " << time_in_seconds << std::endl;

        std::cout << test << std::endl;
        return 0;
    }

Results
If you compile that without optimization, `pow` is clearly slower:

    $ clang++ -std=c++11 pow.cpp -o pow; and ./pow
    Pow: 0.272
    Multiplication: 0.042

Now I compiled this with `clang++` using its `-O3` optimization. When I did
this on 2014-05-21, `pow` was significantly faster. I revisited this on
2014-07-10, and `pow` was just a tiny bit slower than the multiplication.
Interesting.
I also tested with `g++` and found that `pow` is significantly slower than the
multiplication there. To be fair, I ran each one a couple of times since the
overall time varies. To get meaningful results, I ran each one 10 times and
took the mean and standard deviation with this Python script:

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-

    import subprocess

    import numpy
    import unitprint


    def compile_cpp(compiler):
        # Build pow.cpp with full optimization.
        subprocess.check_call([compiler, '-std=c++11', '-O3', 'pow.cpp', '-o', 'pow'])


    def get_results():
        # The program prints "Pow: <time>", "Multiplication: <time>" and the
        # value of test, so the timings are the second and fourth words.
        words = subprocess.check_output(['./pow']).decode().strip().split()
        return float(words[1]), float(words[3])


    def bootstrap(compiler, runs=10):
        compile_cpp(compiler)

        times_pow = []
        times_mul = []

        # Run the binary several times and collect both timings.
        for i in range(runs):
            time_pow, time_mul = get_results()
            times_pow.append(time_pow)
            times_mul.append(time_mul)

        mean_pow = numpy.mean(times_pow)
        mean_mul = numpy.mean(times_mul)
        std_pow = numpy.std(times_pow)
        std_mul = numpy.std(times_mul)

        return unitprint.siunitx(mean_pow, std_pow), \
            unitprint.siunitx(mean_mul, std_mul)


    def main():
        for compiler in ['g++', 'clang++']:
            print(compiler, *bootstrap(compiler))


    if __name__ == "__main__":
        main()

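The script relies on `unitprint.siunitx` to print each mean together with its standard deviation. If that module is not at hand, a rough stand-in could look like the sketch below. The helper name `format_with_error` and the rounding rule (standard deviation to one significant digit, mean to the same decimal place) are my assumptions here, not the `unitprint` API, and the numbers in the usage line are hypothetical inputs.

```python
import math


def format_with_error(mean, std):
    """Format a value as 'mean ± std' (hypothetical helper, not unitprint)."""
    if std <= 0:
        # No spread to round against, so print the mean unrounded.
        return f"{mean} ± 0"
    # Decimal position of the leading digit of the standard deviation.
    exponent = math.floor(math.log10(std))
    # Number of decimal places needed to show that digit.
    digits = max(0, -exponent)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"


# Hypothetical raw values, only to show the call.
print(format_with_error(2.9012, 0.0312))
```

Swapping this in for the `unitprint.siunitx` calls would only change how the numbers are printed, not how they are measured.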
These are the results:
Compiler | `pow` / s | `x * x` / s |
---|---|---|
g++ | 2.90 ± 0.03 | 1.28 ± 0.01 |
clang++ | 1.272 ± 0.008 | 1.268 ± 0.008 |
With `clang++`, there is a tiny difference between `pow` and multiplication,
but it is not really significant, since it is just half a standard deviation.
`g++` takes more than twice as long when using `pow`. I can absolutely
understand that people using `g++` will try to avoid the `pow` function.
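The arithmetic behind "half a standard deviation" and "more than twice as long" can be checked with a few lines. This is only a sketch: the numbers are copied from the table, the names `results` and `compare` are made up for illustration, and dividing by the larger of the two standard deviations is a rough yardstick I chose, not a proper significance test.

```python
# Mean and standard deviation in seconds, copied from the results table.
results = {
    "g++": {"pow": (2.90, 0.03), "mul": (1.28, 0.01)},
    "clang++": {"pow": (1.272, 0.008), "mul": (1.268, 0.008)},
}


def compare(pow_result, mul_result):
    """Return difference, ratio and difference in standard deviations."""
    pow_mean, pow_std = pow_result
    mul_mean, mul_std = mul_result
    difference = pow_mean - mul_mean
    ratio = pow_mean / mul_mean
    # Express the difference in units of the larger standard deviation.
    sigmas = difference / max(pow_std, mul_std)
    return difference, ratio, sigmas


for compiler, timing in results.items():
    difference, ratio, sigmas = compare(timing["pow"], timing["mul"])
    print(f"{compiler}: difference {difference:.3f} s, "
          f"ratio {ratio:.2f}, {sigmas:.1f} standard deviations")
```

For `clang++` the difference comes out at about half a standard deviation, while for `g++` the ratio is above 2 and the difference is many standard deviations wide.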
However, and that is my point, the statement that `pow` is always slower than
multiplication does not really hold. I consider the results from `clang++` to
be on par. Please test your application on your compiler and see what is faster
in reality, not in theory.
Source of pow
And if you look at the source of `pow()`, you will see that they have thought
about it:

    /* First see whether `y' is a natural number.  In this case we
       can use a more precise algorithm.  */

Then it jumps to the label `9`, where it says:

    9:  /* OK, we have an integer value for y.  Unless very small
           (we use < 8), use the algorithm for real exponent to avoid
           accumulation of errors.  */

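To make the idea in those two comments concrete, here is a small sketch of such a dispatch. It is not the library's code: the name `pow_sketch`, the use of square-and-multiply for the small integer case, and the fallback to `math.pow` for everything else are my assumptions, and I read the threshold as applying to the absolute value of the exponent. Only the limit of 8 is taken from the quoted comment.

```python
import math

# Threshold quoted in the source comment ("we use < 8").
SMALL_EXPONENT_LIMIT = 8


def pow_sketch(x, y):
    """Illustrative dispatch only, not the actual pow() implementation."""
    if float(y).is_integer() and abs(y) < SMALL_EXPONENT_LIMIT:
        # Small integer exponent: multiply repeatedly (square and multiply).
        n = abs(int(y))
        result = 1.0
        base = float(x)
        while n:
            if n & 1:
                result *= base
            base *= base
            n >>= 1
        # Negative exponents use the reciprocal; x == 0 is not handled here.
        return result if y >= 0 else 1.0 / result
    # Everything else: general algorithm for real exponents.
    return math.pow(x, y)


print(pow_sketch(3.0, 2))
```

With an exponent of 2, this sketch ends up doing plain multiplications, which is consistent with `pow(x, 2)` not having to be much slower than `x * x`.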