Efficiency of the `pow` function
Someone said that using pow(x, 2)
is always more inefficient than using
x * x
. Well, there are two things to remember:
- Do not "optimize" without measurement.
- Measure with full compiler optimization.
So this is exactly what I did then. This is a simple C++11 program that uses
pow(x, 2)
and x * x
. I highlighted the lines in questions. The calculations
with test
are in place so that the compiler does not optimize away the code,
which it would do otherwise.
#include <chrono>
#include <cmath>
#include <iostream>
int main() {
double test {0};
unsigned iter_max {100000000};
auto start_time = std::chrono::steady_clock::now();
for (unsigned iter {0}; iter < iter_max; iter++) {
test += std::pow(iter, 2);
}
double time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
(std::chrono::steady_clock::now() - start_time).count() / 1000.0;
std::cout << "Pow: " << time_in_seconds << std::endl;
start_time = std::chrono::steady_clock::now();
for (unsigned iter {0}; iter < iter_max; iter++) {
test += iter * iter;
}
time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
(std::chrono::steady_clock::now() - start_time).count() / 1000.0;
std::cout << "Multiplication: " << time_in_seconds << std::endl;
std::cout << test << std::endl;
return 0;
}
Results
If you compile that without optimization, pow
is clearly slower:
$ clang++ -std=c++11 pow.cpp -o pow; and ./pow
Pow: 0.272
Multiplication: 0.042
Now I compiled this with clang++
using its -O3
optimization. When I did
this on 2014-05-21, pow
was significantly faster. I have revisited this on
2014-07-10, where pow
just a tiny bit slower than the multiplication.
Interesting.
I also tested with g++
and found that pow
is significantly slower than the
multiplication. To be fair, I ran each one a couple times since the overall
time varies. With g++
, the multiplication is actually faster. To get
meaningful results, I ran each one 10 times and too mean and standard deviation
with this Python script:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import subprocess
import numpy
import unitprint
def compile_cpp(compiler):
subprocess.check_call([compiler, '-std=c++11', '-O3', 'pow.cpp', '-o', 'pow'])
def get_results():
words = subprocess.check_output(['./pow']).decode().strip().split()
return float(words[1]), float(words[3])
def bootstrap(compiler, runs=10):
compile_cpp(compiler)
times_pow = []
times_mul = []
for i in range(runs):
time_pow, time_mul = get_results()
times_pow.append(time_pow)
times_mul.append(time_mul)
mean_pow = numpy.mean(times_pow)
mean_mul = numpy.mean(times_mul)
std_pow = numpy.std(times_pow)
std_mul = numpy.std(times_mul)
return unitprint.siunitx(mean_pow, std_pow), \
unitprint.siunitx(mean_mul, std_mul)
def main():
for compiler in ['g++', 'clang++']:
print(compiler, *bootstrap(compiler))
if __name__ == "__main__":
main()
These are the results:
Compiler |
pow / s |
x * x / s |
---|---|---|
g++ | 2.90 ± 0.03 | 1.28 ± 0.01 |
clang++ | 1.272 ± 0.008 | 1.268 ± 0.008 |
With clang++
, there is a tiny difference between pow
and multiplication, it
is not really significant though, since it is just half a standard deviation.
g++
takes more than twice as long for using pow
. I can absolutely
understand that people using g++
will try to avoid the pow
function.
However, and that is my point, the statement that pow
is always slower than
multiplication does not really hold. I consider the results from clang++
to
be on par. Please test your application on your compiler and see what is faster
in reality, not in theory.
Source of pow
And if you look at the
source
of pow()
, you will see that they have thought about it:
112 /* First see whether `y' is a natural number. In this case we
113 can use a more precise algorithm. */
Then it jumps to 9
where it says:
136 9: /* OK, we have an integer value for y. Unless very small
137 (we use < 8), use the algorithm for real exponent to avoid
138 accumulation of errors. */