Efficiency of the `pow` function

Martin Ueding

2014-07-10

Code & Zahlen

Someone said that using pow(x, 2) is always less efficient than using x * x. Well, there are two things to remember:

Do not "optimize" without measurement.
Measure with full compiler optimization.

So this is exactly what I did. This is a simple C++11 program that uses pow(x, 2) and x * x. I highlighted the lines in question. The calculations with test are in place so that the compiler does not optimize the code away, which it would do otherwise.

#include <chrono>
#include <cmath>
#include <iostream>

int main() {
    double test {0};
    unsigned iter_max {100000000};

    auto start_time = std::chrono::steady_clock::now();
    for (unsigned iter {0}; iter < iter_max; iter++) {
        test += std::pow(iter, 2);
    }
    double time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
        (std::chrono::steady_clock::now() - start_time).count() / 1000.0;
    std::cout << "Pow: " << time_in_seconds << std::endl;

    start_time = std::chrono::steady_clock::now();
    for (unsigned iter {0}; iter < iter_max; iter++) {
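        // Note: iter * iter is evaluated in unsigned arithmetic (wrapping on
        // overflow) and only then converted to double.
        test += iter * iter;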
    }
    time_in_seconds = std::chrono::duration_cast<std::chrono::milliseconds>
        (std::chrono::steady_clock::now() - start_time).count() / 1000.0;
    std::cout << "Multiplication: " << time_in_seconds << std::endl;

    std::cout << test << std::endl;

    return 0;
}
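The timing boilerplate in the program above is written out twice. As a minimal sketch, it could be factored into a helper that also keeps the accumulation into a result variable, so the compiler still cannot drop the loop. The names time_loop and report are hypothetical and not part of the original program, and the multiplication is done in double here, which differs from the unsigned iter * iter above.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <iostream>

// Hypothetical helper (not from the original program): calls `body` for
// every value in [0, iter_max), adds each returned value to `sink`, and
// returns the elapsed wall-clock time in seconds.
template <typename Body>
double time_loop(unsigned iter_max, double &sink, Body body) {
    assert(iter_max > 0);  // an empty loop would measure nothing

    auto start_time = std::chrono::steady_clock::now();
    for (unsigned iter {0}; iter < iter_max; iter++) {
        // Accumulating into `sink` keeps the result observable.
        sink += body(iter);
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_time;
    return elapsed.count();
}

// Hypothetical driver: measures both variants and prints them in the same
// "Pow: ..." / "Multiplication: ..." layout as the program above.
void report(unsigned iter_max) {
    double test {0};

    double pow_time = time_loop(iter_max, test,
        [](unsigned iter) { return std::pow(iter, 2); });
    double mul_time = time_loop(iter_max, test,
        [](unsigned iter) { return static_cast<double>(iter) * iter; });

    std::cout << "Pow: " << pow_time << std::endl;
    std::cout << "Multiplication: " << mul_time << std::endl;

    // Printing the accumulated value makes the loops' results used.
    std::cout << test << std::endl;
}
```

Calling report(100000000) from main would repeat the experiment with the same iteration count. Whether the lambdas are inlined depends on the compiler and flags, so the numbers need not match those of the hand-written loops.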

Results

If you compile that without optimization, pow is clearly slower:

$ clang++ -std=c++11 pow.cpp -o pow; and ./pow
Pow: 0.272
Multiplication: 0.042

Now I compiled this with clang++ using its -O3 optimization. When I did this on 2014-05-21, pow was significantly faster. When I revisited this on 2014-07-10, pow was just a tiny bit slower than the multiplication. Interesting.

I also tested with g++ and found that pow is significantly slower than the multiplication. Since the overall time varies from run to run, I ran each one 10 times and took the mean and standard deviation with this Python script:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import subprocess

import numpy
import unitprint

def compile_cpp(compiler):
    subprocess.check_call([compiler, '-std=c++11', '-O3', 'pow.cpp', '-o', 'pow'])

def get_results():
    words = subprocess.check_output(['./pow']).decode().strip().split()
    return float(words[1]), float(words[3])

def bootstrap(compiler, runs=10):
    compile_cpp(compiler)
    times_pow = []
    times_mul = []
    for i in range(runs):
        time_pow, time_mul = get_results()
        times_pow.append(time_pow)
        times_mul.append(time_mul)

    mean_pow = numpy.mean(times_pow)
    mean_mul = numpy.mean(times_mul)

    std_pow = numpy.std(times_pow)
    std_mul = numpy.std(times_mul)

    return unitprint.siunitx(mean_pow, std_pow), \
            unitprint.siunitx(mean_mul, std_mul)

def main():
    for compiler in ['g++', 'clang++']:
        print(compiler, *bootstrap(compiler))

if __name__ == "__main__":
    main()

These are the results:

Compiler	`pow` / s	`x * x` / s
g++	2.90 ± 0.03	1.28 ± 0.01
clang++	1.272 ± 0.008	1.268 ± 0.008

With clang++, there is a tiny difference between pow and multiplication, but it is not really significant, since it is just half a standard deviation. g++ takes more than twice as long when using pow. I can absolutely understand that people using g++ will try to avoid the pow function.
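The two comparisons above are plain arithmetic on the table values. A small sketch makes them explicit; the function names difference_in_sigma and slowdown are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Difference between two mean values, expressed in units of one
// standard deviation `sigma`.
double difference_in_sigma(double mean_a, double mean_b, double sigma) {
    assert(sigma > 0.0);
    return std::abs(mean_a - mean_b) / sigma;
}

// Factor by which the `slow` timing exceeds the `fast` timing.
double slowdown(double slow, double fast) {
    assert(fast > 0.0);
    return slow / fast;
}
```

With the clang++ row, difference_in_sigma(1.272, 1.268, 0.008) gives about 0.5, which is the half standard deviation. With the g++ row, slowdown(2.90, 1.28) gives about 2.27, which is the "more than twice as long".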

However, and that is my point, the statement that pow is always slower than multiplication does not really hold. I consider the results from clang++ to be on par. Please test your application on your compiler and see what is faster in reality, not in theory.

Source of `pow`

And if you look at the source of pow(), you will see that they have thought about it:

        /* First see whether `y' is a natural number.  In this case we
           can use a more precise algorithm.  */

Then it jumps to label 9, where it says:

9:      /* OK, we have an integer value for y.  Unless very small
           (we use < 8), use the algorithm for real exponent to avoid
           accumulation of errors.  */
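To illustrate the idea in those comments, here is a rough sketch in C++. It is not the library's implementation. The name pow_sketch is made up, the restriction to non-negative exponents is my assumption, and the fallback simply calls std::pow where the real source uses its own algorithm for real exponents.

```cpp
#include <cassert>
#include <cmath>

// Illustrative sketch only, NOT the actual library code: if y is a small
// non-negative integer (below 8, the threshold named in the comment),
// multiply x repeatedly; otherwise defer to the general routine.
double pow_sketch(double x, double y) {
    // True only if y has no fractional part; false for NaN.
    const bool is_integer = std::trunc(y) == y;

    if (is_integer && y >= 0.0 && y < 8.0) {
        const int n = static_cast<int>(y);
        assert(n >= 0 && n < 8);

        double result = 1.0;
        for (int i = 0; i < n; ++i) {
            result *= x;  // at most seven multiplications
        }
        return result;
    }

    // Larger, negative or non-integer exponents: general algorithm.
    return std::pow(x, y);
}
```

For y = 2 the loop runs twice and yields 1.0 * x * x, so in this sketch squaring ends up being ordinary multiplication plus the cost of the checks.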
