qsort vs. std::sort

Update

Before 2016-10-23, this article sorted a custom complex type whose data was filled in with a simple loop, so every element had exactly the same value. That is not a good test for a sorting algorithm, even though the results turned out to be quite accurate.

Now I generate random values and read those into the program, which makes sure that the sorting algorithm actually has some work to do. The conclusions still apply, so not much has changed except that the argument now stands on a stronger footing.

When you have an array of some data structure that you would like to sort, the first place to look for a sorting algorithm is the standard library.

In C, you have the qsort function which has a pretty cumbersome declaration:

void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));

So you have to pass a pointer to your data, the number of elements, the size of each element, and a comparator function.

In a talk about C++11, Bjarne Stroustrup asked the questions that seem natural when you are used to C++:

  • Why doesn’t the container know its size?
  • Why doesn’t the container know the size of its elements?
  • Why don’t the elements know their ordering, that is, how to compare themselves to each other?

Additionally, the void * parameters in the comparator function mean that type safety cannot be enforced by the compiler, which is bad.

Since the comparator is passed as a pointer, and the elements are passed as pointers as well, each comparison costs three indirections and a function call. In C++, std::sort is a template algorithm, so it is compiled once for each type it is instantiated with. The operator< of the type is usually inlined into the sort algorithm as well, reducing the cost of each comparison significantly.

As Stroustrup pointed out in his talk, people sometimes think that qsort must be faster, since it is C, which is supposed to be the fastest language you can get. To those people, the cumbersome declaration of qsort only makes it look faster still.

Test programs

I want to sort a container of simple floating-point numbers. That way, the actual comparison is a really fast operation and any overhead shows up dramatically.

In C, this is implemented using a dynamic array, a comparison function and the qsort function. See the emphasized lines for the actual sorting code; everything else is just test setup.

// Copyright © 2014, 2016 Martin Ueding <dev@martin-ueding.de>
// Licensed under the MIT license

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int compare(void const *this, void const *other) {
    double const a = *((const double *)this);
    double const b = *((const double *)other);
    if (a < b)
        return -1;
    else if (a == b)
        return 0;
    else
        return 1;
}

void time_sorting(char const *const filename, size_t const size) {
    double *data = malloc(size * sizeof(*data));
    double *orig = malloc(size * sizeof(*orig));

    FILE *fp = fopen(filename, "r");
    if (fp == NULL) {
        perror(filename);
        exit(EXIT_FAILURE);
    }
    for (size_t i = 0; i < size; ++i) {
        fscanf(fp, "%lf\n", orig + i);
    }
    fclose(fp);

    printf("%zu", size);
    for (int i = 0; i < 5; ++i) {
        memcpy(data, orig, size * sizeof(*orig));

        clock_t const start = clock();
        qsort(data, size, sizeof(*data), compare);
        clock_t const end = clock();

        double const duration = (end - start) / ((double)CLOCKS_PER_SEC);
        printf("\t%g", duration);
    }
    printf("\n");

    free(data);
    free(orig);
}

int main() {
    time_sorting("data-10.txt", 10);
    time_sorting("data-100.txt", 100);
    time_sorting("data-1000.txt", 1000);
    time_sorting("data-10000.txt", 10000);
    time_sorting("data-100000.txt", 100000);
    time_sorting("data-1000000.txt", 1000000);
    time_sorting("data-10000000.txt", 10000000);
}

I have generated files with random numbers in them so that the sorting has some actual work to do.

You can see that the information about the container and its contents is split across several blocks: the size, the data and the compare function.

In C++, I use the fact that an operator< exists for double, together with the std::sort function:

// Copyright © 2014, 2016 Martin Ueding <dev@martin-ueding.de>
// Licensed under the MIT license

#include <algorithm>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

void time_sorting(std::string const &filename, size_t const size) {
    std::vector<double> orig;
    orig.reserve(size);

    std::ifstream ifs(filename);
    for (size_t i = 0; i < size; ++i) {
        double in;
        ifs >> in;
        orig.push_back(in);
    }

    std::cout << size;
    for (int i = 0; i < 5; ++i) {
        auto data = orig;
        clock_t const start = clock();
        std::sort(std::begin(data), std::end(data));
        clock_t const end = clock();

        double const duration = (end - start) / ((double)CLOCKS_PER_SEC);
        std::cout << "\t" << duration;
    }
    std::cout << "\n";
}

int main() {
    time_sorting("data-10.txt", 10);
    time_sorting("data-100.txt", 100);
    time_sorting("data-1000.txt", 1000);
    time_sorting("data-10000.txt", 10000);
    time_sorting("data-100000.txt", 100000);
    time_sorting("data-1000000.txt", 1000000);
    time_sorting("data-10000000.txt", 10000000);
}

I find the C++ version much clearer about its intent. This is one area where I prefer C++ over C.

Both versions are compiled with a sufficiently new standard as well as full optimization:

#!/bin/bash
# Copyright © 2014, 2016 Martin Ueding <dev@martin-ueding.de>
# Licensed under the MIT license

set -e
set -u
set -x

options="-Wall -Wpedantic -O3 -march=native"

gcc $options -std=c99 -o sort-gcc sort.c
g++ $options -std=c++11 -o sort-g++ sort.cpp
clang $options -std=c99 -o sort-clang sort.c
clang++ $options -std=c++11 -o sort-clang++ sort.cpp

for program in sort-gcc sort-g++ sort-clang sort-clang++
do
    ./$program > $program.txt
done

You can download all the files that I have used in this test here:

Results

The sorting is repeated for sizes from 10 numbers up to 10 million numbers. The C and C++ programs compiled with the GCC and LLVM compilers scale as follows:

[Figure: sorting time versus number of elements for all four builds (time.svg)]

Download: time.pdf

It is very curious to see that for small lengths, the GCC builds are faster than the LLVM builds; the choice of compiler makes a bigger difference than the choice of language! For more items, the C++ version is faster with both compilers.

Taking the appropriate ratios, one can compare GCC with LLVM and also C with C++ here:

[Figure: runtime ratios, GCC versus LLVM and C versus C++ (Ratios.svg)]

Download: Ratios.pdf

For small data sets, one cannot really say anything; the statistical fluctuations are just too large. For large data sets, there is not much of a difference between GCC and LLVM. There is, however, a difference of a factor of 1.9 between C and C++: the C version is slower than the C++ version.

Conclusion

The main point is that the C++ version is way faster than the C version, even though it is also the more elegant one. With C++, you can express your intentions more clearly, helping the compiler to optimize your program. Trust your optimizer!