While learning Rust, I ported my prime number generator from Python to both C++ and Rust. It is curious to see how the different language allow different ways of writing it and also looking into the performance of it.

These days I let AI write virtually of my code at work. These tools have gotten good enough that it actually makes sense to do that. I do struggle with the loss of hand-writing code occasionally. It certainly makes sense to automate the generation of boring code. And at work it can also make sense to have it generate the interesting code as well and just verify it. What’s lacking is the need to learn and understand new language features or new ways of working.

Another change in my industry has been the drift from C++ and CUDA to Python. Machine learning libraries like TensorFlow are written in Python, the hard stuff is C++ and CUDA below that. In order to write applications, it is sufficient to use the Python library. For high performance things, one still needed to go below that. Currently, there is such a big demand for LLM inference that there are already three major frameworks out there. I don’t need to write C++ or CUDA, I can just string together some Python libraries and Torch Compile will do the magic in the background.

I have quite some experience with C++ and wanted to connect to that. So I took Project Euler problem 3 that I had implemented in Python already and set out to implement it in C++. When I was done with that, I realized how cluky the C++ build system is and thought about learning this as an opportunity to learn Rust. But let me tell you one thing at a time.

Prime generator

The Project Euler problem can be solved well with a list of prime numbers. I just don’t know how many primes I need in advance. So what I really want is a way to generate primes as I need them.

In Python, the generator concept is just ideal for that. I can just define a function and use the keyword yield and it becomes a generator that yields an iterator when called. The algorithm is simple:

  • Iterate through all candidate numbers starting from 2.
  • For the current candidate, check whether it is divisble by any already known prime number that is smaller than the square root of the candidate.
  • If the candidate is not divisible by any of the known primes, it must be a prime itself.

This is my implementation:

from collections.abc import Iterator
import itertools


def prime_generator() -> Iterator[int]:
    primes = []
    for candidate in itertools.count(2):
        is_prime = True
        for prime in primes:
            if prime * prime > candidate:
                break
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            yield candidate
            primes.append(candidate)

And now one can simply write the following:

for prime in prime_generator():
    ...

This is very nice because I can concentrate on my actual problem and just have the prime numbers available to me.

Here you can see how elegant Python code can be written and how the language with its built-in generators makes this nice.

Also note that this is an infinite generator. Nowhere the end is specified. Therefore one need to call break in the above for-loop at some point. As long as one includes that, it is fine.

Reusable

There are problems that need to iterate through primes multiple times. Therefore it would be nice if we can somehow cache the primes that we have computed already. There is an easy trick to do that by using a mutable default argument like so:

def prime_generator(_primes=[]) -> Iterator[int]:
    yield from _primes
    start = 2 if not _primes else _primes[-1] + 1
    for candidate in itertools.count(start):
        is_prime = True
        for prime in _primes:
            if prime * prime > candidate:
                break
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            yield candidate
            _primes.append(candidate)

First we yield all the already known primes, after that we start to compute the next ones and add them to the list. The caller doesn’t see this. They can suppress the cache by using prime_generator([]) to make it start from an empty list. That doesn’t reset the global cache, though. This is bad for benchmarking and testing.

So what I actually prefer to use in this case is an explicit class. The lifetime of the cache is now bound to that object and we can control that. Thanks to magic methods we can write this with just three more lines of code:

class PrimeList:
    def __init__(self) -> None:
        self._primes = []

    def __iter__(self) -> Iterator[int]:
        yield from self._primes
        start = 2 if not self._primes else self._primes[-1] + 1
        for candidate in itertools.count(start):
            is_prime = True
            for prime in self._primes:
                if prime * prime > candidate:
                    break
                if candidate % prime == 0:
                    is_prime = False
                    break
            if is_prime:
                yield candidate
                self._primes.append(candidate)

The usage can now either be done single-use like this:

for prime in PrimeList():
    ...

Or we can explicitly create the object and reuse that:

primes = PrimeList():
for prime in primes:
    ...
for prime in primes:
    ...

So I think that’s pretty elegant.

Explicit C++ version

Admittedly my C++ knowledge has somehow frozen with the 2017 standard. I have looked at a list of features of the 2020 standard but haven’t really used stuff like concepts, modules and reflection. We already have the 2023 and 2026 standards but as I don’t use it on a daily basis, I haven’t kept up.

So with what I got in my head, I started out writing the class. And then there were so many C++ things that hit me. C++17 doesn’t have generators, so one needs to write this oneself. In order to use for (auto prime : PrimeGenerator()), one needs to specify begin() and end(), an iterator type with operator*, operator++ and operator!=. The iterator needs to have a reference to the generator/list, and we run into cyclic dependency issues.

In the header file, I have the following code:

#pragma once

#include <vector>

class PrimeGenerator;

class PrimeIterator {
   public:
    PrimeIterator(PrimeGenerator& prime_generator, size_t const index)
        : prime_generator(prime_generator), index(index) {}

    int operator*();

    PrimeIterator operator++();

    bool operator!=(PrimeIterator const& other) { return index != other.index; }

   private:
    PrimeGenerator& prime_generator;
    size_t index = 0;
};

class PrimeGenerator {
   public:
    PrimeGenerator() {}

    PrimeIterator begin() { return PrimeIterator(*this, 0); }
    PrimeIterator end() { return PrimeIterator(*this, -1); }

    int get(size_t index) {
        advance(index);
        return primes[index];
    }

   private:
    void advance(size_t const index);

    std::vector<int> primes;
};

The PrimeGenerator is the first class that we should look at. It has a trivial constructor, the begin() and end() that produce a PrimeIterator instance. We have a get() method that will get the prime with the given index. As you can see there, it will compute the primes up to that index and then return the desired prime. We have the advance() method and a vector to store the computed primes.

The PrimeIterator class has the necessary operators. It keeps a reference to the PrimeGenerator it works with as well as its index.

In the source file, we then have some more method definitions:

#include "primes.hpp"

void PrimeGenerator::advance(size_t const index) {
    if (primes.size() > index) {
        return;
    }
    if (primes.empty()) {
        primes.push_back(2);
    }
    auto candidate = primes.back() + 1;
    while (primes.size() <= index) {
        bool is_prime = true;
        for (auto prime : primes) {
            if (prime * prime > candidate) {
                break;
            }
            if (candidate % prime == 0) {
                is_prime = false;
                break;
            }
        }
        if (is_prime) {
            primes.push_back(candidate);
        }
        candidate += 1;
    }
}

int PrimeIterator::operator*() { return prime_generator.get(index); }

PrimeIterator PrimeIterator::operator++() {
    ++index;
    return *this;
}

The advance() method is the core algorithm that we’ve seen before. It will check whether we need to compute anything at all. Then it will run the same algorithm, just written out in C++ syntax. The operator*() is for accessing the element that the iterator points to. This just defers to the PrimeGenerator.get() method. This method definition needs to be in the source file because in the header I only have a forward declaration of PrimeGenerator. operator++() just advances to the next element, so that’s a trivial method.

When one compares that to the Python version, it is horribly long and convoluted. Most of the code is boilerplate, the actual logic almost gets lost in all that structure. This is what I dislike about C++.

Performance

What I do like about C++ is the performance. I have compiled this with g++ 16.1.1 and -O3 -DNDEBUG but without specifying the architecture of my system. Python is 3.14. I run this on an Intel Core i5-10210U with dynamic clock rate, so not the ideal benchmark setup. Anyway, these are the timings that I get:

Implementation Minimum 25 % Median 75 % Maximum Iterations
C++ PrimeGenerator 137 µs 141 µs 143 µs 146 µs 184 µs 100
Python PrimeList 2.05 ms 2.09 ms 2.22 ms 2.60 ms 3.43 ms 100

There is some spread, the first iteration likely is the slowest. But when we look at the median, we see a factor 15 between the Python and C++ implementation. That’s not as bad as it sometimes gets, but one can see how the native code is much faster.

C++23 generator variant

Because I am so used to AI development from work, I did ask Anthropic’s Claude Sonnet 4.6 about this. And it had an interesting suggestion, namely that C++23 has generators. This allows to write it in a much shorter way:

#include <algorithm>
#include <ranges>
#include <vector>

std::generator<long long> primes() {
    std::vector<long long> known;
    for (long long candidate = 2;; ++candidate) {
        bool is_prime =
            std::ranges::none_of(known | std::views::take_while([&](auto p) {
                                     return p * p <= candidate;
                                 }),
                                 [&](auto p) { return candidate % p == 0; });
        if (is_prime) {
            known.push_back(candidate);
            co_yield candidate;
        }
    }
}

It has also made heavy use of the ranges and algorithms that are there since C++20. The whole code read much more functional than my procedural version before.

At this point, this code doesn’t cache. One can declare known with static and it will cache that. But just like the Python version with the mutable default argument, we lose control over the cache when running a benchmark, so I didn’t include that and only tested it in a problem where we don’t need to reuse it.

What we can see here is that the timings are worse than before:

Implementation Minimum 25 % Median 75 % Maximum Iterations
C++ PrimeGenerator 137 µs 141 µs 143 µs 146 µs 184 µs 100
C++ primes() 283 µs 284 µs 290 µs 294 µs 366 µs 100
Python PrimeList 2.05 ms 2.09 ms 2.22 ms 2.60 ms 3.43 ms 100

So it seems that all the coroutine handling has quite some cost that incur. It could also be the ranges and views, though I’d hope that the compile is able to alide the complexity that they introduce.

The intermediate conclusion here is that in C++ we have the option to write it very explicitly and it will be pretty fast. We can also use fancy new language features to write it as a generator/coroutine, yet there is a performance cost to it. Python offers generators/coroutines for a long a time, and I never think of performance overheads there. The total performance is worse anyway, so I wouldn’t even attempt to performance tune on this level and just do it on the algorithmical level. The C++ generator version is still much faster than the Python implementation, so in absolute numbers it is not bad at all. It is just that I know that it can be computed faster with more verbose code.

Rust

I am still new to Rust, so I needed a lot of documentation reading and also used an AI to ask many questions until I understood what’s going on. Because I don’t only want a simple iterator but one with persistent state, I need to split it into two classes like in C++. There is a PrimeGenerator and and a PrimeIterator like in C++. They just require less lines of code.

pub struct PrimeGenerator {
    primes: Vec<i64>,
}

impl PrimeGenerator {
    pub fn new() -> Self {
        PrimeGenerator { primes: vec![2] }
    }

    pub fn iter(&mut self) -> PrimeIterator<'_> {
        PrimeIterator {
            primes: &mut self.primes,
            index: 0,
        }
    }
}

pub struct PrimeIterator<'a> {
    primes: &'a mut Vec<i64>,
    index: usize,
}

impl<'a> Iterator for PrimeIterator<'a> {
    type Item = i64;

    fn next(&mut self) -> Option<Self::Item> {
        if self.index < self.primes.len() {
            let result = self.primes[self.index];
            self.index += 1;
            return Some(result);
        } else {
            let mut candidate = self.primes.last()? + 1;

            loop {
                let mut is_prime = true;
                for prime in self.primes.iter() {
                    if prime * prime > candidate {
                        break;
                    }
                    if candidate % prime == 0 {
                        is_prime = false;
                        break;
                    }
                }
                if is_prime {
                    self.primes.push(candidate);
                    self.index += 1;
                    return Some(candidate);
                }
                candidate += 1;
            }
        }
    }
}

The code is more compact than the C++ one, but not as neat as the Python one. I am still struggling a bit with the explicit lifetimes, but it does run so it must be fine for now.

Curiously, the Rust version is even faster than the C++ version:

Implementation Minimum 25 % Median 75 % Maximum Iterations
Rust 116 µs 117 µs 118 µs 121 µs 145 µs 100
C++ PrimeGenerator 137 µs 141 µs 143 µs 146 µs 184 µs 100
C++ primes() 283 µs 284 µs 290 µs 294 µs 366 µs 100
Python PrimeList 2.05 ms 2.09 ms 2.22 ms 2.60 ms 3.43 ms 100

There can be various reasons for that. It could be that the Rust compiler targets my particular machine and is not as conservative as the C++ compiler. It might be that the Rust compile with its borrow checker can more aggressively optimize the reference from the PrimeIterator to the PrimeGenerator that the C++ compiler cannot. It might be due to some slightly different way within the algorithm. It could be slightly differently generated code that biases the branch predictor in the CPU in a different way.

Conclusion

Writing code myself has been fun, perhaps because it was just a little code that I worked with for a few hours. It was pleasing to see how I can reach native level performance with both C++ and Rust. I’ve learned ab bit more Rust along the way, also learned about C++23 generators.

Still the elegance of Python is not matched by either. Even the functional and generator version of C++23 isn’t quite as neat as the Python one due to the more clunky syntax. Perhaps with the cpp2 Syntax it will get better, but I haven’t used that either.

Going forward, I think that I want to do more intential manual coding as a contrast to the AI coding I do for most other projects. Also I want to continue with Rust because I can see use in it to offload heavy computation from Python (just as one can do with C++). Additionally it seems like an interesting skill to have as a high performance computing engineer. And although I love C++ for its power and explicitness, it sure is cumbersome and thorny to use. Just because I have gained a lot of experience managing the thorns, and having mastery over something hard is satisfying, I can also enjoy the less thorny nature of Rust.