Python, Rust and C++ all support both procedural and functional programming styles. It turns out that in the two compiled languages, the compiler can optimize the functional code so that it performs about as well as the procedural version, or even better. In Python, however, the functional style makes the code even slower.
While it was baking away at 31 °C outside, I sat inside and played around a bit more with Rust. Implementing the reusable generator was fun, so I just continued with Project Euler. After a few uneventful ports of my Python solutions to Rust, I came across Project Euler Solution 8: Largest product in a series. And there I found another interesting thing.
One is given a string of 1000 digits and has to find the largest product of 13 consecutive digits in it.
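To make the task concrete, here is a minimal brute-force sketch on a made-up ten-digit string with a window of five digits. The helper name `largest_window_product` and the sample input are assumptions for illustration, not part of the original solution.

```python
import math


def largest_window_product(digit_string: str, width: int) -> int:
    """Return the largest product of `width` consecutive digits (hypothetical helper)."""
    digits = [int(c) for c in digit_string]
    # A string of length n has n - width + 1 windows; default=0 covers "no window".
    return max(
        (math.prod(digits[i : i + width]) for i in range(len(digits) - width + 1)),
        default=0,
    )


# Made-up sample: the best windows are 6*7*5*3*5 and 7*5*3*5*6, both 3150.
print(largest_window_product("3675356291", 5))
```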
Basic Python solution
My old Python solution is essentially this:
DIGIT_STRING = "".join(
"""
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
""".strip().split()
)
NUM_DIGITS = 13
def solution_procedural() -> int:
    digits = [int(c) for c in DIGIT_STRING]
    largest = 0
    # A sequence of length n has n - NUM_DIGITS + 1 windows, hence the "+ 1".
    for start in range(len(digits) - NUM_DIGITS + 1):
        product = 1
        for digit in digits[start : start + NUM_DIGITS]:
            product *= digit
        largest = max(largest, product)
    return largest
First I convert the digit string into a list[int]. Then I iterate through all the windows that we can look at, then multiply the digits within that window and check whether that product is larger than before.
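One detail that is easy to get wrong in such loops is the number of windows: a sequence of n digits has n - k + 1 windows of width k, so the last valid start index is n - k. The following sketch lists the windows of a made-up seven-digit sequence; the helper name `windows` is hypothetical.

```python
def windows(digits: list[int], width: int) -> list[list[int]]:
    """List all contiguous windows of `width` elements (hypothetical helper)."""
    # The last valid start index is len(digits) - width, so the range needs "+ 1".
    return [digits[start : start + width] for start in range(len(digits) - width + 1)]


sample = [1, 2, 3, 4, 5, 6, 7]
for window in windows(sample, 3):
    print(window)
# Five windows are printed; the last one, [5, 6, 7], has the largest product here.
```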
This works and should be reasonably easy to understand.
Rust iterators and functional programming
Then I turned to Rust. There I learned about iterators and how one can do many cool functional things with them. After having learned Haskell in 2017 just to pick up more functional patterns, I started to like them. So I was quite happy that Rust allows one to write this:
const DIGIT_STRING: &'static str = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450";
const NUM_DIGITS: usize = 13;
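pub fn solution_8_functional_bytes() -> i64 {
    // Map each ASCII byte to its digit value, e.g. b'7' - b'0' == 7.
    let digits: Vec<u8> = DIGIT_STRING.bytes().map(|byte| byte - b'0').collect();
    // There are len - NUM_DIGITS + 1 windows; saturating_sub avoids a usize underflow.
    (0..(digits.len() + 1).saturating_sub(NUM_DIGITS))
        .map(|start| {
            digits[start..start + NUM_DIGITS]
                .iter()
                .fold(1_i64, |a, &b| a * i64::from(b))
        })
        .max()
        .unwrap_or_default()
}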
We can take that string of digits and iterate over its bytes (and not its characters). Subtracting the ASCII code of '0' from each byte then gives the digit value. We collect this iterator into a Vec to materialize the values.
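The same byte arithmetic can be sketched in Python, where iterating over a bytes object yields integers. The function name `digits_from_ascii` is an assumption for illustration.

```python
def digits_from_ascii(digit_string: str) -> list[int]:
    """Convert ASCII digits to their numeric values via byte arithmetic."""
    zero = ord("0")  # 48 in ASCII; the digits '0'..'9' are contiguous
    # Iterating over a bytes object yields integers, like Rust's .bytes().
    return [byte - zero for byte in digit_string.encode("ascii")]


print(digits_from_ascii("73167"))  # [7, 3, 1, 6, 7]
```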
In a second step we can take the range of windows, map another closure onto it, which then takes the digits in the window and multiplies all of them together using the fold operation. From all the windows, we take the maximum value. And in case there was not a window to begin with, we just get the default i64, which is 0.
I love how functional this code is. If there are no start positions to iterate over, we just get 0 as a result. Should a window be empty, it will have a product of 1, the initial value of the fold (which might not be exactly what we’d want).
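Both fallbacks have direct counterparts in Python: the third argument of `functools.reduce` plays the role of the initial value of the fold, and the `default` argument of `max` stands in for `unwrap_or_default()`. The following is a sketch of that correspondence, not a port of the Rust code; the function names are made up.

```python
import functools
import operator


def window_product(window: list[int]) -> int:
    # The initial value 1 is returned unchanged for an empty window.
    return functools.reduce(operator.mul, window, 1)


def largest_product(windows: list[list[int]]) -> int:
    # default=0 is used when there is no window at all.
    return max(map(window_product, windows), default=0)


print(window_product([]))  # 1
print(largest_product([]))  # 0
print(largest_product([[2, 3], [4, 5]]))  # 20
```

Without the initial value, `functools.reduce` raises a `TypeError` on an empty sequence, and `max` without `default` raises a `ValueError` on an empty iterable.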
Functional in C++
I wondered what that would look like in C++. C++20 and C++23 brought many new ranges and views that I had never looked at before. So with a little help from AI, I got this beast:
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <functional>
#include <iostream>
#include <numeric>
#include <ranges>
#include <vector>
static char const* const DIGIT_STRING =
"73167176531330624919225119674426574742355349194934969835203127745063262395"
"78318016984801869478851843858615607891129494954595017379583319528532088055"
"11125406987471585238630507156932909632952274430435576689664895044524452316"
"17318564030987111217223831136222989342338030813533627661428280644448664523"
"87493035890729629049156044077239071381051585930796086670172427121883998797"
"90879227492190169972088809377665727333001053367881220235421809751254540594"
"75224352584907711670556013604839586446706324415722155397536978179778461740"
"64955149290862569321978468622482839722413756570560574902614079729686524145"
"35100474821663704844031998900088952434506585412275886668811642717147992444"
"29282308634656748139191231628245861786645835912456652947654568284891288314"
"26076900422421902267105562632111110937054421750694165896040807198403850962"
"45544436298123098787992724428490918884580156166097919133875499200524063689"
"91256071760605886116467109405077541002256983155200055935729725716362695618"
"82670428252483600823257530420752963450";
static constexpr int NUM_DIGITS = 13;
std::vector<int> get_digits() {
  std::vector<int> result;
  auto const len = std::strlen(DIGIT_STRING);
  result.reserve(len);
  // Use std::size_t for the index so that it matches the type of len.
  for (std::size_t i = 0; i < len; ++i) {
    result.push_back(DIGIT_STRING[i] - '0');
  }
  return result;
}
int64_t solution_8_ranges() {
  auto const digits = get_digits();
  // Lazy view: each window of NUM_DIGITS digits is mapped to its product.
  auto const products = digits | std::views::slide(NUM_DIGITS) |
                        std::views::transform([](auto window) {
                          return std::ranges::fold_left(
                              window, 1LL, std::multiplies<int64_t>());
                        });
  // Reduce the products to their maximum, starting from 0.
  return std::ranges::fold_left(
      products, 0LL, [](int64_t a, int64_t b) { return std::max(a, b); });
}
The conversion of the string to a vector of integers is still classic C++, in some sense even C. But the actual solution is just madness, I think. We take the vector of digits. Then we apply a sliding view, which is a range over all windows of the given size. Then we transform each of these windows by applying a closure. In the closure we take all the values in the window and multiply them with the fold operation. The result is not a vector but a lazy view of the products, which are only computed while the final fold extracts the maximum from them.
I can read this code, but I couldn’t have written it myself because I am pretty out of the C++ game at this point. It is not that hard, though, I think.
Functional in Python
But of course we can do an unreadable functional implementation in Python as well.
import functools
def solution_functional() -> int:
    digits = list(map(int, DIGIT_STRING))
    return max(
        map(
            lambda start: functools.reduce(
                lambda a, b: a * b, digits[start : start + NUM_DIGITS]
            ),
            # A sequence of length n has n - NUM_DIGITS + 1 windows.
            range(len(digits) - NUM_DIGITS + 1),
        )
    )
When you import functools, you know that things are going to get serious. We don’t have the fancy sliding window view here, but we use the same reductions as before. One can see how Python uses global functions like list() or max() whereas Rust uses methods like .collect() and .max() on an iterator. I think I like the method variant better because long chains are easier to read that way.
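A lazy sliding window can be approximated in Python with a generator. The sketch below is modelled after the `sliding_window` recipe in the itertools documentation and is not the code that was benchmarked; the name `largest_product_sliding` and the sample input are made up.

```python
import math
from collections import deque
from collections.abc import Iterable, Iterator
from itertools import islice


def sliding_window(values: Iterable[int], width: int) -> Iterator[tuple[int, ...]]:
    """Lazily yield all contiguous windows of `width` elements (width >= 1)."""
    iterator = iter(values)
    # Pre-fill with width - 1 elements; maxlen drops the oldest one on append.
    window = deque(islice(iterator, width - 1), maxlen=width)
    for value in iterator:
        window.append(value)
        yield tuple(window)


def largest_product_sliding(digits: Iterable[int], width: int) -> int:
    # default=0 covers inputs that are shorter than one window.
    return max(map(math.prod, sliding_window(digits, width)), default=0)


print(largest_product_sliding([3, 6, 7, 5, 3, 5, 6, 2, 9, 1], 5))
```

Like the C++ view, the generator produces one window at a time instead of materializing all of them up front.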
Procedural Rust implementation
For completeness, here’s a procedural implementation in Rust:
use std::cmp::max;

pub fn solution_8_procedural_bytes() -> i64 {
    let mut digits: Vec<u8> = Vec::new();
    for byte in DIGIT_STRING.bytes() {
        digits.push(byte - b'0');
    }
    let mut largest: i64 = 0;
    // There are len - NUM_DIGITS + 1 windows; saturating_sub avoids a usize underflow.
    for start in 0..(digits.len() + 1).saturating_sub(NUM_DIGITS) {
        let mut product: i64 = 1;
        for &d in &digits[start..start + NUM_DIGITS] {
            product *= i64::from(d);
        }
        largest = max(largest, product);
    }
    largest
}
We need to use a lot of let mut here because we’re using a procedural style. I noticed that I didn’t reserve the length of the vector. When I add digits.reserve(DIGIT_STRING.len()), the timings get worse, so I’ll just drop that.
Procedural C++ version
Finally we need a C++ version that is procedurally written:
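int64_t solution_8_procedural() {
  auto const digits = get_digits();
  int64_t largest = 0;
  // Compare start + NUM_DIGITS against size() to visit every window without
  // an unsigned underflow when the input is shorter than NUM_DIGITS.
  for (std::size_t start = 0; start + NUM_DIGITS <= digits.size(); ++start) {
    int64_t product = 1;
    for (std::size_t i = start; i < start + NUM_DIGITS; ++i) {
      product *= digits[i];
    }
    largest = std::max(largest, product);
  }
  return largest;
}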
This structurally looks like the Python version. In the end it is the obvious procedural way to implement this.
Performance benchmarks
Programming languages can be compared along many different axes like developer productivity, whether one likes the style and also the performance. There are interesting correlations between language, style and performance as well.
Let’s have a look at the timings. I ran each variant for 100 iterations and then looked at the quantiles of the timings. In some sense we’re also measuring one call to std::chrono::high_resolution_clock::now(), Instant.elapsed().as_secs_f64() or datetime.datetime.now() per iteration. This is not an issue when comparing within the same language, though. These are all six combinations:
| Language | Style | Minimum | 25 % | Median | 75 % | Maximum | Iterations |
|---|---|---|---|---|---|---|---|
| C++ | procedural | 3.50 µs | 3.71 µs | 3.96 µs | 4.15 µs | 5.61 µs | 100 |
| C++ | functional | 3.55 µs | 3.60 µs | 3.62 µs | 3.68 µs | 4.05 µs | 100 |
| Rust | procedural | 3.86 µs | 3.88 µs | 3.89 µs | 3.92 µs | 4.11 µs | 100 |
| Rust | functional | 2.99 µs | 2.99 µs | 3.04 µs | 3.07 µs | 3.32 µs | 100 |
| Python | procedural | 510 µs | 538 µs | 548 µs | 558 µs | 599 µs | 100 |
| Python | functional | 1.01 ms | 1.04 ms | 1.07 ms | 1.08 ms | 1.26 ms | 100 |
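The measurement procedure could look roughly like the following Python sketch. It is an assumption about the setup rather than the original benchmark script: it uses `time.perf_counter` instead of `datetime.datetime.now()`, and the names `benchmark` and `work` are made up.

```python
import statistics
import time
from collections.abc import Callable


def benchmark(func: Callable[[], object], iterations: int = 100) -> dict[str, float]:
    """Time `func` repeatedly and summarize the durations in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    # n=4 yields the three quartile cut points: 25 %, median, 75 %.
    q25, median, q75 = statistics.quantiles(durations, n=4)
    return {
        "min": min(durations),
        "q25": q25,
        "median": median,
        "q75": q75,
        "max": max(durations),
    }


def work() -> int:
    # Stand-in workload; replace it with the function under test.
    return sum(range(1000))


print(benchmark(work))
```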
One should primarily look at the minimum timings; all higher timings are mostly noise. The two C++ versions have pretty much the same performance. This means that the C++ compiler that I’ve used (g++ 16.1.1) is clever enough to compile both styles to machine code that performs the same. That is such an amazing engineering feat, simplifying all these abstractions down to the procedural CPU instructions.
In C++, it doesn’t matter for performance which style one picks. One is free to choose a style based on other boundary conditions or preferences.
With Rust, it is very curious. The procedural version in Rust is slower than the C++ versions, but the functional version is faster than the procedural one and also faster than both C++ versions! This is very interesting and I didn’t expect that, to be honest. It might be that this style of writing gives the Rust compiler more guarantees that it can optimize with. The C++ code might express its intent less clearly, which doesn’t allow the compiler to go all the way.
The Python versions are dramatically slower. Going by the minimum timings, the procedural Python version is more than a factor 100 slower than the native code versions (roughly 130 to 170 times). That hurts. But in return you don’t need to compile and you can write code very quickly. What I find pretty disheartening is that the functional implementation takes twice as long as the procedural one. This really discourages functional programming in the hot path and in tight loops. It makes sense, though: Python has to create all these iterator objects and call a lambda for every element at runtime, and it cannot optimize them away.
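The slowdown factors follow directly from the minimum timings in the table. A quick check of the arithmetic, with the values copied from the table and converted to microseconds:

```python
# Minimum timings from the table, in microseconds.
minimum_us = {
    ("C++", "procedural"): 3.50,
    ("C++", "functional"): 3.55,
    ("Rust", "procedural"): 3.86,
    ("Rust", "functional"): 2.99,
    ("Python", "procedural"): 510.0,
    ("Python", "functional"): 1010.0,
}

python_procedural = minimum_us[("Python", "procedural")]
for (language, style), timing in minimum_us.items():
    if language != "Python":
        # How many times slower procedural Python is than this native variant.
        print(f"{language} {style}: {python_procedural / timing:.0f}x")

# Slowdown of functional over procedural Python.
python_ratio = minimum_us[("Python", "functional")] / python_procedural
print(f"Python functional vs. procedural: {python_ratio:.2f}x")
```

This gives factors between roughly 130 and 170 for procedural Python against the native variants, and a factor of about 2 between the two Python variants.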
Conclusion
So shall one pick the functional or the procedural style to get optimal performance? I’d say that there is no reason to avoid the functional style for fear of inferior performance; it is actually quite the opposite! In C++ it literally didn’t make a difference. In Rust, we could see an advantage of using the functional style. Both are close enough that either counts as native performance, and either is still much faster than Python. Python is already more than a factor 100 slower, so it hardly matters that functional programming doubles that factor. By choosing Python, one has already decided against performance anyway.
But again, the choice of language is about more than performance on a few toy problems. It is about the ecosystem of available library packages, how easy the language is to use for the actual developers in question, and whether one likes it.