Project Euler Solution 14: Longest Collatz sequence

In today's installment in the Project Euler series we take a look at Problem 14: Longest Collatz sequence. We shall find a number with a long Collatz sequence.

The following iterative sequence is defined for the set of positive integers:

  • n → n/2 (n is even)
  • n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence: 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

Note: Once the chain starts the terms are allowed to go above one million.

One can straightforwardly implement this using an iterative approach:

def collatz_length(start: int) -> int:
    steps = 0
    while start != 1:
        if start % 2 == 0:
            start //= 2
        else:
            start = 3 * start + 1
        steps += 1
    return steps


def solution_naive() -> int:
    result = 0
    max_steps = 0
    for start in tqdm(range(1, 1_000_000)):
        steps = collatz_length(start)
        if steps > max_steps:
            max_steps = steps
            result = start
    return result

This produces the correct result of 837,799 but it takes 11 s to finish. Since we revisit a lot of the entries in the sequence, it makes sense to cache them. This time I use an actual cache object such that I don't have to reset the cache in the hackish way via the __defaults__ tuple.

class CollatzCache:
    def __init__(self) -> None:
        self._cache = {}

    def compute_length(self, start: int) -> int:
        steps = 0
        number = start
        while number != 1:
            if number in self._cache:
                self._cache[start] = self._cache[number] + steps
                return self._cache[number] + steps

            if number % 2 == 0:
                number //= 2
            else:
                number = 3 * number + 1
            steps += 1

        self._cache[start] = steps
        return steps


def solution_cached() -> int:
    collatz_cache = CollatzCache()
    result = 0
    max_steps = 0
    for start in tqdm(range(1, 1_000_000)):
        steps = collatz_cache.compute_length(start)
        if steps > max_steps:
            max_steps = steps
            result = start
    return result

This brings the time down to 1.4 s. This is nice. But we only store the values when we hit another value in the cache. Perhaps we could then also directly store the values for the whole trajectory? We can.

class CollatzCache:
    def __init__(self) -> None:
        self._cache = {}

    def compute_length(self, start: int) -> int:
        steps = 0
        number = start
        trajectory = [start]
        while number != 1:
            if number in self._cache:
                self._insert_result(trajectory, self._cache[number])
                return self._cache[start]

            if number % 2 == 0:
                number //= 2
            else:
                number = 3 * number + 1
            trajectory.append(number)
            steps += 1

        self._insert_result(trajectory, 0)
        return steps

    def _insert_result(self, trajectory: list[int], remaining_steps: int) -> None:
        for i, start in enumerate(trajectory):
            self._cache[start] = len(trajectory) - i + remaining_steps

It takes slightly longer to do so, though, namely 1.6 s. This means that the extra work for the cache hasn't really helped.

Recursive approach

We can also implement this in a recursive fashion.

def recursive_collatz_length(start: int) -> int:
    if start == 1:
        return 0
    elif start % 2 == 0:
        return 1 + recursive_collatz_length(start // 2)
    else:
        return 1 + recursive_collatz_length(3 * start + 1)


def solution_recursive() -> int:
    result = 0
    max_steps = 0
    for start in tqdm(range(1, 1_000_000)):
        steps = collatz_length(start)
        if steps > max_steps:
            max_steps = steps
            result = start
    return result

Then it takes 12 s, which isn't surprising due to the lack of a cache. But now we can insert the cache in a bit nicer way. Due to the recursion one doesn't have to fiddle with an additional stack for the trajectory. Instead it just nicely fits in.

class RecursiveCollatzCache:
    def __init__(self) -> None:
        self._cache = {}

    def compute_length(self, start: int) -> int:
        if start in self._cache:
            return self._cache[start]

        if start == 1:
            result = 0
        elif start % 2 == 0:
            result = 1 + self.compute_length(start // 2)
        else:
            result = 1 + self.compute_length(3 * start + 1)

        self._cache[start] = result
        return result


def solution_recursive_with_cache() -> int:
    cache = RecursiveCollatzCache()
    result = 0
    max_steps = 0
    for start in range(1, 1_000_000):
        steps = cache.compute_length(start)
        if steps > max_steps:
            max_steps = steps
            result = start
    return result

And this now only takes 1.0 s and is the fastest solution so far. It might eventually fail due to the recursion depth. One can hope that the cache always has sufficiently many numbers in it such that for any sequence only around 1000 elements need to be computed. But I am not sure whether this holds in general. And the memory will scale linearly with the number of visited numbers, so eventually this will become a problem as well.