Project Euler Solution 60: Prime pair sets

Project Euler Problem 60: Prime pair sets is about finding a group of five numbers which all have a mutual property

The primes 3, 7, 109, and 673, are quite remarkable. By taking any two primes and concatenating them in any order the result will always be prime. For example, taking 7 and 109, both 7109 and 1097 are prime. The sum of these four primes, 792, represents the lowest sum for a set of four primes with this property.

Find the lowest sum for a set of five primes for which any two primes concatenate to produce another prime.

This is a problem for which I took a long time, I was struggling. It has a difficulty rating of 20 %, and I had only solved one other problem of that level before.

The actual property is pretty easy to implement. We can take the examples from the problem statement to form a test:

def test_has_property() -> None:
    primes = prime_sieve(ceiling)
    prime_set = set(primes)
    assert has_property(3, 7, prime_set)
    assert has_property(109, 3, prime_set)
    assert has_property(3, 673, prime_set)
    assert not has_property(3, 12, prime_set)

And then we can just implement this function:

def has_property(first: int, second: int, prime_set: set[int]) -> bool:
    number_1 = int(str(first) + str(second))
    number_2 = int(str(second) + str(first))
    assert number_1 <= ceiling
    assert number_2 <= ceiling
    return number_1 in prime_set and number_2 in prime_set

Now the hard part begins. How do we find a group of numbers that have this property mutually? We certainly cannot try out all five-tuples of prime numbers and check whether they have this mutual property. To see this, we can just implement this for the four numbers:

def solution_full_grid() -> int:
    primes = prime_sieve(ceiling)
    prime_set = set(primes)
    for p1 in primes:
        for p2 in primes:
            if p2 == p1:
                break
            if has_property(p1, p2, prime_set):
                for p3 in primes:
                    if p3 == p2:
                        break
                    if has_property(p1, p3, prime_set) and has_property(
                        p2, p3, prime_set
                    ):
                        for p4 in primes:
                            if p4 == p3:
                                break
                            if (
                                has_property(p1, p4, prime_set)
                                and has_property(p2, p4, prime_set)
                                and has_property(p3, p4, prime_set)
                            ):
                                return sum([p1, p2, p3, p4])

This takes 4.5 s to find the solution. I would assume that this just doesn't scale to five and that is the whole point. We need to do something more clever.

Basically this is a graph problem. The numbers that we are interested in form a graph where every node is connected to every other node. For the four numbers that we are interested in, this graph would look like this:

In a the theoretical full graph of all prime numbers we would have to find a fully connected subgraph with four elements. We can simplify the generation of this graph a little bit by having the edges run from the larger to the smaller number, like this:

When we generate the graph for all numbers up to 673, it looks like this:

The first curious property is that there are two subgraphs, and only the 3 is the common number. Everything else is disjoint. If one thinks about it, it is not really surprising. Primes either have a digit sum of 1, 2, 4, 5, 7, or 8. If the digit sum was 3, 6 or 9, the number would be divisible by 3. If we concatenate two such numbers, we need to make sure that the digit sums are both of the same family, otherwise it will be divisible by three and not be a prime. Therefore we get these two disjoint families of primes.

We can also show the graph with a hierarchical layout:

This graph teaches us another thing. We can take a look at the number 673 which is at the very top of the graph. It has eight outgoing edges, which means it has the property in common with eight numbers. But not all of these numbers share these property among each other. So the graph has many connections, but not all nodes are fully connected.

We somehow need to find the fully connected cluster of nodes when the graph is spanned up to maximum prime number. As we don't know what this number will be in advance, it would be best if we could incrementally grow the graph and check whether the newly added node is part of that fully connected cluster.

When I have added a new node to the graph, I just take all the four-tuples from the partners that it has, such that I have all possible five-tuples with where that number is the largest one. I then clumsily check whether all partners are connected among each other.

def find_tuple_of(size: int) -> int:
    partners = collections.defaultdict(list)
    primes = prime_sieve(ceiling)
    prime_set = set(primes)
    for first_prime in tqdm(primes):
        for second_prime in primes:
            if second_prime == first_prime:
                break
            if has_property(first_prime, second_prime, prime_set):
                partners[first_prime].append(second_prime)
        if len(partners[first_prime]) >= size - 1:
            for subset in itertools.combinations(partners[first_prime], size - 1):
                for i, number in enumerate(subset):
                    for other_number in subset[:i]:
                        if other_number not in partners[number]:
                            break
                    else:
                        continue
                    break
                else:
                    return first_prime + sum(subset)

Then I can use this to find the solution to the example.

def solution() -> None:
    return find_tuple_of(4)

It finds the solution to the example in 459.914 ms, that is somewhat promising. One bottleneck is the generation of the prime set. I can use the is_prime_accelerated from Solution 58: Spiral primes such that I only have to generate much fewer prime numbers. The predicate then looks like this:

def has_property(first: int, second: int) -> bool:
    number_1 = int(str(first) + str(second))
    number_2 = int(str(second) + str(first))
    return is_prime_accelerated(number_1) and is_prime_accelerated(number_2)

This accelerates the solution for a four-tuple to 28 ms.

And when I let it run for a five-tuple I get the correct solution in 9.2 s. I then looked into the other solutions that are posted online in the forums. It seems that most just cache the evaluation of the property and try all possible combinations. That seems to work sufficiently well. My graph is a more elaborate cache. There was one mathematician who used graph theory and built complete graphs with more and more nodes from the graph generated up to a given ceiling. That seems to be more akin to my approach, just that a ceiling was needed.

This was the most difficult problem so far. It took me a couple of weeks of thinking about it here and there. And then I finally got it.