Project Euler Solution 59: XOR decryption

In Problem 59: XOR decryption we have to break a rather simple encryption scheme.

Each character on a computer is assigned a unique code and the preferred standard is ASCII (American Standard Code for Information Interchange). For example, uppercase A = 65, asterisk (*) = 42, and lowercase k = 107.

A modern encryption method is to take a text file, convert the bytes to ASCII, then XOR each byte with a given value, taken from a secret key. The advantage with the XOR function is that using the same encryption key on the cipher text, restores the plain text; for example, 65 XOR 42 = 107, then 107 XOR 42 = 65.

For unbreakable encryption, the key is the same length as the plain text message, and the key is made up of random bytes. The user would keep the encrypted message and the encryption key in different locations, and without both "halves", it is impossible to decrypt the message.

Unfortunately, this method is impractical for most users, so the modified method is to use a password as a key. If the password is shorter than the message, which is likely, the key is repeated cyclically throughout the message. The balance for this method is using a sufficiently long password key for security, but short enough to be memorable.

Your task has been made easy, as the encryption key consists of three lower case characters. Using p059_cipher.txt, a file containing the encrypted ASCII codes, and the knowledge that the plain text must contain common English words, decrypt the message and find the sum of the ASCII values in the original text.

I found this one a bit disappointing in the sense that you had to guess around. The problem lies in the statement “plain text must contain common English words”. So what does that mean? Which characters are allowed exactly? At first I started with letters, space and punctuation. But that didn't work out at all. I tried all printable characters. But that were just too many options.

Let us look at the code first. From the problem statement we know that the XOR key has three letters and that it is repeated. This means that we can partition the text into three parts and independently find a key for each one. Assuming the possible_keys functions returns a list of possible keys to use on that set of bytes such that it generates “common English words”, we can write the solution like this:

def solution() -> int:
    ciphertext = read_file()
    keys_1 = possible_keys(ciphertext[::3])
    keys_2 = possible_keys(ciphertext[1::3])
    keys_3 = possible_keys(ciphertext[2::3])
    assert len(keys_1) == 1
    assert len(keys_2) == 1
    assert len(keys_3) == 1
    return sum(
        c ^ k
        for c, k in zip(ciphertext, itertools.cycle([keys_1[0], keys_2[0], keys_3[0]]))
    )

If there is only one possible key, we can compute the solution number.

Reading the file is simple:

def read_file() -> list[int]:
    with open("data/p059_cipher.txt") as f:
        return json.loads(f"[{f.read()}]")

Determining the possible keys is the interesting part. We try to decrypt the ciphertext and see whether the text contains any characters which is not in our set of acceptable characters.

def possible_keys(ciphertext: list[int]) -> set[int]:
    result = []
    for char in string.ascii_lowercase:
        plaintext = {ct ^ ord(char) for ct in set(ciphertext)}
        extras = plaintext - acceptable
        if not extras:
            result.append(ord(char))
    return result

But what are those acceptable characters? I fiddled around a lot and eventually looked up the solution text because I felt pretty stuck. Whatever I chose was either too strict or too lenient. It turned out that I need to include numbers, punctuation, plus sign, parenthesis, brackets, single and double quotes:

acceptable = set(
    ord(x) for x in string.ascii_letters + string.digits + " .,:;+?!/[]()'\""
)

And with this the code runs in 938 µs to produce the solution.

It was somewhat nice, but also disappointing because there wasn't a clear path to the solution, really.