Project Euler 59 Solution: XOR decryption

Problem 59

Each character on a computer is assigned a unique code and the preferred standard is ASCII (American Standard Code for Information Interchange). For example, uppercase A = 65, asterisk (*) = 42, and lowercase k = 107.

A modern encryption method is to take a text file, convert the bytes to ASCII, then XOR each byte with a given value, taken from a secret key. The advantage with the XOR function is that using the same encryption key on the cipher text, restores the plain text; for example, 65 XOR 42 = 107, then 107 XOR 42 = 65.

For unbreakable encryption, the key is the same length as the plain text message, and the key is made up of random bytes. The user would keep the encrypted message and the encryption key in different locations, and without both "halves", it is impossible to decrypt the message.

Unfortunately, this method is impractical for most users, so the modified method is to use a password as a key. If the password is shorter than the message, which is likely, the key is repeated cyclically throughout the message. The balance for this method is using a sufficiently long password key for security, but short enough to be memorable.

Your task has been made easy, as the encryption key consists of three lower case characters. Using cipher.txt (right click and 'Save Link/Target As...'), a file containing the encrypted ASCII codes, and the knowledge that the plain text must contain common English words, decrypt the message and find the sum of the ASCII values in the original text.

Solution

Code-cracking problems are among the most interesting in computer science, since they combine creativity with knowledge of methods from a very broad field. This problem is fairly easy, since we are given a lot of information already: the underlying text is English, the key has three letters and contains only lowercase characters, and, most importantly, we know the encryption method.
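To make the encryption method concrete, here is a minimal sketch of XOR with a cyclically repeated key. The helper names `xorWithKey`, `toBytes` and `toText`, as well as the sample text and key, are made up for illustration and are not part of the original solution.

```javascript
// Hypothetical helper: XOR every byte with the key byte at the same
// position, repeating the key cyclically over the message.
function xorWithKey(bytes, key) {
  return bytes.map((b, i) => b ^ key[i % key.length]);
}

// Small conversion helpers between strings and arrays of ASCII codes.
const toBytes = s => Array.from(s, ch => ch.charCodeAt(0));
const toText = bytes => String.fromCharCode(...bytes);

const key = toBytes("abc");                              // arbitrary example key
const cipher = xorWithKey(toBytes("Hello World"), key);  // encrypt
const plain = toText(xorWithKey(cipher, key));           // same key again decrypts

console.log(plain); // prints "Hello World"
```

Because XOR is its own inverse, the same function serves for both encryption and decryption, which is why every attack below only needs to find the key.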

The easiest methods to tackle this problem are:

1. Brute force all possible keys and check if the resulting message contains no invalid characters.
2. Analyze the frequency of the characters and compare it with the letter frequency of the English alphabet.
3. Challenge the encrypted message against a dictionary, look for patterns and try to reconstruct the original key. Also attacking the key with a dictionary could be an option.
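The dictionary idea from the third method could be sketched roughly as follows: decode the message with every candidate key and keep the key whose output contains the most common English words. The word list, the scoring rule and all function names here are assumptions chosen for illustration, not a tuned implementation.

```javascript
// Hypothetical helper: decode a byte array with a cyclic key into a string.
function decodeWith(msg, key) {
  return String.fromCharCode(...msg.map((c, i) => c ^ key[i % key.length]));
}

// Assumed, deliberately tiny word list; the surrounding spaces avoid
// matching fragments inside longer words.
const COMMON_WORDS = [" the ", " and ", " of ", " to ", " in "];

// Score a text by counting occurrences of the words above.
function scoreEnglish(text) {
  const lower = text.toLowerCase();
  return COMMON_WORDS.reduce(
    (score, w) => score + (lower.split(w).length - 1), 0);
}

// Try all 26^3 lowercase keys and keep the best-scoring one.
function bestKeyByDictionary(msg) {
  let best = { key: null, score: -1 };
  for (let i = 97; i <= 122; i++)
    for (let j = 97; j <= 122; j++)
      for (let k = 97; k <= 122; k++) {
        const key = [i, j, k];
        const score = scoreEnglish(decodeWith(msg, key));
        if (score > best.score) best = { key, score };
      }
  return best;
}
```

Compared with rejecting keys on the first invalid character, a score-based search tolerates unexpected punctuation in the plain text, at the cost of decoding the whole message for every key.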

The first method, surprisingly, already solves this problem if we accept only characters in the ranges 32 to 90 and 97 to 122 as valid. Since our message is reasonably long, a wrong key will almost certainly produce at least one character that violates this constraint.

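// Accept space, punctuation, digits and uppercase (32..90) plus lowercase (97..122)
var validChar = c =>
  (32 <= c && c <= 90) || (97 <= c && c <= 122);

var msg = [79,59,12,2,79,35,8,28,20, ...];

// Try every key of three lowercase letters
for (var i = 0; i < 26; i++)
  for (var j = 0; j < 26; j++)
    for (var k = 0; k < 26; k++) {

      var decoded = [];
      var t = [i + 97, j + 97, k + 97];

      // Decode until the first invalid character shows up
      for (var p = 0; p < msg.length; p++) {
        var c = t[p % 3] ^ msg[p];
        if (!validChar(c)) {
          break;
        }
        decoded.push(c);
      }

      // Only a key that decodes the whole message is reported
      if (decoded.length === msg.length) {
        // Wrap fromCharCode so that map's extra arguments (index, array) are ignored
        console.log("Key: ", t.map(c => String.fromCharCode(c)).join(""),
                    "Sum: ", decoded.reduce((x, s) => x + s, 0));
      }
    }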

But let's see if we can do a little better than this brute-force attempt by using frequency analysis. We know that the key has three characters, so every third byte of the message is encrypted with the same key byte. Furthermore, the most frequent character in English texts is the space, ASCII 32. The idea is to find the most frequent byte in each of the three classes of positions $$3k$$, $$3k+1$$ and $$3k+2$$, assume it is an encrypted space, and XOR it with 32 to recover the corresponding key byte:

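function guessKey(msg, keyLength) {
  var freqs = [];
  // key[k] holds the most frequent cipher byte seen so far in class k
  var key = new Uint32Array(keyLength);

  // One frequency table of all 256 byte values per key position
  for (var i = 0; i < keyLength; i++)
    freqs.push(new Uint32Array(256));

  for (var i = 0; i < msg.length; i++) {
    var k = i % keyLength;
    freqs[k][msg[i]]++;
    if (freqs[k][msg[i]] > freqs[k][key[k]])
      key[k] = msg[i];
  }
  // Assume the most frequent byte is an encrypted space (ASCII 32)
  return key.map(x => x ^ 32);
}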

The result can then be found as:

var msg = [79,59,12,2,79,35,8,28,20, ...];
var key = guessKey(msg, 3);
console.log("Sum: ", msg.reduce((sum, cur, i) => sum + (key[i % 3] ^ cur), 0));
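The frequency approach depends on the assumption that the space really is the most frequent character in each class, so it may be worth checking the guess against what we know about the key. The following is a minimal sketch of such a check; the function name `isPlausibleKey` is hypothetical, and it accepts either a plain array or the `Uint32Array` returned by `guessKey`.

```javascript
// Hypothetical sanity check: the problem states that the key consists of
// lowercase letters only, so every key byte must lie in 97..122.
function isPlausibleKey(key) {
  return Array.from(key).every(b => b >= 97 && b <= 122);
}

console.log(isPlausibleKey([103, 111, 100])); // prints true
console.log(isPlausibleKey([7, 111, 100]));   // prints false
```

If the check fails for some position, one option is to fall back to the brute-force search for that key byte.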
