Cracking a Caesar cipher

The Caesar cipher is one of the simplest encryption algorithms in which every latin letter of a given string is simply shifted cycliacally by a certain offset.

For cracking the encryption, we could iterate over all opportunities and as our alphabet uses just 26 latin letters, we would obtain the decrypted string in at most 25 tries, which is quite trivial. An example of the Caesar cipher is rot13 (rotate by 13 places) in which the alphabet is rotated by exactly the halve alphabet. A second encryption would result in the original string used by the first encryption.

I had the idea to crack the encryption on base of an analysis of the quantity of every letter and to obtain the used key this way. This is possible, because our alphabet doesn't have a uniform distribution of every single letter.

latin letter distribution

For cracking the whole thing, an algorithm must simply find the smallest distance beween the encrypted and every decrypted string. I've written a decrypter to crack any Caesar cipher and to obtain the used key by simply guessing the right answer. First we will implement an algorithm to encrypt a string using Caesar to get a perfect initial situation for our furthermore cracking attempt.. This is relatively easy to implement, we run through a given string and replace each letter by the ($char + $n) % 26. place on it:

<?php

function caesar($str, $n) {

	$ret = "";
	for($i = 0, $l = strlen($str); $i < $l; ++$i) {
		$c = ord($str[$i]);
		if (97 <= $c && $c < 123) {
			$ret.= chr(($c + $n + 7) % 26 + 97);
		} else if(65 <= $c && $c < 91) {
			$ret.= chr(($c + $n + 13) % 26 + 65);
		} else {
			$ret.= $str[$i];
		}
	}
	return $ret;
}

?>

In our cracking algorithm we run over the encrypted string and produce an array with a simple frequency statistic. Then we compare the resulting table with a table of frequencies of every single letter of our alphabet. The last step is simply finding the smallest distance between every occurrences:

<?php

function crack_caesar($str) {

	$max = 0;

	$weight = array(
		6.51, 1.89, 3.06, 5.08, 17.4, 
		1.66, 3.01, 4.76, 7.55, 0.27, 
		1.21, 3.44, 2.53, 9.78, 2.51, 
		0.29, 0.02, 7.00, 7.27, 6.15, 
		4.35, 0.67, 1.89, 0.03, 0.04, 1.13);

	$c = $s = array(
		0, 0, 0, 0, 0, 
		0, 0, 0, 0, 0, 
		0, 0, 0, 0, 0, 
		0, 0, 0, 0, 0, 
		0, 0, 0, 0, 0, 0);

	for($i = 0, $l = strlen($str); $i < $l; ++$i) {
		$x = (ord($str[$i]) | 32) - 97;
		if (0 <= $x && $x < 26) {
			++$c[$x];
		}
	}

	for ($off = 0; $off < 26; ++$off) {

		for ($i = 0; $i < 26; ++$i) {
			if ($max < ($s[$off]+= 0.01 * $c[$i] * $weight[($i + $off) % 26])) {
				$max = $s[$off];
			}
		}
	}
	return (26 - array_search($max, $s)) % 26;
}

?>

Now we use the following line to encrypt an arbitrarily string using Caesar with an offset of 16.

$str = caesar("That's the never ending story of Robert.", 16);

By using the crack_caesar() function, with $str, we'll obtain the used key of 16. You can try the script in the tools section to see the whole thing working. As you see, the algorithm is implemented in PHP and can also be downloaded as a whole here.

You might also be interested in the following

2 Comments on „Cracking a Caesar cipher”

Andrew
Andrew

@rolandc the thing I think you miss is this code can make a good guess at the key whereas yours seems to just brute force all possible keys and leaves it to the user to determine which is correct. Both effectively end up in the same place but frequency analysis would seem more elegant perhaps.

rolandc
rolandc

anyway, i just don't can't work out why you had to go so complicated in your attempt... The only thing that lacks in mine is support for whitespaces, i've coded that script in a rush, i will figure that out when i have time, but it all works great and is kept very simple

 

Sorry, comments are closed for this article. Contact me if you want to leave a note.