Cryptopals Set 1
Posted April 21, 2017; 8 min read
This is the start of a mini-series where I walk through the Cryptopals Challenges. There are 8 sets of exercises and I’ll be tackling 1 per blog post. If you’ve already completed the challenges, this series might be useful for comparison (let me know if you have any suggestions).
Cryptopals Sets:
Warning: There are spoilers (solutions) below!
1. Convert hex to base64
The very first exercise is to convert a hexadecimal string to base64. This is a trivial task using Python.
2. Fixed XOR
The second exercise is to “write a function that takes two equal-length buffers and produces their XOR combination”.
The exercise provides hexadecimal strings as input so we must convert them to the raw byte representation in order to XOR them. I converted them to bytearrays but there may be a better Python type. Then, we just XOR each byte using Python’s in build XOR operator ^
.
3. Single-byte XOR cipher
This is when the Cryptopals Challenge starts to get interesting! In this exercise, the plaintext has been encrypted with one character (known as a Caesar cipher). The goal is to find this character (the key), given a ciphertext provided in hexadecimal.
There are not many ASCII characters (only 256 to be exact) so I just tried all combinations. For each key, I decrypted the ciphertext to get a plaintext and I scored that plaintext based on the likelihood it was in English. I did this by looking at the character frequency of the plaintext and comparing it to the character frequency of the English language. This obviously only works against English plaintexts.
The code below is a simplified version of the original version.
4. Detect single-character XOR
This question is a rehash of the previous question. Instead of finding the most likely plaintext possible given one ciphertext, we are given 60 ciphertexts.
The code below is a simplified version of the original version.
5. Implement repeating-key XOR
In this exercise, we are asked to encrypt a piece of text with a repeating-key (a Vigenère cipher). This was fairly simple to achieve using code from previous questions.
The code below is a simplified version of the original version.
6. Break repeating-key XOR
When I said “this is when the Cryptopals Challenge starts to get interesting!” in exercise 3, I was wrong. It actually becomes interesting now! Let’s break a Vigenère cipher!
Hamming distance
The first subtask is to write a function to determine the Hamming distance (distance is just the number of differing bits) between two encoded strings. A nice property (or is it the definition?) of XOR, is that the result of XOR’ing two bits is 1 if and only if they are different. We can therefore determine the Hamming distance between two strings by XOR’ing them (using the function we previously created) and counting up the number of 1 bits.
Determining likely key sizes
To break a Vigenère cipher, we need to determine the key and to do that we must first determine the key size. We can use the normalized Hamming distance between ciphertext blocks (groups of bytes of key size length) to guess the key size. The key size with the smallest normalized Hamming distance is probably the actual key size used. Using this heuristic, we discover the most likely key size is 2 bytes (followed by 3 bytes and then 29 bytes) for the ciphertext provided.
This works because the bits of the plaintext are generally not uniformly random (e.g. if the plaintext is some English text) and the Hamming distance between two English characters is typically smaller than that between two random bytes (e.g. ~2-3 vs ~4).
Let’s take two ciphertext blocks c1
, c2
. If we have chosen the correct key size, then c1 = p1 ⊕ k
and c2 = p1 ⊕ k
where p1
, p2
are they plaintext blocks’ respective plaintext blocks and k
is the key. The normalized Hamming distance between the two blocks tends to the normalized Hamming distance between p1
and p2
(~2-3). If the key size is incorrect, then c1 = p1 ⊕ k1
and c2 = p1 ⊕ k2
where p1
, p2
are they plaintext blocks’ as before and k1
and k2
are separate different keys. The normalized Hamming distance between the two blocks tends to the normalized Hamming distance between two random bytes (~4).
Still confused? Check out this stackoverflow answer.
Note: Don’t forget to decode the ciphertext from base64! I spent way too long trying to fix that bug.
Putting it all together
Now that we have a list of potential key sizes ordered by likelihood, we can go through each and see if it is the correct one.
We can turn this Vigenère cipher problem into N
Caesar cipher problems where N = KEYSIZE
. To find each byte of the key, we find the bytes of the ciphertext that were encrypted by that key byte and compute the key byte using the function we used to break the Caesar cipher in exercise 3.
Once we have the entire key (it’s “Terminator X: Bring the noise” by the way), we can decrypt the ciphertext using the same method we used to performed encryption in exercise 5 (XOR is it’s own inverse).
The code below is a simplified version of the original version.
7. AES in ECB mode
The last two exercises in this set involve working with a real encryption standard, AES (although ECB mode is hopefully not used as it’s insecure)! I’ll discuss ECB mode further in the next exercise, where we have to break AES in ECB mode, but for this one, let’s just decrypt a given ciphertext.
By the way, we are told the ciphertext is encrypted using AES-128 which means the blocks are 128 bits long.
Note: I used the pycrypto module but had a few import
issues on my MacBook when installed using pip. If you have similar issues, have a look at this stackoverflow answer.
8. Detect AES in ECB mode
We’ve made it to the last exercise (in the first set)! The rest of the exercises built up to this one. Let’s get started.
Electronic Codebook (ECB) mode
ECB mode is one of simplest block cipher mode of operation. The plaintext is divided into blocks of a fixed size and each block is encrypted separately. Decryption is performed in a similar fashion.
The disadvantage of this mode is that identical plaintext blocks are encrypted to identical ciphertext blocks. This gives attackers information about patterns in the plaintext.
Ciphertexts that have many repeating identical blocks are therefore very likely to have been encrypted using ECB mode. We can use this information to solve this exercise. The task is to find which hexadecimal string in a file is most likely to be a ciphertext encrypted using AES in ECB mode.
Note: Although we didn’t break the ciphertext (by discovering the key or plaintext), we at least know the mode it was encrypted in.
Fin
That’s it! I hope you enjoyed walking through the first set of Cryptopals Challenges. Want to keep going? Read through the second set of exercises.