# Cryptopals Set 1

Posted April 21, 2017; 8 min read

This is the start of a mini-series where I walk through the Cryptopals Challenges. There are 8 sets of exercises and I’ll be tackling 1 per blog post. If you’ve already completed the challenges, this series might be useful for comparison (let me know if you have any suggestions).

## Cryptopals Sets:

Warning: There are spoilers (solutions) below!

## 1. Convert hex to base64

The very first exercise is to convert a hexadecimal string to base64. This is a trivial task using Python.

## 2. Fixed XOR

The second exercise is to “write a function that takes two equal-length buffers and produces their XOR combination”.

The exercise provides hexadecimal strings as input so we must convert them to the raw byte representation in order to XOR them. I converted them to bytearrays but there may be a better Python type. Then, we just XOR each byte using Python’s in build XOR operator `^`.

## 3. Single-byte XOR cipher

This is when the Cryptopals Challenge starts to get interesting! In this exercise, the plaintext has been encrypted with one character (known as a Caesar cipher). The goal is to find this character (the key), given a ciphertext provided in hexadecimal.

There are not many ASCII characters (only 256 to be exact) so I just tried all combinations. For each key, I decrypted the ciphertext to get a plaintext and I scored that plaintext based on the likelihood it was in English. I did this by looking at the character frequency of the plaintext and comparing it to the character frequency of the English language. This obviously only works against English plaintexts.

The code below is a simplified version of the original version.

## 4. Detect single-character XOR

This question is a rehash of the previous question. Instead of finding the most likely plaintext possible given one ciphertext, we are given 60 ciphertexts.

The code below is a simplified version of the original version.

## 5. Implement repeating-key XOR

In this exercise, we are asked to encrypt a piece of text with a repeating-key (a Vigenère cipher). This was fairly simple to achieve using code from previous questions.

The code below is a simplified version of the original version.

## 6. Break repeating-key XOR

When I said “this is when the Cryptopals Challenge starts to get interesting!” in exercise 3, I was wrong. It actually becomes interesting now! Let’s break a Vigenère cipher!

### Hamming distance

The first subtask is to write a function to determine the Hamming distance (distance is just the number of differing bits) between two encoded strings. A nice property (or is it the definition?) of XOR, is that the result of XOR’ing two bits is 1 if and only if they are different. We can therefore determine the Hamming distance between two strings by XOR’ing them (using the function we previously created) and counting up the number of 1 bits.

### Determining likely key sizes

To break a Vigenère cipher, we need to determine the key and to do that we must first determine the key size. We can use the normalized Hamming distance between ciphertext blocks (groups of bytes of key size length) to guess the key size. The key size with the smallest normalized Hamming distance is probably the actual key size used. Using this heuristic, we discover the most likely key size is 2 bytes (followed by 3 bytes and then 29 bytes) for the ciphertext provided.

This works because the bits of the plaintext are generally not uniformly random (e.g. if the plaintext is some English text) and the Hamming distance between two English characters is typically smaller than that between two random bytes (e.g. ~2-3 vs ~4).

Let’s take two ciphertext blocks `c1`, `c2`. If we have chosen the correct key size, then `c1 = p1 ⊕ k` and `c2 = p1 ⊕ k` where `p1`, `p2` are they plaintext blocks’ respective plaintext blocks and `k` is the key. The normalized Hamming distance between the two blocks tends to the normalized Hamming distance between `p1` and `p2` (~2-3). If the key size is incorrect, then `c1 = p1 ⊕ k1` and `c2 = p1 ⊕ k2` where `p1`, `p2` are they plaintext blocks’ as before and `k1` and `k2` are separate different keys. The normalized Hamming distance between the two blocks tends to the normalized Hamming distance between two random bytes (~4).

Still confused? Check out this stackoverflow answer.

Note: Don’t forget to decode the ciphertext from base64! I spent way too long trying to fix that bug.

### Putting it all together

Now that we have a list of potential key sizes ordered by likelihood, we can go through each and see if it is the correct one.

We can turn this Vigenère cipher problem into `N` Caesar cipher problems where `N = KEYSIZE`. To find each byte of the key, we find the bytes of the ciphertext that were encrypted by that key byte and compute the key byte using the function we used to break the Caesar cipher in exercise 3.

Once we have the entire key (it’s “Terminator X: Bring the noise” by the way), we can decrypt the ciphertext using the same method we used to performed encryption in exercise 5 (XOR is it’s own inverse).

The code below is a simplified version of the original version.

## 7. AES in ECB mode

The last two exercises in this set involve working with a real encryption standard, AES (although ECB mode is hopefully not used as it’s insecure)! I’ll discuss ECB mode further in the next exercise, where we have to break AES in ECB mode, but for this one, let’s just decrypt a given ciphertext.

By the way, we are told the ciphertext is encrypted using AES-128 which means the blocks are 128 bits long.

Note: I used the pycrypto module but had a few `import` issues on my MacBook when installed using pip. If you have similar issues, have a look at this stackoverflow answer.

## 8. Detect AES in ECB mode

We’ve made it to the last exercise (in the first set)! The rest of the exercises built up to this one. Let’s get started.

### Electronic Codebook (ECB) mode

ECB mode is one of simplest block cipher mode of operation. The plaintext is divided into blocks of a fixed size and each block is encrypted separately. Decryption is performed in a similar fashion.

The disadvantage of this mode is that identical plaintext blocks are encrypted to identical ciphertext blocks. This gives attackers information about patterns in the plaintext.

Ciphertexts that have many repeating identical blocks are therefore very likely to have been encrypted using ECB mode. We can use this information to solve this exercise. The task is to find which hexadecimal string in a file is most likely to be a ciphertext encrypted using AES in ECB mode.

Note: Although we didn’t break the ciphertext (by discovering the key or plaintext), we at least know the mode it was encrypted in.

## Fin

That’s it! I hope you enjoyed walking through the first set of Cryptopals Challenges. Want to keep going? Read through the second set of exercises.