iBurn on iPad

iBurn is an offline map and guide for the Burning Man art festival. For the 2014 release, we rewrote it from scratch for iOS 7 on top of some really awesome open source software. We decided to use YapDatabase+Mantle instead of Core Data, and Mapbox instead of MapKit for our offline map tiles. For Android users, David Brodsky developed an Android version of iBurn.

iBurn App Store Link iBurn Google Play Store Link

Just don’t get playa dust in your electronics!

We just launched Kickflip, our open-source live video broadcasting SDK for iOS and Android. Check it out!

XMPP is at serious risk of being permanently replaced by proprietary alternatives that have already solved the mobile push messaging problem. Remember in the 90s how many closed IM networks there were? Today it is much worse.

Read the rest at chatsecure.org.

I’ve been taking Coursera’s Stanford Cryptography I class and last week’s homework had an interesting extra credit problem: given eleven ciphertexts all encrypted with the same one-time pad key, find the plaintext of the last ciphertext. It included a hint to XOR the ciphertexts together and to take a note of what happens when you XOR a space with lowercase and uppercase characters.

When I initially went to solve the problem, I thought it would make the most sense to use frequency analysis, but I hit a dead end after a few hours and felt pretty frustrated. However, after studying the hint more carefully, I realized:

'ABC' XOR '   ' = 'abc'
'abc' XOR '   ' = 'ABC'
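This is easy to verify in Python (a throwaway sketch; `xor_strings` is my own helper, not part of the homework code):

```python
# XORing an ASCII letter with a space (0x20) flips its case,
# because 'A' (0x41) and 'a' (0x61) differ only in bit 5 (0x20).
def xor_strings(a, b):
    return ''.join(chr(ord(x) ^ ord(y)) for x, y in zip(a, b))

print(xor_strings('ABC', '   '))  # abc
print(xor_strings('abc', '   '))  # ABC
```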

Eureka! If you combine that with these properties of XOR:

'plain text' XOR 'secretkeys' = 0x0309021b0b541f000107
'safe texts' XOR 'secretkeys' = 0x0004051745000e1d0d00

'plain text' XOR 0x0309021b0b541f000107 = 'secretkeys'
'safe texts' XOR 0x0004051745000e1d0d00 = 'secretkeys'

0x030d070c4e54111d0c07 = 'plain text' XOR 'safe texts'
0x030d070c4e54111d0c07 = 0x0309021b0b541f000107 XOR 0x0004051745000e1d0d00

Pay close attention to the last two lines. You’ll notice that the XOR of two ciphertexts encrypted with the same OTP key is identical to the XOR of the two plaintexts. Now, let’s look closer at the contents of 0x030d070c4e54111d0c07:

['\x03', '\r', '\x07', '\x0c', 'N', 'T', '\x11', '\x1d', '\x0c', '\x07']

You’ll notice the uppercase ‘N’ and ‘T’ characters definitely stand out. From what we learned earlier, this means that at the 5th position one of the two original plaintexts has a space and the other has a lowercase ‘n’ (a 50% chance for each), and at the 6th position one has a ‘t’ and the other a space.
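The whole chain can be checked end to end with the example strings above (a quick sketch; `xor_strings` is a hypothetical helper, not from the homework code):

```python
def xor_strings(a, b):
    # XOR two equal-length strings byte by byte.
    return ''.join(chr(ord(x) ^ ord(y)) for x, y in zip(a, b))

key = 'secretkeys'
c1 = xor_strings('plain text', key)
c2 = xor_strings('safe texts', key)

# The shared key cancels out: c1 XOR c2 == p1 XOR p2.
combined = xor_strings(c1, c2)
assert combined == xor_strings('plain text', 'safe texts')

# ASCII letters in the combined XOR flag space-vs-letter positions:
# 'N' at index 4 is 'n' XOR ' ', and 'T' at index 5 is ' ' XOR 't'.
print([c for c in combined])
```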

** Spoiler Alert: If you’re currently taking this class and haven’t completed this homework, please do not read any further until you have solved it on your own. Your solution will likely be better than mine. **

I used this knowledge, along with the pairwise XORs of the 11 given ciphertexts, to assemble the probability of each character at a given index. Basically, I kept a parallel ‘probability’ array that tracked the chance of each character occurring at each index.

def scan_string(string, probability_arrays, ct1_index, ct2_index):
    # Scan the XOR of two ciphertexts for ASCII letters; each one hints
    # that one plaintext had a space and the other had that letter.
    characters = list(string)

    for character_index in range(len(string)):
        character = characters[character_index]
        letter = scan_for_letter(character)
        if letter:
            set_letter_at_index(letter, character_index, probability_arrays, ct1_index, ct2_index)

def permute_array(cipher_texts, probability_arrays):
    # XOR every pair of ciphertexts and scan each result for letter hints.
    for x in range(len(cipher_texts)):
        for y in range(x + 1, len(cipher_texts)):
            xor = xor_indices(cipher_texts, x, y)
            scan_string(xor, probability_arrays, x, y)

The scan_for_letter function checks whether a given character is within [a-zA-Z] and returns the swapped case of that letter if found. The set_letter_at_index function helps us keep track of the occurrence of each letter by recording both a space and the letter in two probability arrays, each array corresponding to one of the two ciphertexts in the current XORed pair.
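Those two helpers aren’t shown in the post; based on the description, they might look something like this (the names match the calls above, the bodies are my guess):

```python
def scan_for_letter(character):
    # An ASCII letter in a ciphertext XOR suggests one plaintext had a
    # space and the other had this letter with its case flipped.
    if 'a' <= character <= 'z' or 'A' <= character <= 'Z':
        return character.swapcase()
    return None

def set_letter_at_index(letter, character_index, probability_arrays,
                        ct1_index, ct2_index):
    # We can't tell which plaintext held the space and which held the
    # letter, so count both possibilities for both ciphertexts.
    for ct_index in (ct1_index, ct2_index):
        counts = probability_arrays[ct_index][character_index]
        counts[letter] = counts.get(letter, 0) + 1
        counts[' '] = counts.get(' ', 0) + 1
```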

It ends up looking something like this:

1. [{'T': 3, ' ': 3}, {'h': 3, ' ': 3}, {'e': 3, ' ': 3}, {' ': 2, 'e': 2, 'r': 2}, {}, ...]
...

I figured that, for a given index, a non-space character with a 0.5 probability was probably that character. As a shortcut, basically every 2-element letter probability dictionary could be assumed to resolve to whatever non-space character was present. Otherwise, if there was no clear letter match, the position was probably a space character. Sometimes there were no letter matches at all, so I output a ‘*’ instead, with the final output looking something like this:

1. The *...
def most_probable(prob_dict):
    # An empty dictionary means no letter evidence at all: assume a space.
    if len(prob_dict) == 0:
        return ' '
    total_character_count = 0
    keys = list(prob_dict.keys())
    for key in keys:
        character_count = prob_dict[key]
        total_character_count = total_character_count + character_count
    # Normalize the raw counts into probabilities.
    for key in keys:
        prob_dict[key] = float(prob_dict[key]) / total_character_count
    # Pick the key with the highest probability.
    highest_probability_index = 0
    for i in range(1, len(keys)):
        key = keys[i]
        probability = prob_dict[key]
        highest_probability = prob_dict[keys[highest_probability_index]]
        if probability > highest_probability:
            highest_probability_index = i
    most_probable_key = keys[highest_probability_index]
    return most_probable_key

The most_probable function probably isn’t the most efficient or beautiful way of calculating this, but it is very useful and I end up reusing it later when refining the key guesses. You could have stopped right here and had a reasonable guess for each original plaintext, missing some characters here and there.

However, I thought I could improve the accuracy of my plaintexts by guessing the key that most of the generated plaintexts had in common. I reused the most_probable function to find the most common key by keeping track of the key bytes at each character index and selecting the most commonly shared byte at each position. This allowed me to find the non-letter characters that were left, like punctuation and even the sneaky single-character ‘’ present in one of the plaintexts.

def guess_key(guess_texts, cipher_texts):
    # One tally per key byte position, sized to the longest plaintext guess.
    key_guess_probabilities = generate_key_guess_array(length_of_longest_array(guess_texts))

    for i in range(len(guess_texts)):
        guess_text = guess_texts[i]
        # XORing a ciphertext with a plaintext guess yields a key guess.
        key_guess = strxor(cipher_texts[i].decode("hex"), guess_text)
        update_key_probabilities(key_guess, key_guess_probabilities)

    key_guess = guess_key_from_probabilities(key_guess_probabilities)
    return key_guess
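The helpers called above aren’t shown in the post either. A minimal sketch of what they might look like, assuming one tally dictionary per key byte position (my guess at the structure, not the original implementation):

```python
def generate_key_guess_array(length):
    # One tally dictionary per key byte position.
    return [{} for _ in range(length)]

def update_key_probabilities(key_guess, key_guess_probabilities):
    # Tally each guessed key byte at its index.
    for i, byte in enumerate(key_guess):
        counts = key_guess_probabilities[i]
        counts[byte] = counts.get(byte, 0) + 1

def guess_key_from_probabilities(key_guess_probabilities):
    # Take the most frequently guessed byte at every position;
    # '?' marks positions with no evidence at all.
    return ''.join(max(counts, key=counts.get) if counts else '?'
                   for counts in key_guess_probabilities)
```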

Unfortunately my solution still has bugs (can you find them?) and I wasn’t able to completely automate the key-finding process. There were still a bunch of garbage characters that I needed to guess based on context clues. I ended up repeatedly running the guess_key function on manually refined plaintexts until I found the original OTP key that decrypts all of the ciphertexts in their entirety.

I could have automated the process a bit more by using an English dictionary and guessing the missing characters within words using the Levenshtein distance to known words, but I don’t know how well that would work. I’m still interested in a completely automated solution that doesn’t require any manual intervention but, for now, this will have to do.

Remember, never use the same OTP key more than once or your ciphertexts will leak information about the original plaintexts and eventually reveal your key.

If you’re interested in the rest of the source code, it’s available as a redacted Gist on GitHub.

I just finished redesigning chatsecure.org (source code) with Jekyll and the beautiful Incorporated theme. Here’s a link to the old version of the website for reference.

One of the main motivations for the redesign was to include information about the recently re-branded Android version of ChatSecure (previously it was called Gibberbot). A large number of support requests for Android started showing up in the iOS GitHub Issues and UserVoice pages, so hopefully this new design should help guide people in the right direction.

Overall I have been very pleased with the zen of writing blog posts and other content in Markdown, and keeping it version controlled with git. Storing text in a database (especially without version control) just seems so broken now.