Team:Groningen/Software

Software

This article explains in detail how the software part of CryptoGErM works. To encrypt the message and translate it to DNA we wrote several JavaScript modules, which are used in the demonstration of our software on the Coding and Decoding pages. The following sections cover the same steps the software takes to turn a text into DNA: first encryption, then translation to DNA, and finally packing the result into a complete message sequence. The examples use Hello world as the message text and secret as the encryption key.

The software is divided into several modules, each in its own file. Two of the modules were not written by us, but are used under an open source license. The AES module was written by Chris Veness at Movable-Type.co.uk. The CRC implementation was written by GitHub user chitchcock and published as GitHub Gist #5112270. All other code was written by us and is licensed under the MIT license.

Encryption & Decryption

We use the AES module for encryption and decryption, specifically AES in counter mode. This mode allows us to encrypt (and decrypt) messages of any length. It does this by splitting the message into blocks and combining every block with an encrypted counter value, so that each block is processed with a different counter. The first block contains an initialization vector, which is randomized every time, and also the number of blocks that follow it. As a result, every block has a different cipher text, even if there are repetitions in the original message. In fact, the cipher text for the entire message is different even if the same message is encrypted again.

Encrypting our example message Hello world with the key secret is now as easy as calling Aes.Ctr.encrypt(message, key), with (possibly) this result: (�5z�õøWqfÚ7÷�''. Some of the characters in the cipher text are not printable and appear here as �. The next step is to turn this cipher text into DNA.

The AES module is in aes.js and the counter mode addition is in aes-ctr.js.
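The effect of the random initialization vector can be tried out with a minimal sketch. Note that this sketch does not use our aes.js and aes-ctr.js modules: it assumes a Node.js environment and uses the built-in crypto module instead. The helper names deriveKey, encryptCtr and decryptCtr are hypothetical, and deriving the key with SHA-256 is an assumption made only for this illustration.

```javascript
// Sketch only: uses Node.js's built-in crypto module, NOT aes.js / aes-ctr.js.
const crypto = require('crypto');

// Hypothetical helper: turn a password into a 16-byte AES-128 key.
// Using SHA-256 here is an assumption, not the project's key derivation.
function deriveKey(password) {
  return crypto.createHash('sha256').update(password, 'utf8').digest().subarray(0, 16);
}

// Encrypt in counter mode; a fresh random 16-byte initial counter block
// is generated for every call and prepended to the cipher text.
function encryptCtr(message, password) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv('aes-128-ctr', deriveKey(password), iv);
  const body = Buffer.concat([cipher.update(message, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, body]);
}

// Decrypt: read the initial counter block back from the first 16 bytes.
function decryptCtr(data, password) {
  const iv = data.subarray(0, 16);
  const decipher = crypto.createDecipheriv('aes-128-ctr', deriveKey(password), iv);
  const body = Buffer.concat([decipher.update(data.subarray(16)), decipher.final()]);
  return body.toString('utf8');
}

const first = encryptCtr('Hello world', 'secret');
const second = encryptCtr('Hello world', 'secret');
console.log(first.equals(second));        // false, barring a 1 in 2^128 coincidence
console.log(decryptCtr(first, 'secret')); // Hello world
```

Encrypting the same message twice gives two different outputs, yet both decrypt to the original text with the same key.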

DNA

Before the text can be integrated into a bacterium, it needs to be translated to DNA. Computers store letters, digits and other characters as numbers, which are translated to the symbols you see via an encoding table. The most common encodings are ASCII and Unicode. The characters of the cipher text from the example have the following codes: 040 002 053 122 008 245 248 087 113 102 218 055 173 152 247 011 039 039 144. We use UTF-8 (Unicode Transformation Format, 8-bit), which means that every character is encoded by 1 to 4 bytes of 8 bits each. In binary, these codes look like this: 00101000 00000010 00110101 01111010 00001000 11110101 11111000 01010111 01110001 01100110 11011010 00110111 10101101 10011000 11110111 00001011 00100111 00100111 10010000. Binary has only two digits, 0 and 1, but DNA has four 'digits': A, C, T and G. Each base can therefore represent two bits, so a byte of eight bits needs only 4 DNA base pairs. The translation to DNA is done like this:

Let's take the letter 'm' as an example.

1. The ASCII/Unicode code is 109.
2. In binary it is 01101101.
3. Cut off the right-most, least significant, 2 bits: 01.
4. Look up the base: 00 = A, 01 = C, 10 = T, 11 = G: C.
5. Append the base to the output: C.
6. The input number is now: 011011.
7. Repeat steps 3 - 6 until there are no more bits left.

The result is: CGTC.
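The steps above can be sketched in a few lines of JavaScript. This is an illustration and not necessarily the code used on the Coding page; the names BASES and byteToDna are chosen for this example only.

```javascript
// Bases indexed by their 2-bit value: 00 = A, 01 = C, 10 = T, 11 = G.
const BASES = ['A', 'C', 'T', 'G'];

// Translate one byte (0-255) into four bases, least significant bits first.
function byteToDna(byte) {
  let dna = '';
  for (let i = 0; i < 4; i++) {
    dna += BASES[byte & 3]; // look up the right-most two bits
    byte >>= 2;             // cut those two bits off the input
  }
  return dna;
}

console.log(byteToDna('m'.charCodeAt(0))); // CGTC
```

The loop always runs four times, so a byte with leading zeros still produces four bases.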

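When we apply the above algorithm to the characters in the cipher text, we get the sequence below. Characters with a code above 127 take two bytes in UTF-8, so the 19 characters become 26 groups of four bases: ATTA TAAA CCGA TTGC ATAA GAAG CCGT GAAG ATGT GCCC CAGC TCTC GAAG TTCT GCGA TAAG CGTT TAAG ATCT GAAG GCGT GTAA GCTA GCTA TAAG AACT.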
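The translation of a whole text, and the way back, might look like the following sketch. It assumes an environment that provides TextEncoder and TextDecoder (modern browsers and Node.js). The function names textToDna and dnaToText are hypothetical, and the decoding direction is our reading of how the steps can be reversed, not a copy of the Decoding page.

```javascript
// Index = 2-bit value: 00 = A, 01 = C, 10 = T, 11 = G.
const BASES = 'ACTG';

// Encode a string as UTF-8 and translate every byte to four bases,
// starting with the least significant two bits.
function textToDna(text) {
  let dna = '';
  for (const byte of new TextEncoder().encode(text)) {
    for (let shift = 0; shift < 8; shift += 2) {
      dna += BASES[(byte >> shift) & 3];
    }
  }
  return dna;
}

// Reverse direction: every four bases become one byte again,
// and the bytes are decoded as UTF-8.
function dnaToText(dna) {
  const bytes = new Uint8Array(dna.length / 4);
  for (let i = 0; i < dna.length; i++) {
    const value = BASES.indexOf(dna[i]);
    bytes[i >> 2] |= value << (2 * (i % 4));
  }
  return new TextDecoder().decode(bytes);
}

// Character codes of the example cipher text.
const codes = [40, 2, 53, 122, 8, 245, 248, 87, 113, 102, 218, 55,
               173, 152, 247, 11, 39, 39, 144];
const cipherText = String.fromCharCode(...codes);
const dna = textToDna(cipherText);

console.log(dna.match(/.{4}/g).join(' '));  // bases in groups of four
console.log(dnaToText(dna) === cipherText); // true
```

Because the text is converted to UTF-8 bytes first, the character õ (code 245) becomes the two bytes 195 and 181, which explains the groups GAAG CCGT in the output.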

GACCAAGCCTGCAAAACCAAATTCAAAACCCTCCAGATTATAAACCGATTGCATAAGAAGCCGTGAAGATGTGCCCCAGCTCTCGAAGTTCTGCGATAAGCGTTTAAGATAAAAACTGAAGGCGTGTAAGCTAGCTATAAGAACTGCACCCACCGAC.
