Difference between revisions of "Team:Groningen/Software"

Line 8: Line 8:
 
CryptoGErM works. To encrypt the message and translate it to DNA we  
 
CryptoGErM works. To encrypt the message and translate it to DNA we  
 
wrote several Javascripts. They are used in the demonstration of our
 
wrote several Javascripts. They are used in the demonstration of our
software on the <a href="/Team:Groningen/Coding">Coding</a> page.
+
software on the <a href="/Team:Groningen/Coding">Coding</a> and
 +
<a href="/Team:Groningen/Decoding">Decoding</a> pages.
 
The following sections cover the same steps the software takes to
 
The following sections cover the same steps the software takes to
 
turn a text into DNA. First encryption, then translation to DNA
 
turn a text into DNA. First encryption, then translation to DNA
Line 16: Line 17:
 
 
 
<p>The software is divided into several modules, each in their own
 
<p>The software is divided into several modules, each in their own
file. Every module creates a variable that contains the functionality
+
file. Two of the modules were not written by us, but
that it provides. Two of the modules were not written by us, but
+
 
used under an open source license. The AES module was written by
 
used under an open source license. The AES module was written by
 
Chris Veness at <a href="http://www.movable-type.co.uk/scripts/aes.html">Movable-Type.co.uk</a>.
 
Chris Veness at <a href="http://www.movable-type.co.uk/scripts/aes.html">Movable-Type.co.uk</a>.
Line 29: Line 29:
 
<h3>Encryption &amp; Decryption</h3>
 
<h3>Encryption &amp; Decryption</h3>
 
 
<p>For the encryption we don't use the <code>AES</code> module  
+
<p>We use the AES module for encryption and decryption, specifically:
directly, but the <code>AES.Ctr</code> module. <code>AES.Ctr</code>
+
AES counter mode. This mode allows us to encrypt (and decrypt) messages
implements the AES counter mode of operation. This allows us to  
+
of any length. It does this by splitting the message up into blocks
encrypt messages of arbitrary length, rather than a fixed length of
+
and adding a counter to every block. The first block contains an
256 bits, which is around 32 letters.</p>
+
initialization vector, which is randomized every time and also the
+
number of blocks that follow it. As a result of this every block
<p>AES counter mode also adds randomness to the message. So the
+
has a different cipher text, even if there are repetitions in the
encrypted text is always different, even when the same message is
+
original message. In fact, the cipher text for the entire message
encrypted multiple times. When the message is decrypted these random
+
is different, even if the same message is encrypted again.</p>
bits are removed, so the output message after decryption is still
+
the same as the original.</p>
+
 
 
 
<p>Encrypting our example message <code>Hello world</code> with the
 
<p>Encrypting our example message <code>Hello world</code> with the
key <code>secret</code> could result in this encrypted
+
key <code>secret</code> is now as easy as calling
text: <code id="enc">=�¼iõœÚW�èu,�*†õ“C�</code>. The
+
<code>Aes.Ctr.encrypt(message, key)</code> with (possibly) this result:  
next step is to turn these symbols into DNA.</p>
+
<code id="enc">(�5z�õøWqfÚ7­˜÷�''</code>. The
 +
next step is to turn this cipher text into DNA.</p>
 
 
 
<p>The AES module is in <a href="/Template:Groningen/aes_js?action=raw&ctype=text/javascript">aes.js</a>
 
<p>The AES module is in <a href="/Template:Groningen/aes_js?action=raw&ctype=text/javascript">aes.js</a>
Line 53: Line 52:
 
<h3>DNA</h3>
 
<h3>DNA</h3>
 
 
<p>Before we can translate the encrypted text to DNA, we need to
+
<p>Before the text can be integrated into a bacterium, it needs to
turn them into numbers first. Computers do this via a scheme called
+
be translated to DNA. Computers store letters, digits and other
ASCII [[ASCII]], and more modernly using Unicode [[UNICODE]]. In Unicode every letter,
+
characters as numbers. The numbers are translated to the symbols
digit, punctuation mark and other symbol is represented by a single
+
you see via an encoding table. The most common encodings are ASCII  
number called a code point.</p>
+
and Unicode. The cipher text from the example is encoded as
 +
follows: <code id="utf">040 002 053 122 008 245 248 087 113 102 218
 +
055 173 152 247 011 039 039 144</code>. We use Unicode Transform
 +
Format 8 (UTF-8), which means that every character is encoded by 1 to 4
 +
bytes of 8 bits each. In binary, the bytes of the cipher text look
 +
like this: <code id="bin">00101000 00000010 00110101 01111010
 +
00001000 11110101 11111000 01010111 01110001 01100110 11011010
 +
00110111 10101101 10011000 11110111 00001011 00100111 00100111
 +
10010000</code>. Binary has only two digits, 0 and 1, but DNA has
 +
four 'digits': ACTG. This means those bytes that need eight bits,
 +
need only 4 DNA base-pairs. The translation to DNA is done like this:</p>
 
 
<p>Translation to DNA is done by two modules: <code>DNA</code> and
+
<p>Let's take the letter 'm' as an example.</p>
<code>EightBit</code>. The <code>EightBit</code> module translates
+
letters, digits and punctuation into eight-bit numbers: from 0 to
+
255 and <code>DNA</code> translates those numbers into DNA sequences.</p>
+
 
 
 +
<ol>
 +
<li>The ASCII/Unicode code is <code>109</code>.</li>
 +
<li>In binary it is <code>01101101</code>.</li>
 +
<li>Cut off the two right-most (least significant) bits: <code>01</code>.</li>
 +
<li>Look up the base: 00 = A, 01 = C, 10 = T, 11 = G: <code>C</code>.</li>
 +
<li>Append the base to the output: <code>C</code>.</li>
 +
<li>The input number is now: <code>011011</code>.</li>
 +
<li>Repeat steps 3 - 6 until there are no more bits left.</li>
 +
</ol>
 
 
 +
<p>The result is: <code>CGTC</code>.</p>
 +
 +
<p>When we apply the above algorithm to the characters in the
 +
cipher text, we get: <code id="dna">ATTA TAAA CCGA TTGC ATAA GAAG
 +
CCGT GAAG ATGT GCCC CAGC TCTC GAAG TTCT GCGA TAAG CGTT TAAG ATCT
 +
GAAG GCGT GTAA GCTA GCTA TAAG AACT</code>.</p>
 +
 +
 +
 +
<p><code
 +
id="msg">GACCAAGCCTGCAAAACCAAATTCAAAACCCTCCAGATTATAAACCGATTGCATAAGAAGCCGTGAAGATGTGCCCCAGCTCTCGAAGTTCTGCGATAAGCGTTTAAGATAAAAACTGAAGGCGTGTAAGCTAGCTATAAGAACTGCACCCACCGAC</code>.</p>
 
</section>
 
</section>
 
</article>
 
</article>
Line 84: Line 110:
 
dna = EightBit.encodeStr(enc).match(/[ATCG]{4}/g).join(' '),
 
dna = EightBit.encodeStr(enc).match(/[ATCG]{4}/g).join(' '),
 
msg = Message.packDNA(EightBit.encodeStr(enc));
 
msg = Message.packDNA(EightBit.encodeStr(enc));
 
console.log(txt, enc, utf, bin, dna, msg);
 
 
 
 
$('#enc').html(enc.utf8Decode());
 
$('#enc').html(enc.utf8Decode());
 +
$('#utf').html(utf);
 +
$('#bin').html(bin);
 +
$('#dna').html(dna);
 +
$('#msg').html(msg);
 
});
 
});
 
</script>
 
</script>
 
</html>
 
</html>
 
{{Groningen/footer}}
 
{{Groningen/footer}}

Revision as of 14:05, 8 October 2016


Software

This article explains in detail how the software part of CryptoGErM works. To encrypt a message and translate it to DNA we wrote several JavaScript modules. They power the demonstration of our software on the Coding and Decoding pages. The following sections cover the same steps the software takes to turn a text into DNA: first encryption, then translation to DNA, and finally packing the result into a complete message sequence. The examples use Hello world as the message text and secret as the encryption key.

The software is divided into several modules, each in its own file. Two of the modules were not written by us, but are used under an open source license. The AES module was written by Chris Veness at Movable-Type.co.uk. The CRC implementation was written by GitHub user chitchcock and published as GitHub Gist #5112270. All other code was written by us and is licensed under the MIT license.

Encryption & Decryption

We use the AES module for encryption and decryption, specifically AES counter mode. This mode allows us to encrypt (and decrypt) messages of any length. It does this by splitting the message into blocks and adding a counter to every block. The first block contains an initialization vector, which is randomized every time, and also the number of blocks that follow it. As a result, every block has a different cipher text, even if the original message contains repetitions. In fact, the cipher text for the entire message is different each time, even when the same message is encrypted again.

Encrypting our example message Hello world with the key secret is now as easy as calling Aes.Ctr.encrypt(message, key) with (possibly) this result: (�5z�õøWqfÚ7­˜÷�''. The next step is to turn this cipher text into DNA.

The AES module is in aes.js and the counter mode addition is in aes-ctr.js.
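
The behaviour of counter mode described in this section can be tried out with a minimal sketch. It does not use aes.js or aes-ctr.js; it assumes a Node.js environment and its built-in crypto module instead. The helper names deriveKey, encryptCtr and decryptCtr are hypothetical, and hashing the password with SHA-256 is only an assumption made to obtain a 32-byte key, not how the AES module derives its key.

```javascript
const crypto = require('crypto');

// Hypothetical helper: turn a password into a 32-byte AES key by hashing it.
// (An assumption for this sketch, not the key derivation used by aes.js.)
function deriveKey(password) {
  return crypto.createHash('sha256').update(password, 'utf8').digest();
}

// Encrypt a message of any length in counter mode with a fresh random IV.
function encryptCtr(message, password) {
  const iv = crypto.randomBytes(16); // randomized for every message
  const cipher = crypto.createCipheriv('aes-256-ctr', deriveKey(password), iv);
  const body = Buffer.concat([cipher.update(message, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, body]); // the IV travels in front of the cipher text
}

// Split off the IV again and recover the original message.
function decryptCtr(data, password) {
  const iv = data.subarray(0, 16);
  const decipher = crypto.createDecipheriv('aes-256-ctr', deriveKey(password), iv);
  const plain = Buffer.concat([decipher.update(data.subarray(16)), decipher.final()]);
  return plain.toString('utf8');
}

const first = encryptCtr('Hello world', 'secret');
const second = encryptCtr('Hello world', 'secret');
console.log(first.equals(second));        // false: the random IVs differ
console.log(decryptCtr(first, 'secret')); // Hello world
```

Because the IV is stored with the cipher text, the receiver needs only the key to decrypt, while two encryptions of the same message still look unrelated.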

DNA

Before the text can be integrated into a bacterium, it needs to be translated to DNA. Computers store letters, digits and other characters as numbers, which are translated to the symbols you see via an encoding table. The most common encodings are ASCII and Unicode. The cipher text from the example is encoded as follows: 040 002 053 122 008 245 248 087 113 102 218 055 173 152 247 011 039 039 144. We use Unicode Transformation Format 8 (UTF-8), which means that every character is encoded by 1 to 4 bytes of 8 bits each. In binary, these numbers look like this: 00101000 00000010 00110101 01111010 00001000 11110101 11111000 01010111 01110001 01100110 11011010 00110111 10101101 10011000 11110111 00001011 00100111 00100111 10010000. Binary has only two digits, 0 and 1, but DNA has four 'digits': A, C, T and G. Each base can therefore carry two bits, so a byte of eight bits needs only 4 DNA base pairs. The translation to DNA is done like this:

Let's take the letter 'm' as an example.

  1. The ASCII/Unicode code is 109.
  2. In binary it is 01101101.
  3. Cut off the two right-most (least significant) bits: 01.
  4. Look up the base: 00 = A, 01 = C, 10 = T, 11 = G: C.
  5. Append the base to the output: C.
  6. The input number is now: 011011.
  7. Repeat steps 3 - 6 until there are no more bits left.

The result is: CGTC.
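
The steps above can be sketched in a few lines of JavaScript. This is an illustration only, not the code of our EightBit or DNA modules; the names BASES and encodeByte are made up for this example. It assumes every input is a full byte of eight bits, so the loop always runs four times.

```javascript
// Lookup table from step 4: 00 = A, 01 = C, 10 = T, 11 = G.
const BASES = ['A', 'C', 'T', 'G'];

// Translate one byte (0 to 255) into four DNA bases.
function encodeByte(byte) {
  let dna = '';
  for (let i = 0; i < 4; i++) {
    dna += BASES[byte & 3]; // steps 3 to 5: take the two lowest bits and append the base
    byte >>= 2;             // step 6: drop those two bits from the input
  }
  return dna;
}

console.log(encodeByte('m'.charCodeAt(0))); // CGTC
```

Note that the least significant bits come first in the output, so the bases read in the opposite order from the binary notation.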

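When we apply the above algorithm to every UTF-8 byte of the cipher text, we get the sequence below. Characters with a code above 127 take two bytes in UTF-8, so the 19 characters become 26 groups of four bases: ATTA TAAA CCGA TTGC ATAA GAAG CCGT GAAG ATGT GCCC CAGC TCTC GAAG TTCT GCGA TAAG CGTT TAAG ATCT GAAG GCGT GTAA GCTA GCTA TAAG AACT.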
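
The DNA sequence for the example cipher text can be reproduced with the following sketch, which also shows the way back from DNA to characters. It is an illustration under assumptions, not the code of our modules: encodeBytes and decodeGroups are hypothetical names, and the standard TextEncoder and TextDecoder interfaces are used here for the UTF-8 step.

```javascript
const BASES = 'ACTG'; // index 0 = A (00), 1 = C (01), 2 = T (10), 3 = G (11)

// Translate a list of bytes into space-separated groups of four bases.
function encodeBytes(bytes) {
  return Array.from(bytes, (byte) => {
    let group = '';
    for (let i = 0; i < 4; i++) {
      group += BASES[byte & 3]; // two least significant bits first
      byte >>= 2;
    }
    return group;
  }).join(' ');
}

// Reverse direction: rebuild each byte from its group of four bases.
function decodeGroups(dna) {
  return dna.split(' ').map((group) => {
    let byte = 0;
    for (let i = 3; i >= 0; i--) {
      byte = (byte << 2) | BASES.indexOf(group[i]); // last base holds the highest bits
    }
    return byte;
  });
}

// Character codes of the example cipher text from this section.
const codes = [40, 2, 53, 122, 8, 245, 248, 87, 113, 102,
               218, 55, 173, 152, 247, 11, 39, 39, 144];
const cipherText = String.fromCharCode(...codes);

const utf8 = new TextEncoder().encode(cipherText); // 26 bytes for 19 characters
const dna = encodeBytes(utf8);
console.log(dna);

// Decoding the DNA gives back the same cipher text.
const restored = new TextDecoder().decode(Uint8Array.from(decodeGroups(dna)));
console.log(restored === cipherText); // true
```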

Packed into a complete message sequence, the DNA of the cipher text becomes: GACCAAGCCTGCAAAACCAAATTCAAAACCCTCCAGATTATAAACCGATTGCATAAGAAGCCGTGAAGATGTGCCCCAGCTCTCGAAGTTCTGCGATAAGCGTTTAAGATAAAAACTGAAGGCGTGTAAGCTAGCTATAAGAACTGCACCCACCGAC.
