Difference between revisions of "Team:Groningen/Software"

Revision as of 14:05, 8 October 2016

Software

This article explains in detail how the software part of CryptoGErM works. To encrypt the message and translate it to DNA, we wrote several JavaScript scripts. They are used in the demonstrations of our software on the Coding and Decoding pages. The following sections cover the same steps the software takes to turn a text into DNA: first encryption, then translation to DNA, and finally packing the result into a complete message sequence. The examples use Hello world as the message text and secret as the encryption key.

The software is divided into several modules, each in its own file. Two of the modules were not written by us but are used under an open source license. The AES module was written by Chris Veness at Movable-Type.co.uk. The CRC implementation was written by GitHub user chitchcock and published as GitHub Gist #5112270. All other code was written by us and is licensed under the MIT license.

Encryption & Decryption

We use the AES module for encryption and decryption, specifically AES in counter (CTR) mode. This mode allows us to encrypt (and decrypt) messages of any length. The message is split into blocks of 16 bytes, and every block is XORed with an encrypted counter block. The counter block consists of a nonce, or initialization vector, which is randomized for every message, followed by the number of the block. The nonce is stored at the start of the cipher text, so that the receiver can rebuild the same counter blocks. As a result, every block has a different cipher text, even if there are repetitions in the original message. In fact, the cipher text for the entire message is different every time, even if the same message is encrypted again.

Encrypting our example message Hello world with the key secret is now as easy as calling Aes.Ctr.encrypt(message, key) with (possibly) this result: (�5z�õøWqfÚ7÷�''. The next step is to turn this cipher text into DNA.
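Counter mode can be illustrated with the built-in `crypto` module of Node.js instead of the aes.js and aes-ctr.js files we actually use. This is a minimal sketch under stated assumptions: the key is derived by hashing the password with SHA-256 (aes-ctr.js derives its key differently), the counter block is 8 random nonce bytes followed by an 8-byte block counter starting at zero, and the names `ctrEncrypt`, `ctrDecrypt`, `deriveKey` and `counterBlock` are hypothetical.

```javascript
// Sketch of AES counter mode with Node.js's built-in crypto module.
// Assumption: this is NOT the Aes.Ctr module from aes-ctr.js; all names are hypothetical.
const nodeCrypto = require('crypto');

// Assumption: derive a 256-bit key by hashing the password with SHA-256.
function deriveKey(password) {
  return nodeCrypto.createHash('sha256').update(password, 'utf8').digest();
}

// Initial counter block: 8-byte nonce followed by an 8-byte block counter set to 0.
function counterBlock(nonce) {
  return Buffer.concat([nonce, Buffer.alloc(8)]);
}

function ctrEncrypt(message, password) {
  const nonce = nodeCrypto.randomBytes(8); // new random nonce for every message
  const cipher = nodeCrypto.createCipheriv('aes-256-ctr', deriveKey(password), counterBlock(nonce));
  // CTR mode XORs the message with encrypted counter blocks, so no padding is added.
  const body = Buffer.concat([cipher.update(message, 'utf8'), cipher.final()]);
  // Put the nonce in front so the receiver can rebuild the counter blocks.
  return Buffer.concat([nonce, body]);
}

function ctrDecrypt(cipherText, password) {
  const nonce = cipherText.subarray(0, 8);
  const decipher = nodeCrypto.createDecipheriv('aes-256-ctr', deriveKey(password), counterBlock(nonce));
  const plain = Buffer.concat([decipher.update(cipherText.subarray(8)), decipher.final()]);
  return plain.toString('utf8');
}

// Usage with the example message and key.
const first = ctrEncrypt('Hello world', 'secret');
const second = ctrEncrypt('Hello world', 'secret');
console.log(first.length);                // 19: 8 nonce bytes + 11 message bytes
console.log(first.equals(second));        // false unless the random nonces happen to match
console.log(ctrDecrypt(first, 'secret')); // Hello world
```

The 19 bytes produced for Hello world, 8 nonce bytes plus 11 message bytes, are consistent with the 19 byte values of the example cipher text in the DNA section.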

The AES module is in aes.js and the counter mode addition is in aes-ctr.js.

DNA

Before the text can be integrated into a bacterium, it needs to be translated to DNA. Computers store letters, digits and other characters as numbers, and an encoding table translates those numbers to the symbols you see. The most common encodings are ASCII and Unicode. The cipher text from the example consists of these byte values (in decimal): 040 002 053 122 008 245 248 087 113 102 218 055 173 152 247 011 039 039 144. We use Unicode Transformation Format 8 (UTF-8), in which every character is encoded by 1 to 4 bytes of 8 bits each. In binary, the bytes of the cipher text look like this: 00101000 00000010 00110101 01111010 00001000 11110101 11111000 01010111 01110001 01100110 11011010 00110111 10101101 10011000 11110111 00001011 00100111 00100111 10010000. Binary has only two digits, 0 and 1, but DNA has four 'digits': A, C, T and G. Each base therefore holds two bits, so a byte of eight bits needs only four bases. The translation to DNA is done like this:

Let's take the letter 'm' as an example.

1. The ASCII/Unicode code is 109.
2. In binary it is 01101101.
3. Cut off the two right-most (least significant) bits: 01.
4. Look up the base: 00 = A, 01 = C, 10 = T, 11 = G: C.
5. Append the base to the output: C.
6. The input number is now: 011011.
7. Repeat steps 3 to 6 until there are no more bits left.

The result is: CGTC.
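The steps above can be sketched as a short loop in JavaScript. This is an illustration, not the code of our EightBit or DNA modules; `byteToBases` is a hypothetical name.

```javascript
// Two-bit values map to bases as in step 4: 00 = A, 01 = C, 10 = T, 11 = G.
const BASES = ['A', 'C', 'T', 'G'];

// Translate one byte (0-255) into four bases, least significant bits first.
function byteToBases(value) {
  let out = '';
  for (let i = 0; i < 4; i++) {
    out += BASES[value & 0b11]; // steps 3 to 5: look up the two right-most bits
    value >>= 2;                // step 6: cut off those two bits
  }
  return out;
}

console.log(byteToBases('m'.charCodeAt(0))); // CGTC
```

The loop runs exactly four times, so bytes with leading zero bits still give four bases; for example, 2 gives TAAA.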

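When we apply this algorithm to every character of the cipher text, we get: ATTA TAAA CCGA TTGC ATAA GAAG CCGT GAAG ATGT GCCC CAGC TCTC GAAG TTCT GCGA TAAG CGTT TAAG ATCT GAAG GCGT GTAA GCTA GCTA TAAG AACT. The cipher text is UTF-8 encoded before the translation, so each of the seven characters with a code above 127 (245, 248, 218, 173, 152, 247 and 144) becomes two bytes. For example, 245 becomes the bytes 195 and 181, which translate to GAAG CCGT. This is why the result has 26 groups of four bases instead of 19.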
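The translation of the whole cipher text can be sketched by UTF-8 encoding it and translating every byte. This sketch uses the standard `TextEncoder` for the UTF-8 step; `textToDna` and `byteToBases` are hypothetical names, and it is not claimed to match our EightBit module in every case. For the example cipher text it gives the same 26 groups as listed above.

```javascript
// Two-bit values map to bases: 00 = A, 01 = C, 10 = T, 11 = G.
const BASES = ['A', 'C', 'T', 'G'];

// One byte becomes four bases, least significant bits first.
function byteToBases(value) {
  let out = '';
  for (let i = 0; i < 4; i++) {
    out += BASES[(value >> (2 * i)) & 0b11];
  }
  return out;
}

// UTF-8 encode the text, then translate every byte into four bases.
function textToDna(text) {
  const bytes = new TextEncoder().encode(text); // characters above 127 become 2 or more bytes
  return Array.from(bytes, byteToBases).join(' ');
}

// Rebuild the example cipher text from its character codes.
const codes = [40, 2, 53, 122, 8, 245, 248, 87, 113, 102, 218, 55, 173, 152, 247, 11, 39, 39, 144];
const dna = textToDna(String.fromCharCode(...codes));
console.log(dna.split(' ').length); // 26
console.log(dna);                   // ATTA TAAA CCGA TTGC ATAA GAAG CCGT ...
```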

Finally, the software packs this DNA into a complete message sequence: GACCAAGCCTGCAAAACCAAATTCAAAACCCTCCAGATTATAAACCGATTGCATAAGAAGCCGTGAAGATGTGCCCCAGCTCTCGAAGTTCTGCGATAAGCGTTTAAGATAAAAACTGAAGGCGTGTAAGCTAGCTATAAGAACTGCACCCACCGAC.
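Decoding runs the translation in reverse: every group of four bases becomes one byte again, and the bytes are UTF-8 decoded. The sketch below covers only the groups of four bases, not unpacking the complete message sequence; `basesToByte` and `dnaToText` are hypothetical names, and the standard `TextDecoder` is used for the UTF-8 step.

```javascript
// Base to two-bit value: A = 00, C = 01, T = 10, G = 11.
const VALUES = { A: 0, C: 1, T: 2, G: 3 };

// Four bases back to one byte; the first base holds the least significant bits.
function basesToByte(group) {
  let value = 0;
  for (let i = group.length - 1; i >= 0; i--) {
    value = (value << 2) | VALUES[group[i]];
  }
  return value;
}

// Groups of four bases -> bytes -> UTF-8 decoded text.
function dnaToText(dna) {
  const groups = dna.match(/[ACGT]{4}/g) || [];
  const bytes = Uint8Array.from(groups, basesToByte);
  return new TextDecoder('utf-8').decode(bytes);
}

const text = dnaToText('ATTA TAAA CCGA TTGC ATAA GAAG CCGT');
console.log(Array.from(text, ch => ch.charCodeAt(0))); // [ 40, 2, 53, 122, 8, 245 ]
```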

@@ Line 8: / Line 8: @@
 			CryptoGErM works. To encrypt the message and translate it to DNA we
 			wrote several Javascripts. They are used in the demonstration of our
-			software on the	<a href="/Team:Groningen/Coding">Coding</a> page.
+			software on the	<a href="/Team:Groningen/Coding">Coding</a> and
+			<a href="/Team:Groningen/Decoding">Decoding</a> pages.
 			The following sections cover the same steps the software takes to
 			turn a text into DNA. First encryption, then translation to DNA
@@ Line 16: / Line 17: @@
 			<p>The software is divided into several modules, each in their own
-			file. Every module creates a variable that contains the functionality
+			file. Two of the modules were not written by us, but
-			that it provides. Two of the modules were not written by us, but
 			used under an open source license. The AES module was written by
 			Chris Veness at <a href="http://www.movable-type.co.uk/scripts/aes.html">Movable-Type.co.uk</a>.
@@ Line 29: / Line 29: @@
 			<h3>Encryption &amp; Decryption</h3>
-			<p>For the encryption we don't use the <code>AES</code> module
+			<p>We use the AES module for encryption and decryption, specifically:
-			directly, but the <code>AES.Ctr</code> module. <code>AES.Ctr</code>
+			AES counter mode. This mode allows us to encrypt (and decrypt) messages
-			implements the AES counter mode of operation. This allows us to
+			of any length. It does this by splitting the message up into blocks
-			encrypt messages of arbitrary length, rather than a fixed length of
+			and adding a counter to every block. The first block contains an
-bits, which is around 32 letters.</p>
+			initialization vector, which is randomized every time and also the
+			number of blocks that follow it. As a result of this every block
-			<p>AES counter mode also adds randomness to the message. So the
+			has a different cipher text, even if there are repetitions in the
-			encrypted text is always different, even when the same message is
+			original message. In fact, the cipher text for the entire message
-			encrypted multiple times. When the message is decrypted these random
+			is different, even if the same message is encrypted again.</p>
-			bits are removed, so the output message after decryption is still
-			the same as the	original.</p>
 			<p>Encrypting our example message <code>Hello world</code> with the
-			key <code>secret</code> could result in this encrypted
+			key <code>secret</code> is now as easy as calling
-			text: <code id="enc">=�¼iõÚW�èu,�*õC�</code>. The
+			<code>Aes.Ctr.encrypt(message, key)</code> with (possibly) this result:
-			next step is to turn these symbols into DNA.</p>
+			<code id="enc">(�5z�õøWqfÚ7÷�''</code>. The
+			next step is to turn this cipher text into DNA.</p>
 			<p>The AES module is in <a href="/Template:Groningen/aes_js?action=raw&ctype=text/javascript">aes.js</a>
@@ Line 53: / Line 52: @@
 			<h3>DNA</h3>
-			<p>Before we can translate the encrypted text to DNA, we need to
+			<p>Before the text can be integrated into a bacterium, it needs to
-			turn them into numbers first. Computers do this via a scheme called
+			be translated to DNA. Computers store letters, digits and other
-			ASCII [[ASCII]], and more modernly using Unicode [[UNICODE]]. In Unicode every letter,
+			characters as numbers. The numbers are translated to the symbols
-			digit, punctuation mark and other symbol is represented by a single
+			you see via an encoding table. The most common encodings are ASCII
-			number called a code point.</p>
+			and Unicode. The cipher text from the example is encoded as
+			follows: <code id="utf">040 002 053 122 008 245 248 087 113 102 218
+			055 173 152 247 011 039 039 144</code>. We use Unicode Transform
+			Format 8, which means that every character is encoded by 1 up to 6
+			bytes of 8 bits each. In binary, the bytes of the cipher text look
+			like this: <code id="bin">00101000 00000010 00110101 01111010
+			00001000 11110101 11111000 01010111 01110001 01100110 11011010
+			00110111 10101101 10011000 11110111 00001011 00100111 00100111
+			10010000</code>. Binary has only two digits, 0 and 1, but DNA has
+			four 'digits': ACTG. This means those bytes that need eight bits,
+			need only 4 DNA base-pairs. The translation to DNA is done like this:</p>
-			<p>Translation to DNA is done by two modules: <code>DNA</code> and
+			<p>Lets take the letter 'm' as an example.</p>
-			<code>EightBit</code>. The <code>EightBit</code> module translates
-			letters, digits and punctuation into eight-bit numbers: from 0 to
-and <code>DNA</code> translates those numbers into DNA sequences.</p>
+			<ol>
+				<li>The ASCII/Unicode code is <code>109</code>.</li>
+				<li>In binary it is <code>01101101</code>.</li>
+				<li>Cut off the right-most, the least significant, 2 bits: <code>01</code>.</li>
+				<li>Look up the base: 00 = A, 01 = C, 10 = T, 11 = G: <code>C</code>.</li>
+				<li>Append the base to the output: <code>C</code>.</li>
+				<li>The input number is now: <code>011011</code>.</li>
+				<li>Repeat steps 3 - 6 until there are no more bits left.</li>
+			</ol>
+			<p>The result is: <code>CGTC</code>.</p>
+			<p>When we apply the above algorithm to the characters in the
+			cipher text, we get: <code id="dna">ATTA TAAA CCGA TTGC ATAA GAAG
+			CCGT GAAG ATGT GCCC CAGC TCTC GAAG TTCT GCGA TAAG CGTT TAAG ATCT
+			GAAG GCGT GTAA GCTA GCTA TAAG AACT</code>.</p>
+			<p><code
+			id="msg">GACCAAGCCTGCAAAACCAAATTCAAAACCCTCCAGATTATAAACCGATTGCATAAGAAGCCGTGAAGATGTGCCCCAGCTCTCGAAGTTCTGCGATAAGCGTTTAAGATAAAAACTGAAGGCGTGTAAGCTAGCTATAAGAACTGCACCCACCGAC</code>.</p>
 		</section>
 	</article>
@@ Line 84: / Line 110: @@
 				dna = EightBit.encodeStr(enc).match(/[ATCG]{4}/g).join(' '),
 				msg = Message.packDNA(EightBit.encodeStr(enc));
-			console.log(txt, enc, utf, bin, dna, msg);
 			$('#enc').html(enc.utf8Decode());
+			$('#utf').html(utf);
+			$('#bin').html(bin);
+			$('#dna').html(dna);
+			$('#msg').html(msg);
 		});
 	</script>
 </html>
 {{Groningen/footer}}
