Difference between revisions of "Team:Exeter/Collaborations"

Line 685: Line 685:
 
 
 
<!--Give span class "oneline" or "twoline" depending on how long the section text is-->
 
<!--Give span class "oneline" or "twoline" depending on how long the section text is-->
<a href="#section_1" class="banner_link col-xs-6 col-sm-3"><span class="oneline">Measurement</span></a>
+
<a href="#section_1" class="banner_link col-xs-6 col-sm-3"><span class="oneline">Newcastle</span></a>
<a href="#section_2" class="banner_link col-xs-6 col-sm-3"><span class="oneline">Software</span></a>
+
<a href="#section_2" class="banner_link col-xs-6 col-sm-3"><span class="oneline">Purdue</span></a>
<a href="#section_3" class="banner_link col-xs-6 col-sm-3"><span class="oneline">Parts</span></a>
+
<a href="#section_3" class="banner_link col-xs-6 col-sm-3"><span class="oneline">Glasgow</span></a>
<a href="#section_4" class="banner_link col-xs-6 col-sm-3"><span class="twoline">Skype and<br /> Meet-ups</span></a>
+
<a href="#section_4" class="banner_link col-xs-6 col-sm-3"><span class="oneline">Edinburgh</span></a>
 
</div>
 
</div>
 
<!--Left picture (the teal line on left)-->
 
<!--Left picture (the teal line on left)-->
Line 735: Line 735:
 
<div id="section_2" class="link_fix"></div>
 
<div id="section_2" class="link_fix"></div>
 
<div id="contentTitle">
 
<div id="contentTitle">
Software </div>
+
Software: Newcastle </div>
 
<div>
 
<div>
<h3>Purdue Collaboration</h3>
 
 
 
<p id="pp">Our team helped Purdue with this by logging data for the 260  
 
<p id="pp">Our team helped Purdue with this by logging data for the 260  
 
iGEM teams of 2015 and critiquing ease of use and effectiveness of the database. For each team  
 
iGEM teams of 2015 and critiquing ease of use and effectiveness of the database. For each team  
Line 748: Line 746:
 
easy this database was to use to help them improve on what they had done so far.</p>  
 
easy this database was to use to help them improve on what they had done so far.</p>  
 
<br />
 
<br />
<h3>Edinburgh Collaboration</h3>
+
<h6>Optimising methods of data mutation detection in BabbleBlocks</h6>
+
<p id="pp">
+
Storing information on DNA offers many advantages over current methods; however, mutations
+
need to be carefully monitored to ensure that corrupted data is not read as valid (a false positive).
+
Currently, for information stored on a BabbleBrick, a ‘CheckSum’ is calculated by taking the
+
sum of the values of each base of DNA. If the checksum of a BabbleBlock has changed between
+
the time of writing and reading, the data is considered to be corrupt.
+
</p>
+
<p id="pp">
+
<span class="equation">$C = \sum^{bp}_{n=1} bp_n$</span><br />
+
<span class="equation_key">
+
$C$: Checksum value<br />
+
$n$: The integer address of base pair<br />
+
$bp$: Number of base pairs (5 times the number of BabbleBricks)<br />
+
$bp_n$: The value of the $n^{th}$ base pair
+
</span>
+
</p>
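<p id="pp">
As a minimal sketch of the check described above (not the Edinburgh team's implementation), the Python below assumes each base is represented by a digit value from 0 to 3. The function names <code>checksum</code> and <code>is_corrupt</code> are hypothetical and chosen only for illustration.
</p>

```python
def checksum(digits):
    """Sum the base-4 digit values (each assumed to be 0 to 3) of a BabbleBlock."""
    if any(d not in (0, 1, 2, 3) for d in digits):
        raise ValueError("each digit must be 0, 1, 2 or 3")
    return sum(digits)


def is_corrupt(digits, stored_checksum):
    """Flag the data as corrupt when the recomputed checksum differs."""
    return checksum(digits) != stored_checksum


written = [2, 2, 2, 2, 2]          # one BabbleBrick as written
stored = checksum(written)         # checksum recorded at write time

single_mutation = [2, 2, 3, 2, 2]  # one digit changed: detected
double_mutation = [2, 1, 3, 2, 2]  # two compensating changes: missed

print(is_corrupt(single_mutation, stored))  # True
print(is_corrupt(double_mutation, stored))  # False
```

<p id="pp">
The second case is a false positive in the sense used here: the data has changed, but the checksum has not.
</p>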
+
<div class="col-xs-12" style="width:100%;position:relative;margin:auto;padding:0;">
+
<div class="graph_box col-xs-12">
+
<img src="https://static.igem.org/mediawiki/2016/4/48/T--Exeter--Collaboration_Edinb_1.png">
+
<span>Fig. 1. The frequency of all checksums in a BabbleBlock system containing two BabbleBricks.</span>
+
</div>
+
<div class="graph_box col-xs-12">
+
<img src="https://static.igem.org/mediawiki/2016/0/0b/T--Exeter--Collaboration_Edinb_2.png">
+
<span>Fig. 2. The frequency of all checksums in a BabbleBlock system containing three BabbleBricks.</span>
+
</div>
+
</div>
+
<p id="pp">
+
Currently a checksum utilizes only a small percentage of the values that can be stored.
+
A BabbleBrick contains 5 base-4 digits, meaning that 4$^{\text{5}B}$ unique bits of
+
information share one of only 15$B$ + 1 checksums (0 to 15$B$), where $B$ is the number of BabbleBricks in one
+
BabbleBlock. This data has been plotted for BabbleBlocks containing 2 and 3 BabbleBricks
+
in Fig.1 and Fig.2 respectively. Assuming that between the time of writing and reading
+
any number of mutations can occur, the maximum probability of a mutation event resulting
+
in the same checksum can be calculated by comparing the frequency of the most common checksum to the
+
total frequency of unique bits of information.
+
</p>
+
<p id="pp">
+
<span class="equation">$P_C = \big(\frac{C_{max}}{F}) \approx \big(\frac{1.2 \times 10^5}{4^{10}}) = 11$% in a 2 BabbleBrick system</span><br />
+
<span class="equation">$\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\approx \big(\frac{10^8}{4^{15}}) = 9$% in a 3 BabbleBrick system</span>
+
<span class="equation_key">
+
$P_C$: Maximum probability of the same checksum occurring after any number of mutations<br />
+
$C_{max}$: Frequency of most common checksum<br />
+
$F$: Frequency of possible unique bits of information
+
</span>
+
</p>
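<p id="pp">
The frequencies behind Fig.1 and Fig.2 can be reproduced without enumerating every sequence. The sketch below is an illustration under the assumption that each digit takes a value from 0 to 3 and that all sequences are equally likely; the function name <code>checksum_counts</code> is hypothetical. It builds the distribution one digit at a time and reports the share of sequences that fall on the most common checksum.
</p>

```python
from collections import Counter


def checksum_counts(num_bricks, digits_per_brick=5, base=4):
    """Count how many digit sequences produce each checksum value."""
    counts = Counter({0: 1})  # the empty sequence has checksum 0
    for _ in range(num_bricks * digits_per_brick):
        extended = Counter()
        for total, ways in counts.items():
            for digit in range(base):
                extended[total + digit] += ways
        counts = extended
    return counts


for bricks in (2, 3):
    counts = checksum_counts(bricks)
    total = sum(counts.values())   # equals 4 ** (5 * bricks)
    peak = max(counts.values())    # C_max, frequency of the most common checksum
    print(bricks, len(counts), peak, f"{peak / total:.1%}")
```

<p id="pp">
Each printed row gives the number of BabbleBricks, the number of distinct checksums, $C_{max}$ and the resulting $P_C$, which can be compared with the approximate values quoted above.
</p>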
+
<p id="pp">
+
Therefore, it can be predicted that for an average sentence containing 9 words the maximum
+
probability of the same checksum occurring will be of the order of 1%. The probability
+
should decrease marginally when adding BabbleBricks due to the slightly increased range of
+
checksums that become available. This value can be improved by altering the method of the
+
checksum to utilize a greater range of values and to spread out the frequency more evenly, so as
+
to reduce the maximum probability of the same checksum occurring.
+
</p>
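<p id="pp">
A rough cross-check of this trend, offered as a sketch rather than as part of the original analysis: if each digit is assumed to be uniformly distributed over 0 to 3, a single digit has variance 1.25, so the checksum of $5B$ digits has variance $\sigma^2 = 6.25B$. A normal approximation then estimates the peak probability as
</p>
<p id="pp">
<span class="equation">$P_C \approx \frac{1}{\sqrt{2 \pi \sigma^2}} = \frac{1}{\sqrt{12.5 \pi B}}$</span>
</p>
<p id="pp">
This gives approximately 11% for $B = 2$, 9% for $B = 3$ and 5% for $B = 9$. The estimate falls only as $1/\sqrt{B}$, which is consistent with the marginal decrease described above.
</p>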
+
<p id="pp">
+
Currently one BabbleBlock has 4 BabbleBricks dedicated to storing the checksum, giving a maximum of
+
10$^4$ possible values. The first step in determining a ‘CheckMethod’ is to ensure that all check values
+
for a suitable number of BabbleBricks can be stored without exceeding 10$^4$. It is also important
+
not to use operators that can produce negative numbers or decimals, since the
+
possible checksum values are limited to non-negative integers below 10$^4$; this rules out operators such
+
as subtraction and division. For this example, a suitable number of words in a sentence, and therefore of
+
BabbleBricks in a BabbleBlock, is taken to be 20. All simulations are carried out on 2 and 3 BabbleBrick
+
systems due to computing limitations.
+
</p>
+
<p id="pp">
+
Checksums are non-directional: for example, a BabbleBrick of bases [2,2,2,2,2] would have the
+
same checksum as [2,1,3,2,2]. To address this, a checkmethod incorporates the position
+
of the base into the calculation. Each digit is multiplied by its position
+
in the BabbleBlock, where the first BabbleBrick has digit positions 1 to 5 and the last
+
BabbleBrick (20$^{th}$) has positions 96 to 100. A scalar $\alpha$ has been included to
+
increase the range of results. To ensure that a digit of 0 still contributes to the result,
+
each base has 1 added to its value before multiplication. The first checkmethod of one
+
BabbleBlock can be defined as:
+
</p>
+
<p id="pp">
+
<span class="equation">$M_1 = \sum_{n=1}^{bp}(bp_n + 1) \cdot \alpha \cdot n$</span>
+
<span class="equation_key">
+
$M_1$: Value of checkmethod 1<br />
+
$\alpha$: Scalar ($\alpha = 5$ in this example)
+
</span>
+
</p>
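<p id="pp">
The effect of weighting by position can be illustrated with the two example BabbleBricks from the text. This is a minimal sketch that assumes each digit is multiplied by its 1-based position $n$, as the description states; the function name <code>checkmethod_1</code> is hypothetical.
</p>

```python
ALPHA = 5  # scalar used in the text's example


def checkmethod_1(digits, alpha=ALPHA):
    """Position-weighted check value: sum of (digit + 1) * alpha * position."""
    return sum((d + 1) * alpha * n for n, d in enumerate(digits, start=1))


original = [2, 2, 2, 2, 2]
mutated = [2, 1, 3, 2, 2]   # same plain checksum as `original`

print(sum(original), sum(mutated))                       # 10 10
print(checkmethod_1(original), checkmethod_1(mutated))   # 225 230
```

<p id="pp">
The plain checksum cannot tell the two sequences apart, whereas the position-weighted value can. Note that in this sketch every term carries the factor $\alpha$, so every value produced is a multiple of $\alpha$; with $\alpha = 5$ at most one integer in five is ever used.
</p>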
+
<div class="col-xs-12" style="width:100%;position:relative;margin:auto;padding:0;">
+
<div class="graph_box col-xs-12">
+
<img src="https://static.igem.org/mediawiki/2016/4/4c/T--Exeter--Collaboration_Edinb_3.png">
+
<span>Fig. 3. The frequency of checkmethod 1 for all possible bits of information in a BabbleBlock system containing two BabbleBricks.</span>
+
</div>
+
<div class="graph_box col-xs-12">
+
<img src="https://static.igem.org/mediawiki/2016/d/d7/T--Exeter--Collaboration_Edinb_4.png">
+
<span>Fig. 4. The frequency of checkmethod 1 for all possible bits of information in a BabbleBlock system containing three BabbleBricks.</span>
+
</div>
+
</div>
+
<p id="pp">
+
This method results in Fig.3 and Fig.4 for a 2 and 3 BabbleBrick system respectively,
+
which show a large improvement over the original checksum method. The maximum frequency
+
of a single check value has been significantly decreased, which will lower the probability of
+
a false positive occurring; this is largely due to the larger range of results available to
+
the method. However, there is still room for improvement, as the shaded area of the graph
+
indicates that on a smaller scale the frequency of checkmethod 1 alternates between high and low
+
values. Eliminating this fluctuation would allow the data to be spread out more evenly.
+
To improve this method, a second layer of multiplication is implemented: each digit is
+
now also multiplied by a constant that depends on its relative position in the BabbleBrick.
+
</p>
+
<p id="pp">
+
<span class="equation">$M_2 = \sum_{p=1}^B \sum_{q=1}^{5}(bp_{(5(p-1) + q)} + 1) \cdot q \cdot (5(p-1) + q)$</span><br />
+
<span class="equation" style="font-size:60%;">Or using the remainder modulo '%'</span><br />
+
<span class="equation">$M_2 = \sum_{n=1}^{bp} (bp_n + 1) \cdot ((n \text{ % } 5) + 1) \cdot n$</span>
+
<span class="equation_key">
+
$M_2$: Value of checkmethod 2<br />
+
$B$: Number of BabbleBricks in the BabbleBlock<br />
+
$p$: Local integer address of BabbleBrick<br />
+
$q$: Local integer address of base pair in BabbleBrick<br />
+
$5(p-1) + q$: The position $n$ of that base pair in the BabbleBlock
+
</span>
+
</p>
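<p id="pp">
The sketch below illustrates checkmethod 2 and is not the authors' code. It assumes the position multiplier is the 1-based address $n$, and all function names are hypothetical. It also shows that the two ways of writing the local multiplier are not identical: $q$ runs 1 to 5 within a BabbleBrick, whereas $(n \text{ mod } 5) + 1$ runs 2, 3, 4, 5, 1. The modulo form is used by default here because it reproduces the maximum of 60600 quoted in the text for 20 BabbleBricks.
</p>

```python
from collections import Counter


def weight_local(n):
    """Double-sum form: q, the 1-based position within the BabbleBrick."""
    return (n - 1) % 5 + 1


def weight_modulo(n):
    """Modulo form as written in the text: (n % 5) + 1."""
    return n % 5 + 1


def checkmethod_2(digits, weight=weight_modulo):
    """Sum of (digit + 1) * weight(n) * n over 1-based positions n."""
    return sum((d + 1) * weight(n) * n for n, d in enumerate(digits, start=1))


def value_counts(num_digits, weight=weight_modulo, base=4):
    """Count the sequences that map to each checkmethod 2 value."""
    counts = Counter({0: 1})
    for n in range(1, num_digits + 1):
        step = weight(n) * n
        extended = Counter()
        for total, ways in counts.items():
            for digit in range(base):
                extended[total + (digit + 1) * step] += ways
        counts = extended
    return counts


print([weight_local(n) for n in range(1, 6)])   # [1, 2, 3, 4, 5]
print([weight_modulo(n) for n in range(1, 6)])  # [2, 3, 4, 5, 1]

counts = value_counts(10)        # two BabbleBricks
peak = max(counts.values())      # frequency of the most common value
print(peak, f"{peak / 4 ** 10:.2%}")
```

<p id="pp">
Passing <code>weight=weight_local</code> to either function evaluates the double-sum form instead, so the two variants can be compared directly.
</p>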
+
<div class="col-xs-12" style="width:100%;position:relative;margin:auto;padding:0;">
+
<div class="graph_box col-xs-12">
+
<img src="https://static.igem.org/mediawiki/2016/6/6f/T--Exeter--Collaboration_Edinb_5.png">
+
<span>Fig. 5. The frequency of checkmethod 2 for all possible bits of information in a BabbleBlock system containing two BabbleBricks.</span>
+
</div>
+
<div class="graph_box col-xs-12">
+
<img src="https://static.igem.org/mediawiki/2016/0/06/T--Exeter--Collaboration_Edinb_6.png">
+
<span>Fig. 6. The frequency of checkmethod 2 for all possible bits of information in a BabbleBlock system containing three BabbleBricks.</span>
+
</div>
+
</div>
+
<p id="pp">
+
<span class="equation">$P_{M_2} = \big(\frac{M_{2\:max}}{F}) \approx \big(\frac{6 \times 10^3}{4^{10}}) = 0.6$% in a 2 BabbleBrick system ($11$% for checksum)</span><br />
+
<span class="equation">$\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\approx \big(\frac{3 \times 10^6}{4^{15}}) = 0.3$% in a 3 BabbleBrick system ($9$% for checksum)</span>
+
<span class="equation_key">
+
$P_{M_2}$: Maximum probability of the same checkmethod 2 value occurring after any number of mutations<br />
+
$M_{2\:max}$: Frequency of most common checkmethod 2 value<br />
+
$F$: Frequency of possible unique bits of information
+
</span>
+
</p>
+
<p id="pp">
+
This has been plotted for a 2 and 3 BabbleBrick system in Fig.5 and Fig.6 respectively.
+
When comparing the checksum to checkmethod 2, the frequency peak is approximately 20 to 30
+
times smaller in both cases whilst utilizing more values. In Fig.5 and Fig.6 the largest
+
improvement of the second iteration of the checkmethod is the utilization of every
+
integer value, whereas checkmethod 1 appears shaded because its frequency fluctuates rapidly. The last
+
step is to test checkmethod 2 when used in a BabbleBlock containing 20 BabbleBricks; the
+
largest possible value, for a BabbleBlock containing the value ‘3’ in each digit,
+
is 60600, which exceeds the current limit of 10$^4$ values. Therefore,
+
it is recommended that one more BabbleBrick is added to the end of the BabbleBlock in order
+
to store 10$^5$ values. 
+
</p>
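<p id="pp">
The storage requirement can be checked with a short calculation. The sketch below assumes the modulo form of checkmethod 2 with the 1-based position $n$ as multiplier, and it further assumes that each checksum BabbleBrick stores one decimal digit, which is inferred from 4 BabbleBricks giving 10$^4$ values. The function names are hypothetical.
</p>

```python
def checkmethod_2(digits):
    """Sum of (digit + 1) * ((n % 5) + 1) * n over 1-based positions n."""
    return sum((d + 1) * (n % 5 + 1) * n for n, d in enumerate(digits, start=1))


def decimal_digits_needed(max_value):
    """Number of decimal digits required to store values 0..max_value."""
    return len(str(max_value))


worst_case = [3] * (5 * 20)            # 20 BabbleBricks, every digit is 3
largest = checkmethod_2(worst_case)

print(largest)                          # 60600
print(largest < 10 ** 4)                # False
print(decimal_digits_needed(largest))   # 5
```

<p id="pp">
Under these assumptions five decimal digits are needed, which matches the recommendation to add one more BabbleBrick to the four currently used.
</p>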
+
<p id="pp">
+
To improve this method further, more complex multiplications could be added; this would be
+
a decision based on balancing the efficiency of calculations against minimising false positives.
+
In a 2 and 3 BabbleBrick system the probability of a false positive occurring was reduced by
+
approximately 20 and 30 times respectively. Although larger systems are too large to compute,
+
this new method has the potential to lower the maximum false positive error of the previously
+
used checksum by one or more orders of magnitude.
+
If continued further, research should also be done into the reconstruction of data after it has been lost.
+
</p>
+
 
<div>
 
<div>
 
<a id="Section_link" href="#section_3" style="display:block;margin:20px auto 0 auto;width:14px;"><span style="color:#47BCC2;font-size: 25px;" class="glyphicon glyphicon-menu-down" aria-hidden="true"></span></a>
 
<a id="Section_link" href="#section_3" style="display:block;margin:20px auto 0 auto;width:14px;"><span style="color:#47BCC2;font-size: 25px;" class="glyphicon glyphicon-menu-down" aria-hidden="true"></span></a>
Line 1,091: Line 934:
 
<div id="section_4" class="link_fix"></div>
 
<div id="section_4" class="link_fix"></div>
 
<div id="contentTitle">
 
<div id="contentTitle">
Skype and Meet-ups
+
Theory: Edinburgh
 
</div>
 
</div>
 
</div>
 
</div>
 
</div>
 
</div>

Revision as of 20:35, 17 October 2016