Special attention was given during modelling to the issue of random mutations. This phenomenon consists of the random substitution of some of the nucleotides (A, C, G, T) that make up the genome. Although it is not possible to know why or where a given mutation will occur, the effect that such substitutions have on the DNA of Bacillus subtilis is highly relevant to the CryptoGErM project. The reason is computational: the data saved into the bacteria is encrypted, and if even a single nucleotide of the stored data mutates, it becomes impossible to decrypt it and recover the original message. The same holds for the key: if a single bit differs from the original, the key will no longer work on the encrypted message. To make sure that the message stored in the genome would not be affected by a random mutation, we ran computational simulations that reproduce this phenomenon.
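As a minimal illustration of this sensitivity, the Python 3 sketch below assumes a hypothetical two-bits-per-base encoding (A = 00, C = 01, G = 10, T = 11) and uses a repeating-key XOR as a toy stand-in for the real cipher; neither is taken from the CryptoGErM design, and all function names are illustrative. With an intact sequence and key the message is recovered, while a single substituted base or a single flipped key bit yields a different plaintext.

```python
# Hypothetical 2-bit encoding; the project's actual encoding is not specified here.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(seq):
    """Inverse of bytes_to_dna; the sequence length must be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)


def xor_cipher(data, key):
    """Toy stand-in for a real cipher: XOR with a repeating key (self-inverse)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def mutate(seq, position, new_base):
    """Return a copy of seq with the base at `position` replaced by new_base."""
    return seq[:position] + new_base + seq[position + 1:]


if __name__ == "__main__":
    message = b"Elementary"
    key = b"\x5a\x13\xc7"
    stored = bytes_to_dna(xor_cipher(message, key))

    # Intact DNA and intact key: the message is recovered.
    print(xor_cipher(dna_to_bytes(stored), key) == message)  # True

    # One substituted base: the decrypted text no longer matches.
    damaged = mutate(stored, 5, "A" if stored[5] != "A" else "C")
    print(xor_cipher(dna_to_bytes(damaged), key) == message)  # False

    # One flipped key bit: decryption also fails.
    bad_key = bytes([key[0] ^ 0x01]) + key[1:]
    print(xor_cipher(dna_to_bytes(stored), bad_key) == message)  # False
```

With XOR only one byte of plaintext is corrupted; a real block cipher would typically spread the damage across a whole block or more.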
The genome of Bacillus subtilis is 4,215,619 base pairs long, whereas the Sherlock Holmes quote that we stored in the bacteria is 556 base pairs long. The computational model simulates random mutation by changing 1% of the nucleotides at random and checks whether any of these mutations falls within the inserted message. We ran this procedure in 1000 independent simulations, and in none of them did the random mutations affect the message. From this we concluded that, in our specific case, the risk of random mutations corrupting the message was extremely low and that Arthur Conan Doyle's quote could be stored safely.
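The simulation described above can be sketched as follows (Python 3, standard library only). This is not the project's dna_analysis.py: it assumes mutations are substitutions at distinct, uniformly random positions, and that the message occupies one contiguous region starting at a hypothetical `message_start`. Only positions are tracked, since whether a mutation affects the message depends on where it falls, not on which base it becomes. An exact probability for the same model is included for comparison.

```python
import random


def count_message_hits(genome_length, message_start, message_length,
                       mutation_fraction, rng):
    """Mutate a fraction of genome positions; return how many land in the message."""
    n_mutations = round(genome_length * mutation_fraction)
    # Distinct positions, drawn uniformly without replacement.
    positions = rng.sample(range(genome_length), n_mutations)
    message_end = message_start + message_length
    return sum(message_start <= p < message_end for p in positions)


def simulate(genome_length, message_start, message_length,
             mutation_fraction, trials, seed=0):
    """Run independent trials; return how many of them hit the message at least once."""
    rng = random.Random(seed)
    affected = 0
    for _ in range(trials):
        if count_message_hits(genome_length, message_start, message_length,
                              mutation_fraction, rng) > 0:
            affected += 1
    return affected


def prob_message_intact(genome_length, message_length, n_mutations):
    """Exact probability that n_mutations distinct random positions all miss
    a message of message_length bases: C(n - m, k) / C(n, k), computed as a
    product to avoid huge binomial coefficients."""
    p = 1.0
    for i in range(message_length):
        p *= (genome_length - n_mutations - i) / (genome_length - i)
        if p <= 0.0:
            return 0.0  # more mutations than positions outside the message
    return p
```

For example, `prob_message_intact(4215619, 556, 42156)` gives the exact chance that a 556 bp message escapes 42,156 random substitutions (1% of the genome), which can be compared with the fraction of unaffected trials returned by `simulate`.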
We also asked ourselves how many such quotes we could theoretically store safely in the bacteria. To answer this, we modified the computational model so that the size of the message increased recursively at every simulation according to the following equation:

$$L(N) = 556 \cdot N! \qquad \text{(equivalently } L(N) = N \cdot L(N-1),\ L(1) = 556\text{)}$$
Here 556 is the length of the original message, which grows recursively with N. The simulations show that the first two random mutations affecting the message occur when N reaches 6, which corresponds to a message 400,320 base pairs long. This result offers useful insight for future research that aims to store ever larger amounts of data in bacterial DNA: under the assumptions of our model, it estimates the maximum number of base pairs that can be modified in Bacillus subtilis before random mutations start to corrupt the stored data.
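The sweep over N can be sketched as below (Python 3, standard library only). It assumes the growth rule L(N) = 556 · N!, which reproduces the 400,320 bp figure at N = 6; it runs one simulation per N, places the message at a random offset, and stops at the first N whose message receives at least two mutations. This is a hypothetical reconstruction, not the code in dna_analysis.py, and the function names are illustrative.

```python
import math
import random

BASE_MESSAGE_LENGTH = 556  # length of the stored quote, in base pairs


def message_length(n):
    """Scaled message length L(N) = 556 * N!, i.e. L(N) = N * L(N - 1)."""
    return BASE_MESSAGE_LENGTH * math.factorial(n)


def first_n_with_hits(genome_length, mutation_fraction, min_hits=2,
                      max_n=10, seed=0):
    """Return the smallest N whose scaled message receives at least min_hits
    mutations in a single simulation, or None if no N up to max_n does."""
    rng = random.Random(seed)
    n_mutations = round(genome_length * mutation_fraction)
    for n in range(1, max_n + 1):
        length = message_length(n)
        if length > genome_length:
            break  # the message no longer fits in the genome
        # Place the message at a random offset and count mutations inside it.
        start = rng.randrange(genome_length - length + 1)
        positions = rng.sample(range(genome_length), n_mutations)
        hits = sum(start <= p < start + length for p in positions)
        if hits >= min_hits:
            return n
    return None
```

Calling `first_n_with_hits(4215619, 0.01)` runs the sweep with the genome length and mutation fraction quoted in the text. Because each N uses a single random draw, the returned N can vary with `seed`; averaging several seeds per N would give a more stable estimate.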
A Python implementation of the model is available in dna_analysis.py. The script requires Python 2.7 and the NumPy library, and the file BacillusSubtilis.txt must be downloaded and placed in the same directory.
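The layout of BacillusSubtilis.txt is not described here. Assuming it contains either raw sequence text or FASTA (header lines beginning with `>`), a loader along these lines would read the genome so its length can be checked against the 4,215,619 bp quoted above. The function `load_genome` is hypothetical and is not taken from dna_analysis.py.

```python
def load_genome(path):
    """Read a genome sequence, skipping FASTA headers and non-ACGT characters.

    Assumes plain sequence text or FASTA; the actual layout of
    BacillusSubtilis.txt may differ. Ambiguity codes such as N are dropped.
    """
    bases = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # FASTA header line
            bases.extend(c for c in line.upper() if c in "ACGT")
    return "".join(bases)
```

For example, `len(load_genome("BacillusSubtilis.txt"))` can be compared with the genome length used in the simulations; a mismatch would suggest the file uses a different format or a different assembly.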