## Random Mutation

Special attention was given during the modelling to the issue of
random mutations, that is, the random substitution of individual
bases (A, C, T or G) in the genome. Even though the reasons why this
happens are unknown, the impact of these substitutions on the DNA of
*Bacillus subtilis* is very relevant to the CryptoGErM project. This
follows from the way the data saved in the bacteria is encrypted: if
even a single base of the message mutates, it becomes impossible to
decrypt the message and recover the original. The same is true for
the key: if one bit does not match the original, the key will not
work on the encrypted message. To check whether the message stored
in the genome would be affected by a random mutation, we ran
computational simulations that reproduce this phenomenon.

The *Bacillus subtilis* genome is 4,215,619 base pairs long, whereas
the Sherlock Holmes quote that we stored in the bacteria is 556 base
pairs long. The computational model simulates random mutation by
randomly changing 1% of the bases in the genome and then checks
whether any of these mutations falls inside the inserted message. We
ran 1000 simulations, repeating this process each time, and in none
of them did a random mutation affect the message. From this we
concluded that, for our specific case, the risk of random mutations
affecting our project was extremely low and that we could safely
store Arthur Conan Doyle’s quote.
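The procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's `dna_analysis.py`: the function names `mutate`, `message_intact` and `count_affected_runs`, the synthetic genome, and the choice to always replace a base with a different one are all assumptions made here. Results from this sketch depend on those assumptions and need not match the figures reported above.

```python
import random

BASES = "ACGT"


def mutate(genome, fraction, rng):
    """Return a copy of `genome` with `fraction` of its positions substituted.

    Assumption: each chosen position receives a base different from the
    one it had, and positions are chosen without replacement.
    """
    bases = list(genome)
    n_mutations = int(len(bases) * fraction)
    for pos in rng.sample(range(len(bases)), n_mutations):
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)


def message_intact(original, mutated, start, length):
    """True if the message region [start, start + length) is unchanged."""
    return original[start:start + length] == mutated[start:start + length]


def count_affected_runs(genome, start, length, fraction, runs, seed=None):
    """Count how many of `runs` independent simulations alter the message."""
    rng = random.Random(seed)
    affected = 0
    for _ in range(runs):
        mutated = mutate(genome, fraction, rng)
        if not message_intact(genome, mutated, start, length):
            affected += 1
    return affected


if __name__ == "__main__":
    # Small synthetic genome (hypothetical data) so the demo runs quickly.
    demo_rng = random.Random(1)
    genome = "".join(demo_rng.choice(BASES) for _ in range(10000))
    hits = count_affected_runs(genome, start=2000, length=50,
                               fraction=0.01, runs=100, seed=2)
    print("runs in which the message was altered:", hits)
```

To try it on the real sequence, `genome` would be replaced by the contents of the genome file and `start` and `length` by the position and size of the inserted message.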

We also asked ourselves how many of these quotes we could theoretically store safely in the bacteria. To find out, we modified the computational model so that the size of the message is recursively increased at every simulation according to the following equation:

$$L_N = 556 \cdot N!$$

Here 556 is the length of the original message in base pairs, which
is recursively increased according to the value of N. The
simulations show that the first 2 random mutations affecting the
message occur when N reaches a value of 6, which corresponds to a
message 400,320 base pairs long. This result provides interesting
insights for future research that aims to store ever larger amounts
of data in the DNA of bacteria: more specifically, it corresponds to
the maximum number of base pairs that can be modified in *Bacillus
subtilis*.
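The reported length at N = 6 can be checked with a short calculation. The sketch below assumes the growth rule is 556 · N!, which is the rule consistent with the 400,320 base pairs quoted for N = 6. The function name `message_length` is chosen here for illustration and is not taken from the project's script.

```python
from math import factorial

BASE_LENGTH = 556        # base pairs in the original quote (from the text)
GENOME_LENGTH = 4215619  # base pairs in the B. subtilis genome (from the text)


def message_length(n):
    """Message length in base pairs under the assumed rule 556 * n!."""
    return BASE_LENGTH * factorial(n)


# Print the message length and its share of the genome for small N.
for n in range(1, 8):
    length = message_length(n)
    share = length / float(GENOME_LENGTH)
    print(n, length, "{:.2%}".format(share))
```

For N = 6 this gives 556 · 720 = 400,320 base pairs, matching the figure in the text.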

A Python implementation of the model is available at dna_analysis.py.
The script requires Python 2.7 and the `random`, `copy` and `numpy`
libraries. Also make sure to download BacillusSubtilis.txt and place
it in the same directory.