Team:Vanderbilt/Software/Algorithm

MutOpt Algorithm

Summary

Our software is the first tool that gives synthetic biologists the ability to reduce the risk of mutation for any given gene sequence. Users input a gene, which the program then scans for sequence motifs that are prone to mutate at elevated rates. These motifs are removed by synonymous substitutions that are designed not to affect gene function or expression. Parameters can be customized to tailor the optimization against different mutagens, to ensure that parts are compatible with assembly standards, or to maximize mutation for directed evolution experiments.


Software Application

Our software is hosted at: cosmicexplorer.github.io/mutation-optimizer


To use our software, users simply need to enter a gene sequence into the left-hand field and press “Go!”. Both nucleotide and amino acid sequences are accepted as inputs. By default, sequences are optimized broadly against all mutation types, eliminating a variety of sites including those for oxidation, irradiation, deamination, alkylation, polymerase errors, and recombination. Any of these optimization parameters can be customized to concentrate on specific mutagens, to ensure compatibility with RFC assembly standards, or to support directed evolution.

In the output window, the changed nucleotides are highlighted to show the changes made to the gene. The display also includes two metrics for comparing how effective the optimization was for a particular gene. We combine the mutation types that the user selects (or keeps as defaults) into a single mutability score that captures how prone the input gene is to mutation. Following optimization, the software calculates the score again and returns the percent mutability change, which indicates how many predicted mutation-prone sites the algorithm removed.
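
The two metrics can be illustrated with a minimal Python sketch. The motif table, its weights, and the function names below are assumptions made for illustration only; they are not the motif set or scoring scheme that MutOpt actually uses. The sketch counts weighted motif occurrences to get a score, then reports the relative change between the original and the optimized sequence.

```python
# Hypothetical motif weights, for illustration only (not MutOpt's real table).
EXAMPLE_WEIGHTS = {
    "TT": 1.0,   # adjacent thymines, which can form UV-induced dimers
    "GGG": 2.0,  # guanine run, used here as a placeholder oxidation-prone site
}


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def mutability_score(seq, weights=EXAMPLE_WEIGHTS):
    """Sum the weighted counts of every motif found in the sequence."""
    seq = seq.upper()
    return sum(w * count_overlapping(seq, m) for m, w in weights.items())


def percent_mutability_change(original, optimized, weights=EXAMPLE_WEIGHTS):
    """Relative change in score; negative values mean fewer predicted sites."""
    before = mutability_score(original, weights)
    after = mutability_score(optimized, weights)
    if before == 0:
        return 0.0  # nothing to remove, so report no change
    return 100.0 * (after - before) / before


original = "ATGTTTGGGTAA"   # Met-Phe-Gly-stop
optimized = "ATGTTCGGCTAA"  # same protein, synonymous codons swapped in
print(mutability_score(original))                      # 4.0
print(mutability_score(optimized))                     # 1.0
print(percent_mutability_change(original, optimized))  # -75.0
```

In this toy case the synonymous swaps TTT to TTC and GGG to GGC remove three of the four weighted units, so the reported change is -75%.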


Program Parameters

We analyzed the literature for reports of nucleotide sequences that appeared to mutate at rates higher than would be expected by chance (Rogozin and Pavlov 2003). In particular, we concentrated on short (< 5 bp) motifs that had been validated in multiple studies and that could reasonably be targeted by sense substitutions. In all, we selected sequences associated with ultraviolet dimers, insertion sequences, deamination, oxidation, RNA hairpins, microhomologies, polymerase errors, alkylation, and sites not specific to a particular mutagen.

The program analyzes each codon of the gene, determines which synonymous substitutions can be made, and counts the mutagenic sites in each possible combination of codons. For overall optimization, sites are assigned scores based on their relative frequency and severity. The scores are also adjusted to account for codon usage biases in the species selected. After experimentally testing multiple iterations, we developed a custom set of heuristics that achieved the best expression levels for optimized sequences.
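
The codon-by-codon search described above can be sketched as a brute-force enumeration in Python. This is only an illustration under stated assumptions: the codon table is a partial excerpt of the standard genetic code covering the example, the motif list is hypothetical, and exhaustive enumeration with an arbitrary tie-break stands in for the custom heuristics, site weights, and codon usage adjustment, which are not reproduced here.

```python
from itertools import product

# Partial standard genetic code: only the amino acids used in the example.
SYNONYMS = {
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "K": ["AAA", "AAG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}

# Hypothetical mutation-prone motifs, for illustration only.
MOTIFS = ["TT", "GGG", "AAAA"]


def count_sites(seq, motifs=MOTIFS):
    """Count all (overlapping) motif occurrences in the sequence."""
    total = 0
    for motif in motifs:
        last_start = len(seq) - len(motif)
        total += sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))
    return total


def translate(seq):
    """Translate an in-frame sequence using the partial table above."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def best_synonymous(seq):
    """Try every synonymous codon combination and keep the one with fewest sites.

    Ties are broken alphabetically, which is an arbitrary choice for this sketch.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    options = [SYNONYMS[CODON_TO_AA[codon]] for codon in codons]
    candidates = ("".join(combo) for combo in product(*options))
    return min(candidates, key=lambda s: (count_sites(s), s))


gene = "ATGTTTGGGAAA"  # Met-Phe-Gly-Lys
best = best_synonymous(gene)
print(count_sites(gene), gene)  # 3 ATGTTTGGGAAA
print(count_sites(best), best)  # 1 ATGTTCGGAAAG
```

Scoring whole combinations rather than single codons matters because motifs can span codon boundaries: in this example GGA followed by AAA would create an AAAA run, so the search pairs GGA with AAG instead. Exhaustive enumeration grows exponentially with gene length, so it is only practical for short examples like this one.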


Reference

Rogozin, I.B., and Pavlov, Y.I. (2003). Theoretical analysis of mutation hotspots and their DNA sequence context specificity. Mutat. Res. 544, 65–85.