Team:Stanford-Brown/SB16 Software


Stanford-Brown 2016

Protein Optimization & Gibson Assembly Primer design

Overview

The purpose of this script is to expedite protein DNA sequence optimization and primer design for Gibson assembly reactions. The files for the script can be found either here on GitHub or downloaded individually from the iGEM database below.

Note: because iGEM does not support the .py file extension, you will need to remove ".txt" from the names of all files downloaded from the links below, then append ".py" to (1) seq_analyzer, (2) seq_tools, (3) input_tools, and (4) format_tools (i.e. "seq_tools.txt" becomes "seq_tools.py"). No renaming is needed for files downloaded from GitHub.
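The renaming described above can also be done with a few lines of Python. This is only a sketch: the function name rename_downloads is hypothetical, and it assumes every download is named "<name>.txt" in the same way as the "seq_tools.txt" example. The codon usage tables are not handled because their file names are not given here.

```python
from pathlib import Path

# The four modules named in the note above; they end up as "<name>.py".
PYTHON_MODULES = ["seq_analyzer", "seq_tools", "input_tools", "format_tools"]
# Library files listed under "Running the Program"; they end up with no extension.
DATA_FILES = ["Codon_Lib", "NT_Lib", "restriction_enzymes"]


def rename_downloads(directory="."):
    """Rename '<name>.txt' downloads found in `directory`; return the new names."""
    renamed = []
    for name in PYTHON_MODULES + DATA_FILES:
        # Assumption: each file was downloaded as "<name>.txt".
        source = Path(directory) / (name + ".txt")
        if not source.exists():
            continue
        new_name = name + ".py" if name in PYTHON_MODULES else name
        source.rename(source.with_name(new_name))
        renamed.append(new_name)
    return renamed
```

Calling rename_downloads("path/to/downloads") returns the list of files it renamed, so an empty list means nothing matched the assumed names.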

Input File Formatting

To run the program, the user must provide a text file containing the sequences to be optimized and/or used for primer design. Each sequence should be on its own line; blank lines and spaces will cause a processing error. The sequences should contain no characters other than those representing nucleotides (A,T,G,C) or amino acids (G,A,L,M,F,W,K,Q,E,S,P,V,I,C,Y,H,R,N,D,T,*).
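Because a single stray space or blank line causes a processing error, it can help to check the input file before running the script. The following is a minimal sketch of such a check, not part of the distributed script; check_lines is a hypothetical helper, and it assumes sequences are written in uppercase as in the character lists above.

```python
NUCLEOTIDES = set("ATGC")
AMINO_ACIDS = set("GALMFWKQESPVICYHRNDT*")
ALLOWED = NUCLEOTIDES | AMINO_ACIDS


def check_lines(lines):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        seq = line.rstrip("\r\n")  # drop only the line ending itself
        if not seq:
            problems.append((number, "blank line"))
            continue
        if any(ch.isspace() for ch in seq):
            problems.append((number, "contains whitespace"))
            continue
        # Assumption: lowercase letters are treated as unexpected characters.
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            problems.append((number, "unexpected characters: " + "".join(bad)))
    return problems
```

For a file on disk, the lines can be passed in with something like check_lines(open("my_sequences.txt")), where the file name is a placeholder.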

For protein optimization, the input file may contain amino acid sequences, nucleotide sequences, or both: the program recognizes the sequence type and processes it accordingly. The exception is an amino acid sequence made up only of alanine, threonine, cysteine, and glycine, since the single-letter codes for these amino acids (A, T, C, G) are also the single-letter nucleotide codes. In that case the program cannot tell whether the sequence is an amino acid or nucleotide sequence; this can be corrected by putting a "*" at the end of the amino acid sequence.
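The ambiguity described above can be illustrated with a short sketch. This is not the script's actual detection code, which is not shown here; guess_sequence_type is a hypothetical function that applies the rule as stated in the text.

```python
NUCLEOTIDES = set("ATGC")


def guess_sequence_type(seq):
    """Classify one input line as "protein" or "dna" using the rule described above."""
    # A trailing "*" marks the line as an amino acid sequence.
    if seq.endswith("*"):
        return "protein"
    # Only A, T, G, C present: indistinguishable from DNA, so treat it as DNA.
    if set(seq) <= NUCLEOTIDES:
        return "dna"
    # Any other amino acid letter means the line must be a protein.
    return "protein"
```

Under this rule "GATTACA" is read as DNA, while "GATTACA*" is read as a peptide of glycine, alanine, threonine, and cysteine residues.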
For Gibson primer design, the sequences in the input file should be listed in the following order:
  1. The first sequence contains at least 50nt of the 3' end of the backbone, to which the 5' end of the first fragment will join
  2. N number of sequences to be assembled ...
  3. The last sequence contains the 5' end of the backbone, to which the 3' end of the last fragment will join

This is illustrated in the following figure, where the first sequence in the file is "Backbone front", the last sequence is "Backbone rear", and the middle sequences in the file are the N fragments listed in order of assembly.
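The ordering above (backbone front, N fragments, backbone rear) can be expressed as a small helper that splits the lines of an input file into its three parts. This is a sketch only; split_gibson_input and its min_backbone parameter are hypothetical names, and the 50nt check is applied only to the first sequence because that is the only length stated above.

```python
def split_gibson_input(sequences, min_backbone=50):
    """Split ordered sequences into (backbone_front, fragments, backbone_rear)."""
    if len(sequences) < 3:
        raise ValueError(
            "expected a backbone front, at least one fragment, and a backbone rear"
        )
    # First line is the backbone front, last line is the backbone rear,
    # everything in between is a fragment, in order of assembly.
    front, *fragments, rear = sequences
    if len(front) < min_backbone:
        raise ValueError(
            "backbone front has %d nt; at least %d are expected"
            % (len(front), min_backbone)
        )
    return front, fragments, rear
```

The fragments come back in the same order they appear in the file, which is the order of assembly.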


Sample input and output files are also included in the repository and are listed below:

Input:

Output:

Once the user input files have been formatted properly, the script can be run.

Running the Program

Before the script can be run, make sure you have downloaded the following files to a directory of your choosing:
  • seq_analyzer.py
  • seq_tools.py
  • input_tools.py
  • format_tools.py
  • Codon_Lib
  • NT_Lib
  • restriction_enzymes

These files contain the functions the main script needs to run its algorithms, as well as the libraries it reads from: codon and nucleotide pairing maps and restriction site sequences.
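A quick way to confirm that everything in the list above is in place is to check the directory from Python. This is a minimal sketch; missing_files is a hypothetical helper, and it assumes the library files are stored under exactly the names listed above, with no extension.

```python
from pathlib import Path

# File names exactly as listed above (assumed to be the names on disk).
REQUIRED = [
    "seq_analyzer.py",
    "seq_tools.py",
    "input_tools.py",
    "format_tools.py",
    "Codon_Lib",
    "NT_Lib",
    "restriction_enzymes",
]


def missing_files(directory="."):
    """Return the required files that are not present in `directory`."""
    return [name for name in REQUIRED if not (Path(directory) / name).exists()]
```

An empty list from missing_files("path/to/script_directory") means all seven files were found.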

Additionally, the user should download

which are codon usage frequency tables for their respective organisms. If a different codon table is desired, the user can create a text file with each row (one codon per row) arranged as follows:
3nt_codon, \t, single_letter_aminoacid_abbreviation, \t, frequency, #/1000

Example:

GCT A 0.16 15.34
GCC A 0.27 25.51
GCA A 0.21 20.28
GCG A 0.36 33.66
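A custom table in this layout can be sanity-checked before use. The sketch below is not the script's own parser; parse_codon_table and frequency_totals are hypothetical helpers that read rows like the example above and add up the frequencies per amino acid, which should come to roughly 1.

```python
def parse_codon_table(lines):
    """Map each codon to (amino_acid, frequency, per_thousand)."""
    table = {}
    for line in lines:
        fields = line.split()  # accepts tabs or spaces between columns
        if not fields:
            continue  # tolerate blank lines in this helper
        codon, amino_acid, frequency, per_thousand = fields
        table[codon] = (amino_acid, float(frequency), float(per_thousand))
    return table


def frequency_totals(table):
    """Sum the frequency column per amino acid; each total should be close to 1."""
    totals = {}
    for amino_acid, frequency, _ in table.values():
        totals[amino_acid] = totals.get(amino_acid, 0.0) + frequency
    return totals
```

For the four alanine rows in the example, the frequencies 0.16, 0.27, 0.21, and 0.36 add up to 1.00, so a total far from 1 for any amino acid suggests a missing or mistyped row.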


To run the script, open a terminal or command prompt on your computer. On Windows, press Windows+R, type “cmd” into the Run box, and hit Enter. On Mac, press Command+Space, type “terminal”, and hit Enter. This should open a command line interface in which the program can be run.
Commands are input into the program in the following format: