Difference between revisions of "Team:Stanford-Brown/SB16 Software"

Line 228: Line 228:
 
</ul>
 
</ul>
 
<br>which are codon frequency use tables for their respective organisms.  If a different codon table is desired, the user can create a textfile with each row (codon) arranged as such:<br>
 
<br>which are codon frequency use tables for their respective organisms.  If a different codon table is desired, the user can create a textfile with each row (codon) arranged as such:<br>
<<p style="margin-left: 40px">3nt_codon, \t, single_letter_aminoacid_abbreviation, \t, frequency, #/1000</p><br>
+
<p style="margin-left: 50px">3nt_codon, \t, single_letter_aminoacid_abbreviation, \t, frequency, #/1000</p><br>
 
<br>Example:<br>
 
<br>Example:<br>
<p style="margin-left: 40px">
+
<p style="margin-left: 50px">
 
GCT A  0.16    15.34
 
GCT A  0.16    15.34
 
GCC A  0.27    25.51
 
GCC A  0.27    25.51
Line 238: Line 238:
 
<br>To run the script, you will need to open terminal or command prompt on your computer.  For windows, press Windows+R, and then type “cmd” into the run bar, and hit enter.  On mac, hit Command+Space, type “terminal” and then hit enter.  This should open a command line interface in which the program can be run.
 
<br>To run the script, you will need to open terminal or command prompt on your computer.  For windows, press Windows+R, and then type “cmd” into the run bar, and hit enter.  On mac, hit Command+Space, type “terminal” and then hit enter.  This should open a command line interface in which the program can be run.
 
<br>Commands are input into the program in the following format:
 
<br>Commands are input into the program in the following format:
 
+
<pre><code class="language-python">python seq_analyzer.py -<span class="hljs-keyword">if</span> sequence_file -m (g)ibson/(p)rotoptimization -c organism_codon_file -o output_file -t speed
 +
</code></pre>
 +
<br>in which the arguments are:<br>
 +
<ul>
 +
<li>-if = input filename, note the input file should be in the same directory</li>
 +
<li>-m = mode, p for protein optimization, and g for gibson primer design</li>
 +
<li>-c = organism codon frequency usage</li>
 +
<li>-o = output filename</li>
 +
<li>-t = speed at which you want to run the program–a larger number is 0.</li>
 +
</ul>
 +
<br>The required arguments are -if, -m, and -c.  If no output filename is designated, the output file will be named after the input file with “.output” concatenated on the end.  if no speed is designated, the default speed is 1 second between each step.
 +
<br>Example:<br>
 +
<pre><code class="language-python">python seq_analyzer.py -<span class="hljs-keyword">if</span> sequence_file -m g -c ecoli -o output -t <span class="hljs-number">10</span>
 +
python seq_analyzer.py -<span class="hljs-keyword">if</span> sequence_file -m g -c ecoli
 +
</code></pre>
 +
<br>Both commands above do the same thing, except the first command designates an output file called output and makes the operation run at 10 seconds between each step.
 +
<br>The script will then run and depending on the operation selected ask for appropriate user input and at the end output a text file with the results.
  
  

Revision as of 03:42, 19 October 2016


Stanford-Brown 2016

Protein Optimization & Gibson Assembly Primer design

Overview

The purpose of this script is to expedite the process of protein DNA sequence optimization and Gibson Primer design for Gibson assembly reactions. The files for the script can be found either here on github or be downloaded individually from the IGEM database below.

Note because IGEM does not support .py file extensions, you will need to remove .txt from the extensions of all of the downloaded files from the links below and append ".py" to the end of (1) seq_analyzer, (2) seq_tools, (3) input_tools, and (4) format_tools (i.e. "seq_tools.txt" becomes "seq_tools.py.") No further formatting is needed for the github link download.

Input File Formatting

In order to run the program, the user has to provide a text file containing the sequences in need of optimization and/or primer design. Each sequence should be included on a separate line; any blank newline entries and spaces will result in a processing error. The sequences should also contain no other characters other than that representing nucleotides (A,T,G,C) or amino acids (G,A,L,M,F,W,K,Q,E,S,P,V,I,C,Y,H,R,N,D,T,*).

For protein optimization, sequences in the input file are not limited to only amino acid or nucleotide sequences--the user can input either and the program will recognize the sequence type and process it accordingly. This is limited however in cases where the only amino acids in the sequence are alanine, threonine, cysteine, and glycine, since the single letter code for each amino acid is also found in the single letter nucleotide representations. For these cases, the program will be unable to distinguish between an amino acid or nucloetide sequence--this however can be corrected by putting a "*" at the end of an amino acid sequence.
  1. First sequence contains at least 50nt of the 3' end of the backbone where the 5' end of the first fragment will join to
  2. N number of sequences to be assembled ...
  3. Last sequence contains the 5' end of the backbone where the the 3' end of the last fragment will join to This is illustrated in the following figure, where the first sequence in the file is "Backbone front", last sequence is "Backbone rear", and the middle sequences in the file would be the N fragments listed in order of assembly.


Sample input and output files are also included in the repository and are listed below:

Input:

Output:

Once the user input files have been formatted properly, the script can be run.

Running the Program

Before the script can be run, make sure you have downloaded the following files to a directory of your choosing:
  • seq_analyzer.py
  • seq_tools.py
  • input_tools.py
  • format_tools.py
  • Codon_Lib
  • NT_Lib
  • restriction_enzymes

These files contain the functions needed by the main script to run the algorithms, and also contain libraries for the program to read from that contain codon and nucleotide pairing maps, and restriction site sequences.

Additionally, the user should download

which are codon frequency use tables for their respective organisms. If a different codon table is desired, the user can create a textfile with each row (codon) arranged as such:

3nt_codon, \t, single_letter_aminoacid_abbreviation, \t, frequency, #/1000



Example:

GCT A 0.16 15.34 GCC A 0.27 25.51 GCA A 0.21 20.28 GCG A 0.36 33.66 …


To run the script, you will need to open terminal or command prompt on your computer. For windows, press Windows+R, and then type “cmd” into the run bar, and hit enter. On mac, hit Command+Space, type “terminal” and then hit enter. This should open a command line interface in which the program can be run.
Commands are input into the program in the following format:
python seq_analyzer.py -if sequence_file -m (g)ibson/(p)rotoptimization -c organism_codon_file -o output_file -t speed
			

in which the arguments are:
  • -if = input filename, note the input file should be in the same directory
  • -m = mode, p for protein optimization, and g for gibson primer design
  • -c = organism codon frequency usage
  • -o = output filename
  • -t = speed at which you want to run the program–a larger number is 0.

The required arguments are -if, -m, and -c. If no output filename is designated, the output file will be named after the input file with “.output” concatenated on the end. if no speed is designated, the default speed is 1 second between each step.
Example:
python seq_analyzer.py -if sequence_file -m g -c ecoli -o output -t 10
			python seq_analyzer.py -if sequence_file -m g -c ecoli
			

Both commands above do the same thing, except the first command designates an output file called output and makes the operation run at 10 seconds between each step.
The script will then run and depending on the operation selected ask for appropriate user input and at the end output a text file with the results.