Data Processing Software
We have developed a web application intended to be a powerful tool within our project. It has been designed as an affordable networking space that complements the project's objectives.
The HYPE-IT Software gathers several sections whose tools users can take advantage of: a forum, a blog, a news feed, a complete target database, our scoring system, a gene viewer and a hierarchy of groups. In effect, we have created a complete social network.
We used the MEAN stack (MongoDB, Express, Angular, Node.js) for the web development. We chose this technology, together with a Business Logic Layer (BLL), because it brings together four of the most innovative web development technologies, such as Angular, which is backed by Google (owned by Alphabet), and MongoDB, a scalable, non-relational (NoSQL) database that is a benchmark in its field. The stack was established in 2013 and has grown into a serious alternative to its main rivals, such as PHP or .NET.
The business logic layer makes it easier to deploy the tool in different environments. Changing a single file is enough to switch the persistence layer of the software, so data can be stored in Oracle, in MongoDB or in any other database.
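The idea of swapping persistence by touching a single place can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the project's own Node.js code, and every name in it (`TargetStore`, `InMemoryStore`, `make_store`, `register_gene`) is hypothetical and not taken from the HYPE-IT source:

```python
from typing import Dict, Optional, Protocol


class TargetStore(Protocol):
    """Persistence interface the business logic depends on (hypothetical)."""

    def save(self, gene_name: str, record: dict) -> None: ...

    def find(self, gene_name: str) -> Optional[dict]: ...


class InMemoryStore:
    """Stand-in backend; a MongoDB or Oracle class would expose the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def save(self, gene_name: str, record: dict) -> None:
        self._rows[gene_name] = record

    def find(self, gene_name: str) -> Optional[dict]:
        return self._rows.get(gene_name)


# The single place that decides which backend is used (the "one file" to change).
BACKENDS = {"memory": InMemoryStore}


def make_store(backend: str = "memory") -> TargetStore:
    """Build the configured persistence backend."""
    return BACKENDS[backend]()


def register_gene(store: TargetStore, gene_name: str, trait: str) -> dict:
    """Business-logic function: it never touches a concrete database."""
    record = {"gene": gene_name, "trait": trait}
    store.save(gene_name, record)
    return record
```

Under this layout, adding another database would mean writing one more class with the same two methods and registering it in `BACKENDS`; the business logic itself would stay unchanged.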
Another advantage of our software is the use of npm (Node Package Manager) and the Bower web package manager. Both allow us to install code packages and certified tools from the console, which makes development easier. Building an application from components, like a puzzle, allows us to release any function we develop as a separate package. In our case, an example could be the scoring system, which could be shared with the developer community as a certified package.
As future work, we would like to create a package that contains the scoring system code. It would allow any development team to integrate our scoring easily by installing a single component, which the package manager would keep up to date. For the moment, we have split the application layers and the processing engine into front-end routes and API routes, separating the front-end and back-end applications into two complementary but independent parts. We have also set up a public API route for querying our scoring system. Therefore, any web server can make requests to our scoring system and get the results by sending a data packet in a given format to the web address of the API.
Another advantage of package managers is the certified components we have used in our software, such as ‘passport’, a component for logging in securely by creating cookies in the web browser, or ‘pdfmake’, a component for generating PDF documents. These open-access components are thoroughly tested by developers and ease software development.
Finally, we host our site on cloudno.de, a service created to promote these kinds of technologies. It is free to use, so maintenance costs are zero.
To develop the front end, we bought a template developed by stepofweb (http://www.stepofweb.com/). It provides a user-friendly appearance and a fully responsive, fluid design that is compatible with IE9, IE10, IE11, Chrome, Firefox, Opera and Safari.
All in all, we have created a web application built on innovative technologies that integrates all HYPE-IT tools in an easy way and encourages the exchange of ideas about plant synthetic biology in a community workspace.
Certified Components used in Web Development
Our main achievement in plant synthetic biology is our scoring system. This utility receives a gene and returns the possible targets, scored and ranked. The standards our scoring system works with are:
- The system is designed to return the primers needed to amplify the target, with the overhangs required for use as standard GoldenBraid (GB) parts. GB is one of the reference plant assembly systems validated by iGEM. It was developed by Diego Orzáez's team at IBMCP, at the Polytechnic University of Valencia. Our aim is to promote this standard by using it in the Software.
- The scoring system accepts as input one gene in FASTA format or as a plain text file.
Our scoring system is based on five criteria; for each of them, independent literature-based algorithms evaluate different aspects of the DNA string. Different papers and empirical results from our own research and from associated (UPV) researchers have been used. The criteria are explained in more detail in the scoring section. To find out how close our results were to the optimal ones, we compared our targets with those obtained for the same sequence by competing tools, and our results always matched targets located among the 5 best results of their algorithms.
Why did we always obtain results among the 5 best matching sequences and not the first one? In our case the search is not only for an NGG PAM, but also includes a plant-specific promoter requirement, G 20 H GG, so it differs from the other target providers. That is also why our search uses different parameters for the general score, giving “negative points” for some characteristics that receive a better score in other algorithms.
Our web application was created as a supporting tool for the labcase and the database. In addition, an information-sharing platform that allows communication between users has been created, with the aim of sharing knowledge and improving the quality of the information, so that its functionality is not limited to the initial query software. For this purpose, we have also developed API routes for external use of the scoring system, meaning that other platforms can use it without logging in to our web application. If a third-party application sends a DNA file through an HTTP POST request, the application responds with a JSON object containing the possible targets, ordered by the score assigned by the five algorithms of the system. Those users do not have access to the algorithms, only to the results. Three other routes can be used without being logged in, allowing searches by gene name, phenotypic characteristic and/or plant name.
At the end of this wiki page, you will find a short user manual for the API with a usage example. The guide uses the scoring system as its example, but the other three routes are also fully functional and are used in a similar way.
To make the software user-friendly, the front-end design was outsourced: we used a template of more than 500 pages, developed by a company specialized in creating friendly front ends, which makes the user environment more dynamic. This template has been downloaded by 4000 users worldwide, which suggests that it is generally considered good supporting material for web development. It was designed one year ago and its last update was released 3 months ago. The template ensures compatibility with IE, Chrome, Firefox, Safari and Opera. The company also includes some development scripts, which further increase the dynamism. These scripts include jQuery, Bootstrap, parallax, PHPMailer and Revolution Slider.
Software quality professors from the Computation & IT department of the Polytechnic University of Valencia have collaborated with our team to apply usability standards based on similar commercial tools and on ISO/IEC 9126 usability controls. Using MEAN technology with REST routing also supports usability: each route identifies what the user is doing at a given moment, so future incidents can be handled easily by an online user-support center from a screenshot or a route. In this way the incident can be detected and located.
The code follows the Airbnb style guides, a widely used standard in MEAN web development, which makes the code easier for developers to maintain.
Only DNA sequences in the gene that meet both of the following conditions are considered possible targets:
- The target ends with the PAM (NGG)
- The target starts with a G 20 nucleotides before the PAM
Thus, a good target is one that matches a search for a G-20-GG sequence. One of the scoring criteria needs to know which nucleotide follows the PAM, so every sequence matching the pattern GXXXXXXXXXXXXXXXXXXXXGGX is stored in a list. Moreover, sequences located so close to the start or the end of the gene that primers cannot be designed for them are automatically discarded in this step. Once all possible targets are found, they are processed with the scoring system; its criteria are described in the rest of this section.
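The candidate search described above can be sketched as follows. This is an illustrative Python sketch, not the project's source code. It assumes that X stands for any of A, C, G or T, and it introduces a hypothetical `margin` parameter for the "too close to the ends" rule, whose real threshold is not stated here.

```python
import re

# G, 20 free nucleotides (the last one is the N of the NGG PAM), GG,
# and the nucleotide that follows the PAM. The lookahead lets overlapping
# candidates be reported as well.
TARGET_PATTERN = re.compile(r"(?=(G[ACGT]{20}GG[ACGT]))")


def find_targets(gene: str, margin: int = 0) -> list:
    """Return (start, sequence) pairs for every G-20-GG candidate in the gene.

    `margin` is a hypothetical parameter: candidates that start or end within
    `margin` bases of either end of the gene are dropped, as a stand-in for
    the rule that discards targets whose primers cannot be designed.
    """
    gene = gene.upper()
    hits = []
    for match in TARGET_PATTERN.finditer(gene):
        start = match.start(1)
        end = start + len(match.group(1))
        if start >= margin and end <= len(gene) - margin:
            hits.append((start, match.group(1)))
    return hits
```

Each returned sequence is 24 nucleotides long, so later criteria can read both the N of the PAM (position 21) and the nucleotide that follows the PAM (position 24).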
//1. PAM criteria 1.0
For every possible target, if the first nucleotide after the NGG PAM is a T, we assign 10 points to the global score and 1 to the "around" score; otherwise, 0 points are assigned.
//1. PAM criteria 2.0
For every possible target, the closer it is to the beginning of the gene, the more points it obtains. Targets found in the first tenth of the gene obtain 50 points in the global score and 1 position point; targets found in the first quarter but not in the first tenth receive half of those points. Other positions receive no points.
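The first two criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the position is measured from the start index of the target, the boundaries are treated as strict "less than", and "half of the points" is applied to both the global and the position score. None of these details is confirmed by the text.

```python
def after_pam_score(candidate: str) -> tuple:
    """Criterion 1.0: reward a T right after the NGG PAM.

    `candidate` is the 24-nt string G + 20 nt + GG + following nucleotide.
    Returns (global points, "around" score).
    """
    return (10, 1) if candidate[-1] == "T" else (0, 0)


def position_score(start: int, gene_length: int) -> tuple:
    """Criterion 2.0: reward targets near the beginning of the gene.

    Returns (global points, position score).
    """
    fraction = start / gene_length
    if fraction < 0.10:   # first tenth of the gene
        return (50, 1)
    if fraction < 0.25:   # first quarter, but not the first tenth
        return (25, 0.5)
    return (0, 0)
```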
//1. PAM criteria 3.0
If a C is found in the 21st position of the sequence GXXXXXXXXXXXXXXXXXXXXGGX, 60 points are given in the global score and 1 PAM score point. Half of those points are given if the nucleotide is a T instead of a C. If it is a G, 20 points are given in the global score and 0.5 in the PAM score. In the case of an A, only 10 points are assigned in the global score and none in the PAM score.
So the resulting order is:
G-19N-CGG > G-19N-TGG > G-19N-GGG > G-19N-AGG
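The point values listed for the 21st position map naturally onto a lookup table. The following Python sketch is illustrative only; the function names are hypothetical and it assumes the candidate string starts at the leading G, so that position 21 is index 20.

```python
# (global points, PAM score) for the nucleotide at position 21,
# i.e. the N of the NGG PAM.
PAM_N_SCORES = {
    "C": (60, 1.0),
    "T": (30, 0.5),
    "G": (20, 0.5),
    "A": (10, 0.0),
}


def pam_n_score(candidate: str) -> tuple:
    """Criterion 3.0: score the N of the PAM of a G + 19N + NGG candidate."""
    return PAM_N_SCORES[candidate[20].upper()]


def rank_by_pam(candidates: list) -> list:
    """Order candidates by the global points of this criterion, best first."""
    return sorted(candidates, key=lambda c: pam_n_score(c)[0], reverse=True)
```

Ranking four candidates that differ only in that nucleotide reproduces the order C, T, G, A given above.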
//2. Structural criteria 1.0
In this criterion we measure the number of Gs and Cs in the target. If that number is between 7 and 12 bases, 10 points are assigned in the global score and 1 point for the concentration; if it is below 7 or above 18, a penalty of -180 points is applied and the concentration score is 0. For a count between 12 and 18, the concentration score takes the default value of 0.5.
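A minimal Python sketch of this criterion is given below. Two points are assumptions rather than facts from the text: a count of exactly 12 is placed in the best band, and the intermediate band (13 to 18) is given 0 global points, since only its concentration score of 0.5 is stated.

```python
def gc_score(target: str) -> tuple:
    """Structural criterion 1.0: score the number of G and C bases.

    Returns (global points, concentration score).
    """
    gc_count = sum(1 for base in target.upper() if base in "GC")
    if gc_count < 7 or gc_count > 18:
        return (-180, 0.0)   # too few or too many G/C bases: penalty
    if gc_count <= 12:
        return (10, 1.0)     # preferred band, 7 to 12
    return (0, 0.5)          # 13 to 18: default concentration score
```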
//2. Structural criteria 2.0
Another criterion concerns possible folding of the DNA strand through base complementarity: if the target contains a palindromic complementary stretch, there is a chance of unwanted folding, so it needs to be penalized. Targets that show complementarity are assigned -10 points and, when the process finishes, the symmetry is scored between 0 and 1.
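One possible way to detect such complementarity is to look for a short stretch whose reverse complement appears later in the same target. The Python sketch below is an assumption about how the check could work, not the team's algorithm: the stem length of 4 is a hypothetical parameter, and the 0 to 1 symmetry score is not reproduced because its formula is not given.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_self_complementarity(target: str, stem: int = 4) -> bool:
    """True if a `stem`-long stretch can pair with a later, non-overlapping one."""
    target = target.upper()
    for i in range(len(target) - stem + 1):
        partner = reverse_complement(target[i:i + stem])
        if target.find(partner, i + stem) != -1:
            return True
    return False


def folding_penalty(target: str, stem: int = 4) -> int:
    """Structural criterion 2.0: -10 points when the target may fold on itself."""
    return -10 if has_self_complementarity(target, stem) else 0
```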
One criterion that has not been taken into account is the possible off-targets and the complete analysis of the genome. Initially we developed a web service that found all possible off-targets in a genome given its sequence. We finally discarded it for the following reasons:
- To find all the off-targets of a gene, the database must be loaded with the plants’ genomes. These files are huge and would greatly increase the size of the database.
- The designed algorithm needed around three hours to find every possible off-target, because the computing power of a free online server is very limited.
- We tried to outsource the off-target search by using known computing centers such as NCBI and its API to perform BLAST alignments, but the computing time was still high and its library does not include every plant genome.
- The off-target search does not take into account the chromatin state, so in diploid or polyploid systems we could find false off-targets.
That is why we decided to analyse only the gene; our empirical tests showed that the software and the scoring system perform well in comparison with other tools.
As introduced above, we have developed an API so that more people can easily use our scoring system and our database without us having to share our algorithms.
First of all, we have to define what an API is. An API (Application Programming Interface) is currently understood as the set of subroutines, functions and procedures that a library offers to other software as an abstraction layer.
Our API routes are the following:
The first API route makes our scoring system available to other software; the other three share our database. We therefore explain how to use the API and how we built it so that users with limited programming skills can use it.
As is well known, an HTTP-based web service allows requests through at least four methods: ‘get’, ‘post’, ‘put’ and ‘delete’. Each one has a different internal mechanism. Get routes retrieve information: our web browser does something similar to http.get(‘www.google.es’) and receives the HTML that is displayed on the screen. To send any message to the web server, we need to use the other methods. APIs generally use the ‘post’ method, which allows creation and querying, but unfortunately a URL-based browser cannot make a post request. For this reason we chose to create both: the ‘post’ route for future users and a ‘get’ route that can be read by the browser, although at a technical level the latter is not properly interoperable with other applications.
At the following address, http://hypeit.cloudno.de/api/scoringsystem, two back-end listeners have been activated to receive post and get requests. However, since with get the API is being accessed through a method that is not the expected one, we have to modify the route to see the results; in this case it will be http://hypeit.cloudno.de/api/scoringsystem?dna=”” (the target DNA is placed between the inverted commas). Internally, the software recognizes ‘dna=’ through a routing utility and passes the value that follows the equals sign to the processing system.
This is an example with the results:
Other API routes that can be used, which refer to searches in the database, are:
So, to close this section, we would like to emphasize that the only requirement for using these routes is to send an object with an attribute called ‘dna’ whose value is the nucleotide sequence, for example http.post(‘http://hypeit.cloudno.de/api/scoringsystem’, {dna : “ACAAGATGC….”}), or, in the case of a database search, a text file with the elements to search for.
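As an illustration of the two ways of calling the scoring route described in this section, the following Python sketch builds the get URL and the post request using only the standard library. The route and the ‘dna’ attribute come from the text above. Encoding the post body as JSON is an assumption, and the helper names are hypothetical. Nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://hypeit.cloudno.de/api/scoringsystem"


def build_get_url(dna: str) -> str:
    """URL that a browser can open directly: .../scoringsystem?dna=<sequence>."""
    return f"{API_URL}?{urlencode({'dna': dna})}"


def build_post_request(dna: str) -> Request:
    """POST request carrying an object with a single 'dna' attribute.

    Assumption: the body is sent as JSON; the text only says that an
    object with a 'dna' attribute is expected.
    """
    body = json.dumps({"dna": dna}).encode("utf-8")
    return Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would then do something like `urllib.request.urlopen(build_post_request(sequence))` and parse the JSON response, whose exact fields are not documented here.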
Gene-related information is often available on the Internet, but people don’t always know where to look or how to search efficiently, and it is almost always difficult to interpret because the function of these genes is not clear. Editing them usually leads to undesired modifications. Plant proteins are usually involved in many metabolic processes simultaneously, so modifications in a particular protein could result in a non-viable plant. In other cases, the modifications lead to a fruitless variety, which often has no agricultural benefit. As a result, it is often very difficult to know which gene a plant breeder must edit, even when the breeder knows which phenotypic trait is wanted.
In HYPE-IT, we have worked hard looking for genes whose knockout leads to an interesting phenotypic trait. The scientific literature is full of papers in which gene functions are identified, but not always by knocking the genes out. We have selected only those articles in which the gene was silenced, either by RNAi-mediated silencing or by CRISPR/Cas9 editing, and a viable, fruit-bearing plant was obtained. This allows us to avoid transgenesis, because we do not need to insert exogenous genes to obtain an enhanced plant variety.
Our database gathers more than 20 different genes whose knockout leads to many different improved characteristics. All these genes are well referenced, so plant viability has been demonstrated in all cases. Moreover, many proteins are homologous to one another, so we can assume that they have the same function in other plant species. After running Blastp searches, we finally obtained more than 200 target genes. This is a large Database of high interest to plant breeders and nurseries.
Some example genes from our Database are:
- GA20 oxidase, whose knockout leads to a smaller phenotype in maize or rice. The energy that the modified plants would have used for growth is instead channelled into improving their grain properties and production.
- TFL (terminal flower) is a key regulator that delays flowering and regulates plant growth. Its knockdown leads to varieties that flower more.
- ACS4 is a protein involved in the ethylene synthesis pathway. It catalyzes the synthesis of 1-aminocyclopropane-1-carboxylic acid (ACC) from S-adenosyl methionine. Its knockdown leads to andromonoecious varieties.
The HYPE-IT Database also includes important gene-related information, such as the name of the gene and the targeted protein, the paper we relied on and the NCBI accession number. This allows us to organise the information in a structured way and to interconnect it with other kinds of databases. By combining our Database and Software, we can obtain the optimal gRNA for a specific HYPE-IT Database gene using the scoring system of our Software. The objective is to reduce the steps plant breeders must take to enhance a plant variety, giving them all the sequences they need to order for synthesis.
We are very pleased with the result of our Database. To access it, sign in to our external webpage, located inside the HYPE-IT software application.
We also include our database here in .xls format for anyone who wants to check it:
The following is a sample of our Database:
| Common name | Species | Phenotypic trait | Gene name | NCBI CDS accession number | Protein |
| --- | --- | --- | --- | --- | --- |
| Apple | Malus domestica | delayed ripening | MdACS3 | AB243060 | 1-aminocyclopropane-1-carboxylate synthase |
| Tomato | Solanum lycopersicum | delayed ripening | Solanum lycopersicum 1-aminocyclopropane-1-carboxylate synthase (ACS6), mRNA | NM_001247235.2 | 1-aminocyclopropane-1-carboxylate synthase |
| Strawberry | Fragaria × ananassa | Flavonoid biosynthesis | Fragaria chiloensis transcription factor (MYB1) mRNA, complete cds | GQ867222.1 | A R2R3 MYB transcription factor |
| Tomato | Solanum lycopersicum | Increase of carotenoid and flavonoid levels | Solanum lycopersicum deetiolated1 homolog (Det1), mRNA | NM_001247219.2 | light-mediated development protein DET1 |
| Orange tree | Citrus sinensis | induced flowering | Citrus sinensis terminal flower (TFL), mRNA | NM_001288919.1 | terminal flower (TFL) |
| Maize | Zea mays | semi-dwarf; more grain yield | Zea mays (LOC107521947), mRNA | NM_001321686.1 | GA3 oxidase |
| Rice | Oryza sativa | Drought Tolerance | PREDICTED: Oryza sativa Japonica Group E3 ubiquitin-protein ligase SINAT5 (LOC4344172), mRNA. | XM_015789296 | E3 ubiquitin-protein ligase SINAT5 |
| Coffee | Coffea arabica | decaffeinated plants | Coffea arabica CaMXMT1 mRNA for 7-methylxanthine N-methyltransferase, complete cds. | AB048794 | RecName: Full=Monomethylxanthine methyltransferase 1; Short=CaMXMT1; AltName: Full=Theobromine synthase 1 |
| Cotton | Gossypium hirsutum | increased stearic acid content | G.hirsutum mRNA for stearoyl-acyl-carrier protein desaturase | X95988 | delta 9 stearoyl-(acyl-carrier protein) desaturase (Gossypium hirsutum) |
| Cotton | Gossypium hirsutum | increased oleic acid content | Gossypium hirsutum delta(12)-fatty-acid desaturase FAD2-like (LOC107934594), mRNA. | NM_001327381 | delta(12)-fatty-acid desaturase FAD2-like (Gossypium hirsutum) |
| Corn | Zea mays | higher levels of amylose | Zea mays amylose extender 1 (ae1), mRNA. | NM_001111846 | 1,4-α-glucan-branching enzyme 2, chloroplastic/amyloplastic precursor (Zea mays) |
| Onion | Allium roylei | reduced levels of tear-inducing lachrymatory factor | Allium roylei lachrymatory factor synthase (LFS) gene, partial cds. | HQ738919 | lachrymatory factor synthase, partial (Allium roylei) |
| Tomato | Solanum lycopersicum | Parthenocarpic | Solanum lycopersicum chalcone synthase (CHS2) mRNA, complete cds. | HQ008773 | chalcone synthase (Solanum lycopersicum) |