Team:Sydney Australia/ProteinModel1

EtnR1: ProtParam – What does the primary structure tell us?

  

The ExPASy ProtParam tool was used to derive basic physicochemical parameters of EtnR1 from its amino acid sequence.
ProtParam calculated the following:
      • Number of amino acids: 580
      • Molecular weight: 63306.7 Da
      • Theoretical pI: 5.64
EtnR1 has an instability index of 38.13. ProtParam classifies proteins with an instability index below 40 as stable, so EtnR1 is predicted to be stable.
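The length, molecular weight and pI above can be approximated from the sequence alone. The sketch below is a minimal illustration, not ProtParam's own code: it sums ExPASy average residue masses plus one water molecule to get the average molecular weight, and finds the pI by bisection as the pH at which the Henderson-Hasselbalch net charge is zero. The pKa values are an assumed EMBOSS-like set rather than the Bjellqvist scale that ProtParam uses, so the pI will only be close to 5.64, and the molecular weight should land close to 63306.7 Da only if the sequence below is exactly the one that was submitted. The instability index needs a full dipeptide weight table, so only the stability threshold is shown. All constant and function names are illustrative.

```python
# Assumption: ExPASy average residue masses (Da). The average molecular
# weight of a chain is the sum of its residue masses plus one water.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524

# Assumption: EMBOSS-like pKa values, NOT ProtParam's Bjellqvist scale,
# so the pI computed here only approximates the ProtParam value.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

# EtnR1, copied from the reference sequence on this page.
ETNR1 = (
    "MTATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSSDENKIVQLSTTSLPAFYRCPFVGIYLN"
    "DGGWQKHLGARVDYSAAAEVDAQIATLGPSGGDLALGRYARCVALPLRGLDAHIGFFIVASDEE"
    "PSVGEQFLLRVLVQQTGVALANARLHRKEQASTEALRDSNNALAESISALENAARIHARLTEVA"
    "AKGNGEDGIATALHELTGLSVAIEDRFGNLRAWAGPDCPEPYPKDDANAREAMLQRCIRAGEPI"
    "RHAGRLSAVANPRVDIVGVVSLIDPDEGGGEQAKVALEHGTTILAMELARLRSLAEAELRLRRD"
    "LVEEVLLGTDDESALARAEALGHDLGTPHRVVIVESEGRCADMEKFFHGVRSAARHAHMGTLVV"
    "ARSNTVVILSDADMNRDRFVSALVSQLGSDDCRVGVGGWCDRPRHLPRSYREAQLALKMQRRVG"
    "SRDPAPVTFYDELGVYRILAEVENQDSIERFVRQWLGPLLDYDAAKQSQLVATLSAYLECGGHH"
    "DATTAAIFVHRSTLKYRLSRIRTLLGLDVNDPDVRFNLQMATRAWKTLEHLGSGDGHMSHQVEI"
    "PPTL"
)


def average_mass(seq: str) -> float:
    """Average molecular weight in Da (residue masses plus one water)."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the chain at the given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free N- and C-terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect between pH 0 and 14 for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


def instability_class(index: float) -> str:
    """ProtParam predicts proteins with an instability index below 40 as stable."""
    return "stable" if index < 40 else "unstable"


print(len(ETNR1), round(average_mass(ETNR1), 1), round(isoelectric_point(ETNR1), 2))
print(instability_class(38.13))
```

For a production analysis, a maintained implementation such as Biopython's `ProteinAnalysis` class would be preferable to hand-copied tables.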

EtnR1: BLAST analysis – Are there any known close homologues?


A protein BLAST (Basic Local Alignment Search Tool) search was run with the NCBI BLAST tool to identify proteins with high sequence similarity to EtnR1. The SmartBLAST feature selects ‘the three best matches in the sequence database together with the two best matches from well-studied reference species, showing phylogenetic relationships based on multiple sequence alignment and conserved protein domains.’
For EtnR1, the top hits were CdaR family transcriptional regulators from Mycobacterium species. EtnR1 also showed weak similarity (28% identity over query residues 271-553) to the more thoroughly characterised E. coli CdaR. Figure 1 shows the phylogenetic tree constructed from sequence similarities and conserved domains. The individual alignments can be found at the bottom of the page.

The other sequences shown in Figure 1 are:
      • regulatory protein [Streptomyces coelicolor A3(2)] (accession: NP_629806.1, GI: 21224027)
      • carbohydrate diacid regulon transcriptional regulator; autoregulator [Escherichia coli str. K-12 substr. MG1655] (accession: NP_414704.4, GI: 90111093)
      • CdaR family transcriptional regulator [Mycobacterium tusciae] (accession: WP_006247394.1, GI: 493289684)
      • CdaR family transcriptional regulator [Mycobacterium sp. JS623] (accession: WP_015305844.1, GI: 505118742)
      • CdaR family transcriptional regulator [Mycobacterium rhodesiae] (accession: WP_005148475.1, GI: 491290459)
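The Identities and Gaps figures in the score lines at the bottom of the page are counts over aligned columns. The sketch below, with a hypothetical helper name, reproduces that counting for single rows copied from those alignments; gap columns count toward the alignment length, as in BLAST's output. Positives are left out because they need a substitution matrix (BLOSUM62 is the blastp default).

```python
def alignment_stats(query_row: str, subject_row: str) -> tuple:
    """Count identical columns and gap columns in a pair of aligned rows.

    '-' marks a gap; gap columns still count toward the alignment length.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, subject_row))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)


# Query 1-60 vs. Mycobacterium rhodesiae 1-60, from the alignments on this page.
RHODESIAE_Q = "MTATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSSDENKIVQLSTTSLPAFYRCPFVG"
RHODESIAE_S = "MTATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSGDENKIIQLSTTSLPAFYRCPFVG"
# Query 2-61 vs. Mycobacterium sp. JS623 3-61 (one gap in the subject).
JS623_Q = "TATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSSDENKIVQLSTTSLPAFYRCPFVGI"
JS623_S = "VAQEDVS-TESYIELREQISNLQGLLMLSLLMTQSGDENKIIQLSITSVPSFYRCPFVGI"

for name, q, s in [("M. rhodesiae", RHODESIAE_Q, RHODESIAE_S),
                   ("JS623", JS623_Q, JS623_S)]:
    ident, gaps, length = alignment_stats(q, s)
    print(f"{name}: identities {ident}/{length} ({100 * ident / length:.0f}%), "
          f"gaps {gaps}/{length}")
```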
The BLAST analysis also revealed a predicted PucR C-terminal helix-turn-helix domain (HTH-30) at the C-terminal end of EtnR1 (Figure 2). This domain is often found in PucR-like transcriptional regulators and is likely to be where EtnR1 binds DNA.

Figure 1. Phylogenetic tree produced by SmartBLAST with query: CdaR family transcriptional regulator [Mycobacterium chubuense] (accession: WP_014805817.1, GI: 504618715).

Figure 2. Snapshot of BLAST results, showing putative HTH-30 domain at the C-terminal end of EtnR1.
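Each alignment at the bottom of the page reports a bit score and an Expect (E) value. Under the Karlin-Altschul statistics used by BLAST, these are related approximately by E = m * n * 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The sketch below works in log10 space to avoid floating-point underflow. The database size is a hypothetical placeholder and BLAST itself uses effective (edge-corrected) lengths, so the numbers are illustrative only. It shows why a 1140-bit hit has an E-value so small that it is simply displayed as 0.0.

```python
import math

# Hypothetical database size in residues, used only for illustration.
HYPOTHETICAL_DB_RESIDUES = 1e11


def log10_evalue(bit_score: float, query_len: int, db_residues: float) -> float:
    """log10 of E = m * n * 2**(-S'), computed in log space.

    For a bit score of 1140, 2**(-1140) is below the smallest positive
    double, so computing E directly would underflow to 0.0.
    """
    return math.log10(query_len) + math.log10(db_residues) - bit_score * math.log10(2)


for label, bits in [("M. rhodesiae", 1140), ("S. coelicolor", 71.2)]:
    print(label, round(log10_evalue(bits, 580, HYPOTHETICAL_DB_RESIDUES), 1))
```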

EtnR1: PsiPred – Secondary structure prediction


The UCL Bioinformatics PsiPred tool was used to predict the secondary structure of EtnR1 from its amino acid sequence (Figure 3). Importantly, the four helices at the C-terminus are predicted with high confidence, which strongly supports the presence of the putative HTH-30 domain. Many other regions of secondary structure are also predicted with high confidence.
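PsiPred reports, for each residue, a predicted state (H for helix, E for strand, C for coil) and a confidence digit from 0 to 9. The sketch below assumes those two lines have already been read into strings of equal length and lists helical segments whose mean confidence meets a chosen cut-off. The function name, the cut-off of 7 and the toy strings are illustrative assumptions, not EtnR1 data.

```python
from itertools import groupby


def confident_segments(pred: str, conf: str, state: str = "H", min_conf: float = 7.0):
    """Return (start, end, mean_confidence) for each run of `state` in a
    PsiPred-style prediction whose mean confidence is at least `min_conf`.

    pred: one character per residue (H = helix, E = strand, C = coil).
    conf: one confidence digit (0-9) per residue, aligned with pred.
    Positions are reported 1-based and inclusive.
    """
    if len(pred) != len(conf):
        raise ValueError("prediction and confidence strings must be the same length")
    segments = []
    for symbol, run in groupby(enumerate(pred), key=lambda item: item[1]):
        if symbol != state:
            continue
        positions = [i for i, _ in run]
        mean_conf = sum(int(conf[i]) for i in positions) / len(positions)
        if mean_conf >= min_conf:
            segments.append((positions[0] + 1, positions[-1] + 1, mean_conf))
    return segments


# Toy, made-up example (not EtnR1 output): one confident helix, one weak helix.
TOY_PRED = "CCHHHHHCCCHHHHCC"
TOY_CONF = "9978999655312210"
print(confident_segments(TOY_PRED, TOY_CONF))
```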

Figure 3. Secondary structure prediction using PsiPred for EtnR1 based on amino acid sequence.

EtnR1: SWISS-MODEL Homology Modelling


The SWISS-MODEL server [1-4] is an automated service that selects the best-matching template for a submitted amino acid sequence, aligns the sequence to it and builds a 3D model. The quality of the resulting model generally depends most on how well the query sequence matches an appropriate template. SWISS-MODEL threaded residues 1 to 313 of EtnR1 onto the GAF domain of human phosphodiesterase 5A (PDE5A), a regulatory domain of this enzyme. Notably, CdaR family transcriptional regulators commonly carry GAF domains at the N-terminus together with the HTH-30 domain at the C-terminus. A homology model of residues 1-313 was constructed from the structure of this template (Figure 4).

Figure 4. Homology model of residues 1-313 of EtnR1 based on phosphodiesterase 5A GAF domain.

Residues 313 to 580 were threaded onto a different template: the regulator of polyketide synthase expression BAD_0249 from Bifidobacterium adolescentis, which is a transcriptional regulator. A homology model of residues 313-580 was constructed from the structure of this template (Figure 5), and the HTH-30 domain at the C-terminus can be seen in this model.

Figure 5. Homology model of residues 313 to 580 of EtnR1 based on regulator of polyketide synthase expression BAD_0249 from Bifidobacterium adolescentis. HTH-30 at C-terminus is shown in cyan.
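The two models use 1-based, inclusive residue ranges that meet at residue 313, which therefore appears in both. The sketch below, using only the ranges and the 580-residue length stated on this page, checks that the ranges together cover the whole chain and reports where they overlap. The helper names are hypothetical; note that slicing a Python string with these coordinates needs `start - 1`, because Python indexes from 0 and excludes the end.

```python
def residues_in(start: int, end: int) -> set:
    """Residue numbers in a 1-based, inclusive range."""
    return set(range(start, end + 1))


def coverage_report(ranges, length):
    """Return (residues not covered by any range, residues present in every range)."""
    covered = set()
    for start, end in ranges:
        covered |= residues_in(start, end)
    missing = sorted(residues_in(1, length) - covered)
    shared = sorted(set.intersection(*(residues_in(s, e) for s, e in ranges)))
    return missing, shared


def segment(sequence: str, start: int, end: int) -> str:
    """Slice a sequence using 1-based, inclusive coordinates."""
    return sequence[start - 1:end]


MODEL_RANGES = [(1, 313), (313, 580)]  # the two SWISS-MODEL models above
missing, shared = coverage_report(MODEL_RANGES, 580)
print("uncovered residues:", missing)
print("residues in both models:", shared)
```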

EtnR1: BLAST alignments with the closest matches and with well-studied reference species


CdaR family transcriptional regulator [Mycobacterium rhodesiae]
Sequence ID: WP_005148475.1 Length: 580 Number of Matches: 1
Range 1: 1 to 580

Score: 1140 bits (2948)
Expect: 0.0
Method: Compositional matrix adjust.
Identities: 560/580 (97%)
Positives: 566/580 (97%)
Gaps: 0/580 (0%)
Query Sequence
Query 1 - 60

Sbjct 1 - 60
MTATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSSDENKIVQLSTTSLPAFYRCPFVG
MTATSDVAHTETLVELREQLSNLQGLLMLAMLMTQS DENKI+QLSTTSLPAFYRCPFVG
MTATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSGDENKIIQLSTTSLPAFYRCPFVG
Query 61- 120

Sbjct 61- 120
IYLNDGGWQKHLGARVDYSAAAEVDAQIATLGPSGGDLALGRYARCVALPLRGLDAHIGF
IYL DGGWQK LGARVDYSA AE+DA IATLGPSGG L LGRYA CVALPLRGLDAHIGF
IYLTDGGWQKALGARVDYSATAELDASIATLGPSGGVLTLGRYAWCVALPLRGLDAHIGF
Query 121- 180

Sbjct 121- 180
FIVASDEEPSVGEQFLLRVLVQQTGVALANARLHRKEQASTEALRDSNNALAESISALEN
FIVASDEEPSVGEQFLLRVLVQQTG ALANARLHRKEQASTEALRDSN ALAES+SALEN
FIVASDEEPSVGEQFLLRVLVQQTGAALANARLHRKEQASTEALRDSNIALAESVSALEN
Query 181- 240

Sbjct 181- 240
AARIHARLTEVAAKGNGEDGIATALHELTGLSVAIEDRFGNLRAWAGPDCPEPYPKDDAN
AARIHARLTEVAAKGNGE+GIA ALHELTGLSVAIEDRFGNLRAWAGPDCPEPYPKD+AN
AARIHARLTEVAAKGNGEEGIAIALHELTGLSVAIEDRFGNLRAWAGPDCPEPYPKDNAN
Query 241- 300

Sbjct 241- 300
AREAMLQRCIRAGEPIRHAGRLSAVANPRVDIVGVVSLIDPDEGGGEQAKVALEHGTTIL
AREAMLQRCIRAGEPIRHAGRLSAVANPRVDIVGVVSLIDPDEGGGEQAKVALEHGTT+L
AREAMLQRCIRAGEPIRHAGRLSAVANPRVDIVGVVSLIDPDEGGGEQAKVALEHGTTVL
Query 301- 360

Sbjct 301- 360
AMELARLRSLAEAELRLRRDLVEEVLLGTDDESALARAEALGHDLGTPHRVVIVESEGRC
AMELARLRSLAEAELRLRRDLVEEVLLGTDDESALARAEALGHDLGTPHRVVIVESEGRC
AMELARLRSLAEAELRLRRDLVEEVLLGTDDESALARAEALGHDLGTPHRVVIVESEGRC
Query 361- 420

Sbjct 361- 420
ADMEKFFHGVRSAARHAHMGTLVVARSNTVVILSDADMNRDRFVSALVSQLGSDDCRVGV
ADMEKFFHGVR AARHAHMGTLVVARSNTVVILSDADMNRDRFVSALVS LGSDDCRVGV
ADMEKFFHGVRRAARHAHMGTLVVARSNTVVILSDADMNRDRFVSALVSHLGSDDCRVGV
Query 421- 480

Sbjct 421- 480
GGWCDRPRHLPRSYREAQLALKMQRRVGSRDPAPVTFYDELGVYRILAEVENQDSIERFV
GGWCDRPRHLPRSYREAQLALKMQRRVGSRDPA VTFYDELGVYRILAEVENQDSIERFV
GGWCDRPRHLPRSYREAQLALKMQRRVGSRDPAAVTFYDELGVYRILAEVENQDSIERFV
Query 481- 540

Sbjct 481- 540
RQWLGPLLDYDAAKQSQLVATLSAYLECGGHHDATTAAIFVHRSTLKYRLSRIRTLLGLD
RQWLGPLLDYDAAKQSQLVATLSAYLECGGHHDATTAAIFVHRSTLKYRLSRIRTLLGLD
RQWLGPLLDYDAAKQSQLVATLSAYLECGGHHDATTAAIFVHRSTLKYRLSRIRTLLGLD
Query 541 - 580

Sbjct 541 - 580
VNDPDVRFNLQMATRAWKTLEHLGSGDGHMSHQVEIPPTL
VNDPDVRFNLQMATRAWKTLEHLGSGDGHMSHQVEIPPTL
VNDPDVRFNLQMATRAWKTLEHLGSGDGHMSHQVEIPPTL

CdaR family transcriptional regulator [Mycobacterium sp. JS623]
Sequence ID: WP_015305844.1 Length: 580 Number of Matches: 1
Range 1: 3 to 580

Score: 856 bits (2212)
Expect: 0.0
Method: Compositional matrix adjust.
Identities: 424/579 (73%)
Positives: 493/579 (85%)
Gaps: 1/579 (0%)
Query Sequence
Query 2 - 61

Sbjct 3 - 61
TATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSSDENKIVQLSTTSLPAFYRCPFVGI
A DV+ TE+ +ELREQ+SNLQGLLML++LMTQS DENKI+QLS TS+P+FYRCPFVGI
VAQEDVS-TESYIELREQISNLQGLLMLSLLMTQSGDENKIIQLSITSVPSFYRCPFVGI
Query 62- 121

Sbjct 62- 121
YLNDGGWQKHLGARVDYSAAAEVDAQIATLGPSGGDLALGRYARCVALPLRGLDAHIGFF
+LND GWQ L E++ Q+ LGPSGG L RY+ CVALPLRGL+AHIG+F
HLNDRGWQNPLDPHDVSLERIELEPQLKALGPSGGVLNFTRYSWCVALPLRGLEAHIGYF
Query 122- 181

Sbjct 122- 181
IVASDEEPSVGEQFLLRVLVQQTGVALANARLHRKEQASTEALRDSNNALAESISALENA
+VA++ EPS GEQFL+RVL QQTGVALANARLHRKEQASTEALR SN ALAES++ALE A
VVAAEAEPSTGEQFLIRVLAQQTGVALANARLHRKEQASTEALRRSNVALAESVTALEYA
Query 182- 241

Sbjct 182- 241
ARIHARLTEVAAKGNGEDGIATALHELTGLSVAIEDRFGNLRAWAGPDCPEPYPKDDANA
A IHAR TE+A KG GE+GIATALHELTGL VAIEDRFGNLRAWAGPDCP+PYPKDD
ANIHARFTEIATKGQGEEGIATALHELTGLPVAIEDRFGNLRAWAGPDCPDPYPKDDPAT
Query 242- 301

Sbjct 242- 301
REAMLQRCIRAGEPIRHAGRLSAVANPRVDIVGVVSLIDPDEGGGEQAKVALEHGTTILA
REAMLQRC++AGEPIR GRLSAVANPRVDI+GV+SLIDP G+QA+VALEHGTT+LA
REAMLQRCVQAGEPIRQGGRLSAVANPRVDIMGVLSLIDPQAVAGDQAQVALEHGTTVLA
Query 302- 361

Sbjct 302- 361
MELARLRSLAEAELRLRRDLVEEVLLGTDDESALARAEALGHDLGTPHRVVIVESEGRCA
MELARLRSLAEAELRLRRDLVEE+LLGT DESALARAEALGHDLG HRV+IVE +GR A
MELARLRSLAEAELRLRRDLVEELLLGTGDESALARAEALGHDLGRCHRVLIVEPQGRTA
Query 362- 421

Sbjct 362- 421
DMEKFFHGVRSAARHAHMGTLVVARSNTVVILSDADMNRDRFVSALVSQLGSDDCRVGVG
DM+KFFH VR AAR+A +G+L+VAR++TVVILSDA+++R++F+SA+ + +G D+ RVGVG
DMDKFFHAVRRAARNAQLGSLIVARASTVVILSDAEVDREKFLSAISTSVGDDNFRVGVG
Query 422- 481

Sbjct 422- 481
GWCDRPRHLPRSYREAQLALKMQRRVGSRDPAPVTFYDELGVYRILAEVENQDSIERFVR
GWCDRP PRSY EAQLALKMQRR G+ A V FYDELGVYRILAEVENQ SIE FVR
GWCDRPEDFPRSYHEAQLALKMQRRSGTATAAGVIFYDELGVYRILAEVENQQSIESFVR
Query 482- 541

Sbjct 482- 541
QWLGPLLDYDAAKQSQLVATLSAYLECGGHHDATTAAIFVHRSTLKYRLSRIRTLLGLDV
QWLGPLLDYDAAK SQ+VATL+ YL+CGGH+D TTAA+++HRSTLKYRLSRIR LLG+D+
QWLGPLLDYDAAKGSQMVATLAGYLQCGGHYDTTTAALYIHRSTLKYRLSRIRDLLGIDI
Query 542 - 580

Sbjct 542 - 580
NDPDVRFNLQMATRAWKTLEHLGSGDGHMSHQVEIPPTL
NDP+ RFNL++A RAW TLE L +G GHMS E+ P++
NDPEARFNLELAARAWGTLEELAAGQGHMSPPAEVDPSV

carbohydrate diacid regulon transcriptional regulator; autoregulator [Escherichia coli str. K-12 substr. MG1655]
Sequence ID: NP_414704.4
Identical proteins: WP_000929443.1
Range 1: 87 to 378

Score: 75.9 bits (185)
Expect: 0.0
Method: Compositional matrix adjust.
Identities: 85/305 (28%)
Positives: 143/305 (46%)
Gaps: 35/305 (11%)
Query Sequence
Query 271 - 330

Sbjct 87 - 144
DIVGVVSLIDPDEGGGEQAKVALEHGTTILAMELARLRSLAEAELRLRRDLVEEVLLGTD
+IVGV+ L E + ++ T + +E +RL L + RLR +LV ++ +
EIVGVIGLTGEPENLRKYGELVCM--TAEMMLEQSRLMHLLAQDSRLREELVMNLIQAEE
Query 331-382

Sbjct 145- 202
DESALAR-AEALGHDLGTPHRVVIVESEG-------RCADMEKFFHGVRSAARHAHMGTL
+ AL A+ LG DL P V IVE + A++++ + + + R+ +
NTPALTEWAQRLGIDLNQPRVVAIVEVDSGQLGVDSAMAELQQLQNALTTPERNNLVA--
Query 383- 431

Sbjct 203-261
VVARSNTVVILS--------DADMNRDRFVSALVSQL---GSDDCRVGVGGWCDRPRHLP
+V+ + VV+ DA+ +R R V L++++ G RV +G + P +
IVSLTEMVVLKPALNSFGRWDAEDHRKR-VEQLITRMKEYGQLRFRVSLGNYFTGPGSIA
Query 432 - 489

Sbjct 262 - 313
RSYREAQ--LALKMQRRVGSRDPAPVTFYDELGVYRILAEVENQDSIERFVRQWLGPLLD
RSYR A+ + + QR SR FY +L + +L + R PL
RSYRTAKTTMVVGKQRMPESR----CYFYQDLMLPVLLDSLRGDWQANELAR----PLAR
Query 490 - 548

Sbjct 314-373
YDAAKQSQLVA-TLSAYLECGGHHDATTAAIFVHRSTLKYRLSRIRTLLGLDVNDPDVRF
+ L+ TL+A+ AT+ A+F+HR+TL+YRL+RI L GLD+ + D R
LKTMDNNGLLRRTLAAWFRHNVQPLATSKALFIHRNTLEYRLNRISELTGLDLGNFDDRL
Query 549-553

Sbjct 374-378
NLQMA
L +A
LLYVA

regulatory protein [Streptomyces coelicolor A3(2)]
Sequence ID: NP_629806.1
Identical proteins: WP_011030383.1
Range 1: 430 to 548

Score: 71.2 bits (173)
Expect: 1e-11
Method: Compositional matrix adjust.
Identities: 43/124 (35%)
Positives: 69/124 (55%)
Gaps: 5/124 (4%)
Query Sequence
Query 433 - 492

Sbjct 430 - 484
SYREAQLALKMQRRVGSRDPAPVTFYDELGVYRILAEVENQDSIERFVRQWLGPLLDYDA
+Y++AQ AL + RR G ++ + +L + + D++ F L L D+DA
AYKQAQQALSVARRRGRV----CVEHEHVAAGSVLPLLAD-DAVRAFADGLLRALRDHDA
Query 493 - 552

Sbjct 485- 544
AKQSQLVATLSAYLECGGHHDATTAAIFVHRSTLKYRLSRIRTLLGLDVNDPDVRFNLQM
+ LVA++ A+L G DA A + VHR TL+YR+ R+ +LG ++DPDVR L +
TGRGDLVASVRAWLSRHGQWDAAAADLGVHRHTLRYRMRRVEEILGRSLDDPDVRMELWL
Query 553- 556

Sbjct 545-548
ATRA
A +A
ALKA
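Each alignment row above is labelled with start and end residue numbers. Because BLAST numbers residues rather than columns, the end of a row is its start plus the number of non-gap characters, minus one, which gives a quick consistency check on the range labels. A minimal sketch with hypothetical helper names, using two query rows copied from the E. coli and S. coelicolor alignments:

```python
def row_end(start: int, aligned_row: str) -> int:
    """Last residue number covered by one aligned row.

    Gap characters ('-') occupy a column but do not advance the residue count.
    """
    residues = sum(1 for ch in aligned_row if ch != "-")
    return start + residues - 1


def header_consistent(start: int, end: int, aligned_row: str) -> bool:
    """True if a row's start/end labels agree with its non-gap residue count."""
    return row_end(start, aligned_row) == end


ECOLI_QUERY_ROW = "DESALAR-AEALGHDLGTPHRVVIVESEG-------RCADMEKFFHGVRSAARHAHMGTL"
STREP_QUERY_ROW = "AKQSQLVATLSAYLECGGHHDATTAAIFVHRSTLKYRLSRIRTLLGLDVNDPDVRFNLQM"

print(row_end(331, ECOLI_QUERY_ROW))
print(header_consistent(493, 552, STREP_QUERY_ROW))
```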

For reference: EtnR1 Amino Acid Sequence


MTATSDVAHTETLVELREQLSNLQGLLMLAMLMTQSSDENKIVQLSTTSLPAFYRCPFVGIYLN
DGGWQKHLGARVDYSAAAEVDAQIATLGPSGGDLALGRYARCVALPLRGLDAHIGFFIVASDEE
PSVGEQFLLRVLVQQTGVALANARLHRKEQASTEALRDSNNALAESISALENAARIHARLTEVA
AKGNGEDGIATALHELTGLSVAIEDRFGNLRAWAGPDCPEPYPKDDANAREAMLQRCIRAGEPI
RHAGRLSAVANPRVDIVGVVSLIDPDEGGGEQAKVALEHGTTILAMELARLRSLAEAELRLRRD
LVEEVLLGTDDESALARAEALGHDLGTPHRVVIVESEGRCADMEKFFHGVRSAARHAHMGTLVV
ARSNTVVILSDADMNRDRFVSALVSQLGSDDCRVGVGGWCDRPRHLPRSYREAQLALKMQRRVG
SRDPAPVTFYDELGVYRILAEVENQDSIERFVRQWLGPLLDYDAAKQSQLVATLSAYLECGGHH
DATTAAIFVHRSTLKYRLSRIRTLLGLDVNDPDVRFNLQMATRAWKTLEHLGSGDGHMSHQVEI
PPTL

References


1. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T (2014). SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research, 42(W1), W252-W258. doi: 10.1093/nar/gku340.
2. Arnold K, Bordoli L, Kopp J, Schwede T (2006). The SWISS-MODEL Workspace: a web-based environment for protein structure homology modelling. Bioinformatics, 22, 195-201.
3. Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research, 37, D387-D392.
4. Guex N, Peitsch MC, Schwede T (2009). Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a historical perspective. Electrophoresis, 30(S1), S162-S173.

School of Life and Environmental Sciences
The University of Sydney
City Road, Darlington
2006, New South Wales, Sydney, Australia