Team:Bielefeld-CeBiTec/Results/Library/Sequencing



Library Results

Sequencing


To verify our libraries, we used two sequencing techniques: Sanger sequencing and NGS. Together they let us estimate the actual diversity of each library and the distribution of nucleotides at the randomized positions, and compare both with the theoretical library size.

Sanger sequencing

Sanger sequencing was used to check that the assemblies were correct and to examine which of the designed bases appear at each position of the variable regions of our binding proteins. Plasmids were isolated from 24 Monobody colonies and from 24 Nanobody colonies. Each plasmid mix was sequenced in a single reaction. Figures 1 and 2 show that 24 colonies were enough to cover nearly all possible bases of the Monobody randomized regions and all bases of the Nanobody randomized regions, as designed.
Learn more about the IUPAC nucleotide codes used for randomization and the amino acids they encode here.
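
To make the link between IUPAC codes, encoded amino acids and theoretical library size concrete, here is a minimal Python sketch. It only assumes the standard IUPAC nucleotide codes and the standard genetic code. The degenerate codon `NNK` and the helper names (`expand_codon`, `encoded_residues`, `theoretical_dna_diversity`) are illustrative assumptions and are not taken from our ordered sequences.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in TCAG order ('*' = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def expand_codon(degenerate):
    """Return every concrete codon allowed by a degenerate IUPAC codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


def encoded_residues(degenerate):
    """Return the set of residues ('*' = stop) a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand_codon(degenerate)}


def theoretical_dna_diversity(degenerate_seq):
    """Count the distinct DNA sequences a degenerate sequence can produce."""
    size = 1
    for base in degenerate_seq:
        size *= len(IUPAC[base])
    return size


# Hypothetical example: NNK is a commonly used randomization codon.
residues = encoded_residues("NNK")
print(len(expand_codon("NNK")), "codons")
print(len(residues - {"*"}), "amino acids, stop codon possible:", "*" in residues)
print(theoretical_dna_diversity("NNKNNK"), "DNA variants for two NNK codons")
```

Multiplying the number of allowed bases per position gives the theoretical size at the DNA level. The size at the protein level is smaller whenever several codons encode the same amino acid.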

Figure 1.1: Monobody first randomized region of 24 colonies. Top to bottom: Ordered sequence, chromatogram and sequencing result.


Figure 1.2: Monobody second randomized region of 24 colonies. Top to bottom: Ordered sequence, chromatogram and sequencing result.


Figure 2: Nanobody randomized CDR3 of 24 colonies. Top to bottom: Ordered sequence, chromatogram and sequencing result.


NGS sequencing for diversity estimation

About 60,000 Nanobody and 30,000 Monobody plasmids were isolated and submitted for NGS (MiSeq). We expect the diversity to be underestimated for several reasons. Critical points in the sample preparation were the isolation of equal amounts of plasmid from different clones, the fragmentation of the plasmids, and the sequencing itself. The sequencing results give an approximate lower bound on the number of different plasmids in the libraries we submitted:

Monobodies: 94 unique fragments out of 164 were detected in 50 ng of DNA, so our Monobody plasmid isolation contains over 28,200 different binding proteins.

Nanobodies: 437 unique fragments out of 499 were detected, so our Nanobody plasmid isolation contains over 27,528 different binding proteins.

When interpreting these estimates, it is important to account for the further loss of variety caused by fragmentation during production of the Nextera libraries, by adapter ligation, and by sequencing bias. Moreover, the Nextera libraries were not sequenced completely.
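
The first step of such an estimate, the share of detected fragments that are unique, can be sketched in Python as follows. The function name `unique_fraction` is a hypothetical helper. The scale-up from this fraction to the library-wide minimum is not spelled out on this page, so the sketch deliberately stops at the observed fraction and uses only the counts reported above.

```python
def unique_fraction(fragments):
    """Return (distinct, total, distinct / total) for a list of fragments."""
    total = len(fragments)
    distinct = len(set(fragments))
    return distinct, total, (distinct / total if total else 0.0)


# Counts reported above, used directly because the raw reads are not shown here.
reported = {"Monobodies": (94, 164), "Nanobodies": (437, 499)}
fractions = {name: unique / total for name, (unique, total) in reported.items()}

for name, fraction in fractions.items():
    unique, total = reported[name]
    print(f"{name}: {unique}/{total} = {fraction:.1%} unique")

# Toy usage of the helper on made-up fragments (hypothetical sequences).
print(unique_fraction(["ACGT", "ACGT", "TTGA", "GGCA"]))
```

A high unique fraction means that few variants were seen more than once, which is what we would expect if the sequenced sample covers only a small part of a much larger library.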

Additionally, the distribution of bases was determined for every position. The results are shown in Figures 3 and 4. These distributions could be affected by PCR amplification during sequencing: four cycles yield at most 2^4 = 16 double-stranded copies of each fragment, i.e. 32 single strands. The size of the colonies used for plasmid isolation also influences how often each plasmid is represented.
Figure 3: Monobodies: Distribution of bases in randomized regions.

Figure 4: Nanobodies: Distribution of bases in randomized CDR3.
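
A per-position base distribution like the ones in these figures can be computed along the following lines. This is a minimal Python sketch, not our actual analysis pipeline. It assumes that the reads have already been trimmed to the randomized region and aligned so that all have the same length. The function name `base_distribution` and the toy reads are hypothetical.

```python
from collections import Counter


def base_distribution(reads):
    """Per-position base frequencies for equal-length, pre-aligned reads."""
    if not reads:
        return []
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all have the same length")
    distribution = []
    # zip(*reads) yields one column (all bases at one position) at a time.
    for column in zip(*reads):
        counts = Counter(column)
        distribution.append(
            {base: counts.get(base, 0) / len(reads) for base in "ACGT"}
        )
    return distribution


# Toy reads (made up) covering a three-base randomized stretch.
toy_reads = ["ACG", "ATG", "AGT", "ACT"]
for position, freqs in enumerate(base_distribution(toy_reads), start=1):
    print(position, freqs)
```

Comparing each position's frequencies with the bases allowed by the designed IUPAC code shows whether the randomization is balanced or skewed towards particular bases.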