Difference between revisions of "Team:Waterloo/Integrated Practices/OpenScience"

Revision as of 23:16, 15 October 2016

Open Science

Ana Patricia Balbon, Hannah James, Kruti H. Patel

Brick by (Bio)Brick: Examining the discipline of synthetic biology as an open science

Research relies on the collection and analysis of data to develop explanations about the phenomena it seeks to represent (Borgman, 2015). Currently, many research disciplines are developing new data sharing and data use practices. One of these disciplines is biology, where novel ideas and practices around data are being established. This paper focuses on data and science within synthetic biology, and specifically on the International Genetically Engineered Machine (iGEM), a leading synthetic biology competition. It begins with a review of open science as a social and technical movement, then introduces synthetic biology and the iGEM competition to show how open-data concepts are being encouraged, and are meeting resistance, in this forum.

Open Science

Within scientific disciplines, open science communicates “data...and everything else...as it happens,” making the entirety of a scientific investigation available (Grand, Wilkinson, Bultitude, & Winfield, 2012). The aspiration of open science is to “...improve the flow of information, minimize restrictions on the use of intellectual resources, and increase transparency of research practice” (Borgman, 2015). While this concept is aspirational, several challenges arise from the desire to make data and all the surrounding aspects of research open and accessible. Borgman suggests that these challenges and obstacles include, but are not limited to: motivations to create and store data; the usability of stored data for others; the length of time for which these data are stored; funding and research grants; and interdisciplinarity (Borgman, 2015). Nonetheless, the increased desire for improved information flow, reproducibility, and availability of data has driven the development of ‘open scholarship’ (Borgman, 2015).

However, what are the motivations for sharing these data in the first place? Four significant rationales are associated with sharing research data: “(1) to reproduce data, (2) to make public assets available to the public, (3) to leverage investments in research, (4) to advance research and innovation” (Borgman, 2015). Reproducing data validates research or has the potential to reveal error, fraud or misconduct. The sharing of research data should allow scholars to use others’ work to ask new questions, to advance research and drive innovation. Sharing data also makes researchers socially accountable for their data, thus contributing to the goals of open science. In instances where public funding supports research projects, the public has a right to know the outcomes of that investment in order to determine whether the investment was beneficial. Overall, open access to data and publications should benefit all readers, from scholars to the general public.

Openness in data occurs in different forms with varying rules and regulations. Policies change within disciplines, as well as across broad fields like the sciences and humanities. Although the format will vary along with the data collected and the published work, the principal motivations and desires for open scholarship will not change.

Data in Biology

Data in biology are collected and preserved in a number of forms. Genomics research, for example, often has large, international networks of researchers and collaborators with standards and expectations for data storage and reuse (Turnbaugh et al., 2009), whereas a discipline like marine biology will have a larger variety of data types and storage. Genomics would be considered a big science, “a science that operates on a large international scale by a means of collaboration, data collection, and instrumentation” (Price, 1986). Anything that falls beyond this classification is considered small science. Bigger sciences are often required to share data internationally for collaboration and for research mandates. This requirement encourages a more sophisticated knowledge infrastructure because of the number of parties involved and the communication required. Thus, there is often pre-existing infrastructure in “big science” which supports open science objectives.

Subdisciplines in biology each have a different classification within big science. These disciplines often publish data and discoveries in different manners that can vary by researcher and sub-discipline (Borgman, 2015). For instance, data in ecology will be collected and analyzed in markedly different ways: even within ecology, a researcher studying the larger effects of climate change on a specific ecosystem would have different data practices than a researcher studying frog reproduction. In order to develop scientific explanations, standardized forms of data analysis and collection exist throughout the many subdisciplines of biology. Yet there are still many fields in biology that are much smaller, with more individualized projects, and that remain small science. Some of these fields are developing more standardized elements of data collection and data storage that are moving them towards big science (Müller & Arndt, 2012; Price, 1986). These fields will also have dynamic characteristics and requirements when oriented to openness.

One of these evolving sub-disciplines is synthetic biology. Synthetic biology utilizes the power of genome modification and protein synthesis to create new systems in cells that can change the functionality of the cell. Scientometric analysis by Paul Oldham, a British analyst, reveals the development and divergence of synthetic biology with respect to terms used in the Web of Science, an international data and publication repository. The term synthetic biology is mentioned for the first time by a French scientist in 1912 but then remains dormant until the early 2000s, and not until 2007 are there many citations in the field (Leduc, 1912; Oldham, Hall, & Burton, 2012; Zeleny & Hufford, 1992). Even with the increase in citations in synthetic biology, in 2007 there are just over 200 articles, significantly fewer than other developing fields such as nanotechnology. The analysis also shows the global nature of the discipline, with researchers scattered across the globe (Oldham et al., 2012). However, with the development of data repositories, increased research interest, and standardization, synthetic biology is developing many characteristics of a big science (Borgman, 2015; Endy, 2005). As such it forms a crucial case-study in evolving data methods in an emerging science.

The overall objective of synthetic biology is the engineering of complex biological systems which create novel cell functions such as protein synthesis (Osbourn, O’Maille, Rosser, & Lindsey, 2012; Shetty, Endy, & Knight, 2008). Synthetic biologists typically proceed by synthesizing new DNA and inserting it into a cell to encode a protein with a novel function. The complexities of engineering new molecules with unique functions raise a variety of issues from both biological and engineering standpoints. The storage and dissemination of the data behind these synthetic parts has, as in each discipline, its own challenges and motivations behind the development of open science.

As an example of open-data developments, there is a nascent data repository in synthetic biology called the Registry of Standard Biological Parts (RSBP), which was initially developed by researchers at M.I.T. and others to store the elements that make up complex systems in synthetic biology (Baker et al., 2006). The initial step of creating a data repository is significant in the development of an open science (Borgman, 2015). Submissions to this repository establish an expectation of data standardization which, given the concerns about reproducibility in science, should motivate sound scientific practices.

Synthetic biology is founded on the conceptual framework of distinct biological parts which are “bricks” that can be assembled into functioning biological systems. As a discipline, the ultimate goal is the assembly of these biological parts into complex, engineered systems (Purnick & Weiss, 2009). Standardizing the individual blocks provides engineers with reliable blocks of known function, so that they can fabricate these systems without the concern of having malfunctioning “bricks”. It has been proposed that these bricks should undergo systematic testing, like other non-biological engineered systems (Canton, Labno, & Endy, 2008). Testing would validate individual “bricks” and their attributes, and allow engineers to reliably predict the function of larger, composite systems. These assembled items (or “BioBricks”) are beginning to be accumulated in the previously mentioned RSBP library (Shetty et al., 2008).

iGEM Case Study

Oldham’s analysis of the increasing size of the discipline of synthetic biology points to the importance of instilling good research practices and capabilities, including openness and collaboration. This is especially helpful to young researchers as they become the researchers of the future. This “nurturing role” is evident in the RSBP repository, which provides a basis for an annual competition called the International Genetically Engineered Machine (iGEM). This competition invites undergraduates from around the world to build and develop these systems from the standard biological parts, as well as to contribute to the growing database of parts. The official iGEM website explains the competition as:

“Multidisciplinary teams ... [work] to build genetically engineered systems using standard biological parts called Biobricks. iGEM teams work inside and outside the lab, creating sophisticated projects that strive to create a positive contribution to their communities and the world.”(Competition, n.d.).

Inherent in its criteria are expectations for using RSBP and RSBP-like data. The creation of standardized assemblies of biological parts called BioBricks encourages open data and open science methods. For this reason, iGEM serves as a case-study of open science in an emerging scientific field, and specifically of how the competition interacts with the data practices and larger development of the Registry of Standard Biological Parts, and with the acculturation of data sharing practices in synthetic biology.

As previously mentioned, BioBricks are biological parts (or simply “parts”) standardized so that they are interchangeable; each part consists of DNA sequences encoding a specified function in the cell, and may be combined with other parts to create novel systems and applications (Baker et al., 2006). The concept of BioBricks and their standardization for iGEM is gaining popularity and traction in synthetic biology as the discipline grows (Purnick & Weiss, 2009).

BioBricks are also data of much contention: the developed BioBricks have the potential to alter sensitive biological systems, and there is minimal international policy governing them, even though institutions and a developing knowledge infrastructure have formed around them (Endy, 2005).

The mandate of iGEM is “[dedication] to education and competition, advancement of synthetic biology, and the development of open community and collaboration” (Competition, n.d.). This attitude towards openness and collaboration is reinforced within the competition and the other facets of iGEM in a number of ways, including mandatory inter-university collaboration as well as a strict protocol and requirements for submission to the BioBricks registry. This submission to the registry allows future teams to use previous projects in their own work at the competition. Planning and preparing a project that must follow certain criteria is logical for the project, and also teaches the students, as potential future researchers, the expectations for submission to the Registry.

An important aspect of this development of open data is collaboration and communication. This is demonstrated in the iGEM gold medal criteria, which require teams to collaborate. The official iGEM website requires teams to follow a specific collaboration protocol:

Help any registered iGEM teams from high school, different track, another university, or institution in a significant way by, for example, mentoring a new team…helping validate a software/hardware solution to a synbio problem (“Judging/Medals”, n.d.).

For collaboration to occur, teams must share, release, and either use or reuse another team's data. Teams must also “improve the function or characterization of a previously existing BioBrick part or Device” (“Judging/Medals”, n.d.). This increases the probability of data reuse in iGEM. Collaborations of this kind portray iGEM as a knowledge infrastructure that encourages the development of sustainable and open data practices in young researchers. While it may not extend beyond the completion of the competition, this cross-disciplinary and inter-university collaboration breaks down some of the inherent competitive aspects of iGEM.

Given the global and commercial nature of genetic disciplines, this collaboration also introduces the undergraduate teams and individuals to the challenges and conflicts that arise in global collaboration or even amongst labs that are not within the same institution (Kelwick et al., 2015; Kotlarsky & Oshri, 2005).

Moreover, teams can build on either their own research or another team’s research from a previous year. To support this, the Registry of Standard Biological Parts uses standardized metadata, which allow data to be located easily when needed. The ability to build on others’ previous work demonstrates iGEM's commitment to advancing research and driving innovation. This allows teams to ask new questions based on previous work, which can then be tested. This should provide new insight within the field of synthetic biology as well as achieve one of the principal goals of open science. Additionally, answering these new questions requires advanced technology, including software and hardware. For instance, a team needs a combination of modeling software and complex biological technologies in order to create a functioning BioBrick.

Registry of Standard Biological Parts: Standards for Reuse, Documentation and Metadata

Synthetic biology is a young, interdisciplinary field, and neither quality is conducive to the existence of high-quality universal data standards. Yet the notion of standardizing parts is necessary to ensure that work in the field accumulates. Both instruments and research methods are standardized in iGEM, resulting in an interesting interaction between crowdsourcing data and preserving data interoperability, where contributors with varying degrees of expertise and exposure to the field must navigate, then conform to, the knowledge infrastructure’s established standards. For example, GenoCAD is “a web-based application to design synthetic constructs… [that] is built upon computational linguistic foundation” (Cai, Wilson, & Peccoud, 2010) and is used to compose DNA sequences by assembling the necessary parts. This standardizes the instruments used to conduct research. Hence, iGEM encourages the use of advanced technology to conduct research.

However, several problems arise, both in pursuing design specifications for parts (or bricks) submission and in promoting the reusability of such data. One difficulty arises from reconciling the nature of the field with its standardization (Müller & Arndt, 2012). The registry dictates that all parts must have a prefix and a suffix flanking the submitted sequence, each containing two recognition sites that can be cut by a prescribed set of restriction enzymes (for example, EcoRI and XbaI in the prefix and SpeI and PstI in the suffix for the first BioBrick Assembly Standard, BBa1.0). Several cloning strategies have been implemented so that parts adhere to this requirement. However, no single protocol exists that meets the specifications of the project, or alternatively is independent of sequence composition and size, and that also produces successful results within a time constraint; this remains a challenge for participants in the field (Müller & Arndt, 2012).

Despite the drawbacks, such obstacles in data conformity practices are not necessarily prohibitive, and the registry’s policy does cultivate a common standard among synthetic biologists (Müller & Arndt, 2012). In fact, while most labs acknowledge the difficulties with cloning, current solutions are effective enough that there is little impetus for something better (Müller & Arndt, 2012).
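
The prefix and suffix requirement implies a simple screening step: a submitted sequence should not itself contain the recognition sites that the assembly standard relies on for cutting. The following is a minimal sketch of such a check in Python, not the Registry's own validation. It assumes only the four enzymes named above and their standard recognition sequences; the function names and example sequences are hypothetical, and the real standard may impose further constraints that are not modelled here.

```python
# Recognition sequences of the four enzymes named in the text.
# All four sites are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_internal_sites(sequence):
    """Return {enzyme: [0-based start positions]} for each site found in the part."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


def is_assembly_compatible(sequence):
    """Simplified check (an assumption): pass when none of the four sites occurs."""
    return not find_internal_sites(sequence)


if __name__ == "__main__":
    clean_part = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"   # hypothetical sequence
    flawed_part = "ATGGCTGAATTCAAAGGAGAACTGCAGTAA"  # hypothetical, has EcoRI and PstI sites
    print(is_assembly_compatible(clean_part))
    print(find_internal_sites(flawed_part))
```

A check of this kind only flags incompatible sequences; removing the offending sites without changing the encoded function is the harder cloning problem discussed above.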

Thus, the data that exist in iGEM are in a format that can be understood by other participants, functionally overcoming some of the inter-institutional barriers between different synthetic biology laboratories. iGEM’s use of GenoCAD is indicative of its dedication to contributing to the Registry of Standard Biological Parts by adopting new technology. Nonetheless, data exchange between GenoCAD and the Registry of Standard Biological Parts has limitations, since the two were developed independently. As a result, they do not use the same system of categories to describe the BioBricks, making “it difficult to map categories of one resource into categories used by another system” (Cai, Wilson, & Peccoud, 2010). Cai, Wilson, and Peccoud, the authors of “GenoCAD for iGEM: a grammatical approach to the design of standard-compliant constructs”, suggest that an ontology can be developed to regulate the vocabulary used to describe BioBricks. This would address the problem of distinct systems of categories. Next, they encourage “standards to delimit on DNA sequences of different categories of parts” (Cai, Wilson, & Peccoud, 2010) to allow users of GenoCAD to combine parts in any order they desire. Proposals such as these suggest that iGEM can find solutions to limitations in data exchange between two systems and follow through on its goal of furthering the Registry.

Once the sequence data is standardized, efforts to make it reusable become the next concern. The description of parts in the registry is notorious for inconsistency in quality. While the open sharing of BioBricks is prioritized by the Registry of Standard Biological Parts, “sufficient” documentation and metadata are poorly enforced. Given the inexperience of undergraduates with publishing work, the time constraints imposed by a yearly competition cycle, and the absence of prerequisites other than sequence and annotations, the description pages for individual “bricks” are typically variable in quality (see Appendix A). This discourages use of the repository by synthetic biology professionals. Initiatives to close this gap include a proposal to embed prototypical fact sheets into the practice of documenting the sequences, which would include key measurements, experiment setup (host strain, media, temperature), and characterization parameters (Müller & Arndt, 2012). This is also partly addressed by an existing incentive structure embedded in the iGEM gold medal criteria, where teams are encouraged to:

Improve the function OR characterization of a previously existing BioBrick Part or Device (created by another team, or by your own team in a previous year of iGEM), and enter this information in the part's page on the Registry. (“Judging/Medals”, n.d.)

Therefore, metadata on the DNA sequences, describing how the part behaves and the experiments performed to model and validate this behaviour, could be argued to be of lower priority, though it has been flagged as an area for improvement in the knowledge infrastructure. Synthetic biology follows the evolving norm of openness but cannot escape from also requiring some level of uniformity and quality.
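
The fact-sheet proposal described in this section can be pictured as a structured record attached to each part, with a check that reports which documentation fields are still empty. The sketch below is illustrative only: the class, field names, and example values are hypothetical and are not drawn from the Registry or from Müller and Arndt's proposal, beyond the three categories named in the text (key measurements, experiment setup, and characterization parameters).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartFactSheet:
    """Hypothetical documentation record for one part; field names are illustrative."""
    part_name: str
    sequence: str
    # Experiment setup, as listed in the text: host strain, media, temperature.
    host_strain: str = ""
    media: str = ""
    temperature_celsius: Optional[float] = None
    # Free-form name -> value mappings for measurements and parameters.
    key_measurements: dict = field(default_factory=dict)
    characterization_parameters: dict = field(default_factory=dict)

    def missing_fields(self):
        """List the documentation fields that have not been filled in."""
        missing = []
        if not self.host_strain:
            missing.append("host_strain")
        if not self.media:
            missing.append("media")
        if self.temperature_celsius is None:
            missing.append("temperature_celsius")
        if not self.key_measurements:
            missing.append("key_measurements")
        if not self.characterization_parameters:
            missing.append("characterization_parameters")
        return missing


if __name__ == "__main__":
    # A page with sequence only, the minimum the text says is currently required.
    sparse = PartFactSheet(part_name="example_part", sequence="ATGGCTAGC")
    print(sparse.missing_fields())
```

A repository could use a report like this to prompt contributors for missing information at submission time rather than rejecting the part outright, which would keep the emphasis on openness while nudging documentation quality upward.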

External Influences: Property Rights in Synthetic Biology

Writing peer-reviewed publications at the end of the competition is impeded, given resource and time constraints, as teams start preparing for next year’s competition. The participating teams do submit their research and data to the Registry of Standard Biological Parts which belongs solely to iGEM. There are also a handful of startups that have come out of iGEM (“iGEM Startups”, n.d.).

Treating biological parts as data is inherent in the field, with established agreements on their ownership, control, and release. However, this is not to say that current practices for optimizing openness are without consequences for productivity. Policies for openness by the Registry of Standard Biological Parts are evaluated here for the opportunities and constraints they introduce into the data practices of the field. The external influence of patent claims within synthetic biology will be shown to affect what data can be transferred across research environments and the choices that scholars in synthetic biology make about what projects to pursue.

Given that the contribution and further improvement of parts are embedded in the competition, the Registry’s affiliation with iGEM has been mutually beneficial. While the community resource can depend on continued use and subsequent growth and improvement, iGEM participants simultaneously have reliable access to parts and information in the repository so that they may engineer and design biological systems based on BioBricks (Baldwin et al., 2012). Use of the Registry of Standard Biological Parts requires engagement with the “BioBricks Public Agreement” to facilitate such openness (“The BioBrick User Agreement”, 2015). It entails a legal strategy aiming to reconcile open access to data through free use with the patentability of a product emerging from the combinations of parts (Baldwin et al., 2012). As of 2012, the success of such a structure “remains to be seen” and its efficacy is under dispute (Baldwin et al., 2012). Notably, several iGEM teams have engaged with this agreement: for example, Oxford iGEM proposed a reach-through license agreement as a better alternative, and the University of Ottawa iGEM team documented their struggle with licensing in the submission of a part that originated from an external supplier (University of Oxford iGEM Team, 2014; Kidisyuk, 2015). While Borgman’s (2015) assertion that openness facilitates data creation is thus demonstrated within the iGEM competition, limitations on sharing are also evident when economic motivations for creating or withholding data exist.

Claiming property rights over synthetic biology has raised concerns for the developing field. A “very real” prospect exists that patents with broad claims, applicable to technologies that are “foundational” or “essential” to the science, may place profound barriers before early-stage researchers, requiring the use of patented information at prohibitive cost, which then impedes innovation (Baldwin et al., 2012). Narrow patents in synthetic biology could similarly establish financial obstacles to invention when they cluster into ‘patent thickets’ or ‘anti-commons’ (Baldwin et al., 2012). As parties pursue financial returns for their work through patenting at the end of research, or alternatively as they navigate what is available to use within budget at the beginning of research, the application of property rights policies to data in synthetic biology is shown to be pulled by competing interests. Further creation of data depends on striking a balance between being open (in other words, maintaining a culture of data reuse and accessibility) and the sustainability of such activity (Borgman, 2015).

Finally, no form of validating research data seems to exist in iGEM. For example, after the University of Waterloo team received an award for their wiki, they found errors in it. The team “tried to fix many of them out of good conscience” (“An iGEM Critique”, n.d.) but ultimately did not fix all of the errors, and instead focused their time on preparing for the next iGEM competition, which was just a couple of months away. Furthermore, iGEM does not prioritize validating data: it is highly unlikely that “teams or judges are going to check the validity of…data thoroughly enough during or after the Jamboree to notice mistakes, let alone correct them” (“An iGEM Critique”, n.d.).

A solution to the problem of validity may involve having teams validate each other’s data by reproducing either the entire research project or just the data collected or the observations. This could be encouraged by embedding it in the criteria for the gold medal standard. If reproduction succeeds, the team's research is supported; if not, the result may point to error, misconduct or fraud.

iGEM is a rich case-study in the aspirations for open-science, competing with real-world challenges of research competition, evolving technologies, and reward structures which prize novelty over sustained efforts at validity and documentation.

Conclusion

iGEM and the Registry of Standard Biological Parts are expanding international knowledge infrastructures that are dedicated to advancing the new discipline of synthetic biology. The approaches are imperfect but, from the perspective of synthetic biology, these are also emergent scientific enterprises that are grappling with the challenges of open science, and the scientific ideals of open data. The iGEM competition is valuable as an educational opportunity for young researchers, but through its affiliation with the Registry, also acts as a catalyst for openness. The competition helps undergraduates, future scholars, develop skills in communication, collaboration, data management and use. Through these facets, transparency and cooperation are encouraged in research. While not executed perfectly, they are at the core of the competition and repository, developing synthetic biology as a discipline.

References
Here