Explainer: what is bioinformatics?
By Mark Ragan, University of Queensland
Bioinformatics underpins and enables research across the life sciences.
This ranges from high-volume reductionist science (genomics, proteomics and the other “omics”, regulation of gene activity, epigenetics, protein and RNA structure and function, cell organisation) to comparative, evolutionary and systems biology. The latter, in particular, attempts to discover how our bodies work.
Our hunger for bioscience data
Humans have long sought to understand how our bodies work. The emergence of experimentalism in the mid-17th century brought a philosophical approach, later termed reductionism, which focused on the component parts of our bodies – organs, muscles, bones and the like.
Analogies were developed between the human body and the most complex artefacts of human ingenuity: timepieces (Joseph Glanvill), mechanical toys (René Descartes), pneumatic machines (Julien Offray de la Mettrie) and – with the rise of the Industrial Revolution – mills, factories and assembly lines.
Thus should we encounter, for the very first time, a pocket watch, we would likely take it apart and investigate its parts – not as instinct, but following accepted experimental philosophy. If pressed, we would defend our approach as “scientific method”.
In many respects, reductionism has been spectacularly successful: the DNA double helix, the genetic code, and the central dogma of molecular biology (“DNA makes RNA makes protein”) are headline examples.
By the mid-20th century a more-integrative biology, based on models of information storage and flow (Conrad Waddington, Erwin Schrödinger, Stuart Kauffman), seemed within reach, even without a full human-body parts list (every gene, every protein).
Computers were becoming more powerful and accessible beyond departments of electrical engineering. Against this background Paulien Hogeweg coined the terms “bioinformatica” (1970 in Dutch) and “bioinformatics” (1978 in English) to describe a proposed research field in which “information processing could serve as a useful metaphor for understanding living systems”.
But reductionism had not run its course. As a succession of ever-more-powerful biomolecular technologies followed, biologists of all sorts (agricultural scientists, medical researchers, ecologists, biotechnologists) really did want the full parts list – every gene, every protein, every control signal – and not only for humans. And they would have it.
In 1977, Fred Sanger and colleagues published the 5,386-nucleotide sequence of bacteriophage ΦX174. The volume of publicly accessible DNA sequence data has since grown 100 million-fold, while new technologies (microarrays, proteomics, bio-imaging) are spewing forth data in similarly immense quantities.
Each of the two main international centres for bioscience data – the US National Center for Biotechnology Information, and the EMBL European Bioinformatics Institute – now manages almost 10 petabytes of data (10 million gigabytes); and as we step over the threshold into the era of personal genomics, there is every reason to expect this rate of growth not only to continue, but to accelerate.
Managing bioscience data
No data – certainly not of this size and complexity – manage themselves. Beginning in the 1970s, methods were appropriated from information technology to capture, manage, index and share these data; and from computer science to design scalable algorithms to extract information from them.
The word bioinformatics was re-purposed to describe this multidisciplinary interface. Specialist practitioners describe themselves as bioinformaticians, while familiarity with the basic bioinformatics toolkit (e.g. retrieving sequences from databases, assembling, aligning and annotating them) is increasingly widespread.
Other disciplines too have sprung up at this interface: biomathematics (emphasising statistical modelling), biostatistics, systems biology (emphasising networks) and synthetic biology (engineering new functions or organisms).
Bioinformatics (the development of methods and software) is sometimes distinguished from computational biology (the application of those methods to theoretical and applied questions in biology). In 1998 the journal Computer Applications in the Biosciences was renamed Bioinformatics.
Bioinformatics was instrumental to the Human Genome Project (and all others before and since), and is indispensable for interpreting almost any data dealing with DNA, RNA or protein sequence, structure, interactions or function.
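To make the "aligning" step of the basic toolkit a little more concrete, here is a minimal sketch of how two short DNA sequences might be compared by dynamic programming (the classic Needleman-Wunsch global alignment). The function name and the scoring values (+1 for a match, -1 for a mismatch or a gap) are illustrative assumptions, not taken from any particular tool; production software uses far more refined scoring schemes and far faster implementations.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b (Needleman-Wunsch).

    The scoring values are illustrative assumptions only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is built from its three already-solved neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            gap_in_b = score[i - 1][j] + gap   # letter of a paired with a gap
            gap_in_a = score[i][j - 1] + gap   # letter of b paired with a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("GATTACA", "GATACA"))
```

With these assumed scores, two identical seven-letter sequences score 7, while dropping one letter from the second sequence gives 5 (six matches and one gap). The table-filling idea is what matters here: every sub-problem is solved once and reused, which is what lets the comparison scale to much longer sequences.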
Bioinformatics into the future
Much of the success and dynamism of today’s life science flows directly from the culture that has grown up around bioinformatics, including open-access data and data services, community projects (e.g. in annotation, ontology or standards development) and open-source software.
It was not inevitably so: intellectual property, patents and companies play no less a role in life science than in, say, consumer electronics industries. The biotechnology industry is diverse, vibrant, and reciprocally supportive of open standards, software and (where possible) data.
A few years ago I was predicting that bioinformatics, like molecular biology before it, would sink without a trace into the everyday practice of bioscience – but bioinformatics has retained its distinct identity, including priorities that centre on the training of personnel.
The reason, I suspect, is twofold. Molecular biology was born at the interface with chemistry and physics – hard sciences – whereas bioinformatics draws on computer science and information technology, i.e. engineering.
A graduate biologist can reach back to her first-year science to understand how the DNA double helix was inferred from an X-ray diffraction pattern, but would find no similar fundamentals for dynamic programming, suffix trees or hidden Markov models.
And unlike molecular biology, bioinformatics continues to push hard at the boundaries of every one of its constituent fields.
Mark Ragan is Professor and Head of Genomics and Computational Biology at the Institute for Molecular Bioscience, and Adjunct Professor in the School of Information Technology and Electrical Engineering, at the University of Queensland; and Director of the ARC Centre of Excellence in Bioinformatics. His research group is funded by ARC (Discovery), NHMRC (Projects), Queensland Government (Co-investment Fund), the J.S. McDonnell Foundation and the University of Queensland (UQ), with in-kind support of Bioplatforms Australia, the Sugar Research and Development Corporation, and BSES Limited. His group has been awarded access via competitive merit allocation to facilities of the National Computational Infrastructure (NCI). Mark leads national infrastructure projects funded by ARC (LIEF), NCI, Bioplatforms Australia, CSIRO, Queensland Cyber Infrastructure Foundation and UQ. He co-founded QFAB (Queensland Facility for Advanced Bioinformatics), a partnership among UQ, Griffith University, Queensland University of Technology and Queensland Government (DEEDI). He has no other relevant affiliations, funding sources, or financial interests.