Ontologies in Evolutionary Biology
From HAO Wiki
Ontologies represent formalized domains of knowledge, in which classes (or terms) are related to one another to enable logical reasoning. For example, suppose we have three classes, A, B, and green, related as follows: B part_of A and A has_color green. Given this ontology, and assuming that color is inherited by parts, we can reason that B must also be green, since it is part of A (which has the color green). We are able to make inferences about the color of B, in the absence of explicit statements about the color of B, because we have an ontology that indirectly includes this information. That is, an ontology serves as a formal model through which one can employ mathematical logic to clarify and define concepts and relationships within a domain of interest (e.g., biology).
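The A, B, green example can be sketched in a few lines of Python. This is only an illustration: the triple representation, the function name infer_colors, and the rule that color propagates from a whole to its parts are assumptions made for the sketch, not features of any particular ontology language or reasoner.

```python
# Toy triple store: (subject, relation, object), taken from the example above.
TRIPLES = {
    ("B", "part_of", "A"),
    ("A", "has_color", "green"),
}


def infer_colors(triples):
    """Return every (class, color) pair, whether asserted or inferred.

    Assumed rule (hypothetical, for illustration only):
    if X part_of Y and Y has_color C, then X has_color C.
    """
    colors = {(s, o) for s, r, o in triples if r == "has_color"}
    parts = {(s, o) for s, r, o in triples if r == "part_of"}

    # Apply the rule repeatedly until no new facts appear, so that
    # parts of parts also receive the color of the whole.
    changed = True
    while changed:
        changed = False
        for part, whole in parts:
            for owner, color in list(colors):
                if owner == whole and (part, color) not in colors:
                    colors.add((part, color))
                    changed = True
    return colors


if __name__ == "__main__":
    for cls, color in sorted(infer_colors(TRIPLES)):
        print(cls, "has_color", color)
```

Nothing in the input states the color of B directly; the pair (B, green) appears only because the rule is applied to the two asserted triples.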
Our recognition of ontologies as formalized domains of knowledge and as logical inference tools stretches back millennia (at least to Aristotle), and the use of controlled vocabularies for complex representations of information in computer science is well established (Bard 2008). Only recently, however, has the power of ontologies for the consistent representation and subsequent mining of biological data been recognized, with the model organism and genomics communities pioneering the use of controlled vocabularies for gene expression annotations (e.g., the Gene Ontology). Successes in this area of informatics, especially the development of user-friendly tools and emerging standards for these resources, resulted in a recent, rapid burst of ontology use in evolutionary biology (e.g., Mabee et al. 2007a, b; Ramírez et al. 2008). Like phylogenies (the tree-like patterns of historical relationships between organisms), ontologies link biological concepts and therefore provide critical context for them; in this way ontologies provide another important mechanism for generating inferences that inform our collective knowledge of evolutionary biology.
The meteoric rise in conversation concerning ontologies (witness the recent workshops sponsored by the ZFIN/NESCent collaboration) can also be attributed to continued development of the Semantic Web. By formalizing relationships between otherwise disparate concepts, ontologies allow knowledge to be more readily harvested by computers and people. An increasing amount of biological information, often from vastly different domains (e.g., genomics, ecology, fishes, plants), is being ported to the Web (e.g., the Encyclopedia of Life), and the continued refinement of expertly constructed ontologies will help make these data available to a broad audience.
Enhancing biological research
The Biology domain has always benefited from ontological entities. The Linnaean classification is essentially an ontology of taxon names connected by is_a relationships (Homo is_a Hominidae), and even the pre-Linnaean taxonomies were hierarchical. The onset of large-scale genomics projects, which collected thousands of instances of gene sequences and gene expression, set off a colossal effort to standardize and classify the way those instances are described. The resulting ontology, the Gene Ontology (GO; http://www.geneontology.org/), spans all of life (as we all operate by products coded from nucleic acids) and enables gene expression data from the nematode model, C. elegans, to be compared with gene expression data from flies and plants.
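The view of a classification as a chain of is_a relationships can be illustrated with a short Python sketch. Only the link from Homo to Hominidae comes from the text; the higher ranks, the dictionary layout, and the function names are assumptions added for illustration.

```python
# Child -> parent is_a links. Only "Homo is_a Hominidae" comes from the text;
# the higher ranks are added here for illustration.
IS_A = {
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
}


def ancestors(taxon, is_a=IS_A):
    """Walk is_a links upward and return all more inclusive taxa, nearest first."""
    found = []
    while taxon in is_a:
        taxon = is_a[taxon]
        found.append(taxon)
    return found


def is_a_query(child, parent, is_a=IS_A):
    """True if `child` is_a `parent`, directly or through intermediate taxa."""
    return parent in ancestors(child, is_a)


if __name__ == "__main__":
    print(ancestors("Homo"))
    print(is_a_query("Homo", "Mammalia"))
```

Because is_a is treated as transitive here, a query can succeed even when the two taxa are never linked by a single explicit statement.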
Since the establishment of GO, the Biology domain has witnessed the emergence of numerous relevant ontologies that define and relate classes of anatomy (e.g., Amphibian Gross Anatomy, 958 terms; C. elegans Gross Anatomy, 6,725 terms; Drosophila Gross Anatomy, 6,321 terms; Mosquito Gross Anatomy, 1,861 terms; Spider Comparative Biology, 450 terms), behavior (Loggerhead Nesting Behavior, 246 terms), phenotypic qualities (PATO, 1,976 terms), units of measurement (267 terms), and spatial entities (Spatial Ontology, 106 terms). These controlled vocabularies, especially when used in concert, can be exploited to make semantic statements about biology (in this case a fly):
Original statement: Length of the ventral margin of the head = 0.1 mm.
Expressed as classes: (PATO:0001708) (BSPO:0000684) (FBbt:00000004) (UO:0000016)
From these ontologies: (phenotype quality) (spatial) (fly anatomy) (units of measurement)
This example can be understood by both humans and machines that mine bodies of such statements (e.g., databases of species descriptions). As more subdomains of Biology employ ontologies, an increased amount of critical knowledge can be recovered from existing and future blocks of data.
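As a rough sketch of how a machine might hold and query such a statement, the fly example can be written as a small Python record. The term identifiers and the ontology each belongs to are taken from the example above; the class name Measurement, its field names, and the helper functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Identifier prefix -> ontology, as listed in the example above.
ONTOLOGY_BY_PREFIX = {
    "PATO": "phenotype quality",
    "BSPO": "spatial",
    "FBbt": "fly anatomy",
    "UO": "units of measurement",
}


@dataclass(frozen=True)
class Measurement:
    """One semantic statement; field names are hypothetical."""
    quality: str   # term from the phenotype quality ontology
    location: str  # term from the spatial ontology
    entity: str    # term from the fly anatomy ontology
    unit: str      # term from the units of measurement ontology
    value: float

    def term_ids(self):
        return (self.quality, self.location, self.entity, self.unit)


def ontology_of(term_id):
    """Look up the source ontology from the prefix before the colon."""
    prefix, _, _ = term_id.partition(":")
    return ONTOLOGY_BY_PREFIX[prefix]


def find_by_entity(statements, entity_id):
    """Return all statements that annotate the given anatomy term."""
    return [s for s in statements if s.entity == entity_id]


# "Length of the ventral margin of the head = 0.1 mm"
statement = Measurement(
    quality="PATO:0001708",
    location="BSPO:0000684",
    entity="FBbt:00000004",
    unit="UO:0000016",
    value=0.1,
)

if __name__ == "__main__":
    for term in statement.term_ids():
        print(term, "->", ontology_of(term))
```

Once descriptions are stored this way, a query such as find_by_entity can select statements by anatomy term rather than by matching free text.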
The goal of this project is to refine the Hymenoptera Ontology (HO) of anatomy and make it available in order to 1) facilitate the extraction of vast information from legacy descriptive taxonomy literature (which covers >115,000 species at present); 2) provide Web-based mechanisms that maximize the utility of the HO; 3) provide a foundation from which new mechanisms of (re)describing and identifying hymenopteran species can evolve; and 4) enable the informed annotation and efficient querying of genes/gene expression and genomic databases.
Two major challenges must be surmounted to realize our goals: 1) taming the large and diverse lexicon used to treat Hymenoptera anatomy and 2) adopting and employing the rapidly expanding suite of tools and standards used to promote semantic biology. We are extremely well positioned to unify these fields thanks, in part, to the strong support of a diverse community of experts. Our work to date shows that, although we have assembled a sizable body of data and applications, we require further input to catalyze a revolution in Hymenoptera-related science.

