Software Development

The core of our project depends heavily on community interaction, so our focus is on developing an infrastructure that facilitates that interaction and that tests and refines the integrity of the core dataset. We are well positioned to meet these goals thanks to mx, the open-source core application that will serve as the foundation for much of our work. mx already includes data structures for specimens (mapped to the Darwin Core standard), references, and other foundational data that are important to the ultimate goals of the HO. Our core applications will be built on Ruby on Rails and/or PHP, with JavaScript providing AJAX-based functionality. Rails applications use a database-agnostic framework, so any of several database systems, including the freely available MySQL and PostgreSQL as well as the commercial Oracle, can be used. This makes our software easier for other parties to adopt. Applications will be platform independent, and those that are user driven (i.e., not script based) will be entirely web based.
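To illustrate what mapping a specimen record to Darwin Core involves, here is a minimal sketch in Python. The internal field names (catalog_no, taxon_name, and so on), the function to_darwin_core, and the sample values are hypothetical and are not mx's actual schema; only the right-hand names are standard Darwin Core terms. mx itself is a Rails application, so this shows the idea, not its implementation.

```python
# Hypothetical sketch: map an internal specimen record to Darwin Core terms.
# The left-hand field names are illustrative assumptions, NOT mx's schema;
# the right-hand names are standard Darwin Core terms.
DWC_TERMS = {
    "catalog_no": "catalogNumber",
    "repository_code": "institutionCode",
    "taxon_name": "scientificName",
    "collector": "recordedBy",
    "collected_on": "eventDate",  # Darwin Core expects an ISO 8601 date here
    "country": "country",
    "locality": "locality",
}


def to_darwin_core(specimen: dict) -> dict:
    """Return a Darwin Core record, skipping missing or empty fields."""
    record = {}
    for field, term in DWC_TERMS.items():
        value = specimen.get(field)
        if value not in (None, ""):
            record[term] = value
    # Every record here is assumed to describe a physical, preserved specimen.
    record["basisOfRecord"] = "PreservedSpecimen"
    return record


# Example with made-up values; the empty collector is omitted from the output.
example = {
    "catalog_no": "EX-0001",
    "taxon_name": "Evania appendigaster",
    "collector": "",
    "country": "USA",
}
print(to_darwin_core(example))
```

Keeping the mapping in a single table means a new Darwin Core term can be supported by adding one entry rather than touching the conversion logic.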

While our data is unique to this project, all other software and infrastructure will be made accessible to the broader community; almost without exception, the software we develop will be project agnostic. Every aspect of our project is premised on providing community access to the source code of our utilities, the data we collect, and the methods we derive. All software will be open source under share-alike attribution licenses as defined by Creative Commons. We have already adopted SourceForge as the repository for mx and have maintained the application there for two years. Additional applications will be deposited at SourceForge, GitHub, or Google Code, well-established repositories with a strong commitment to supporting open-source communities. Projects deposited at SourceForge are never deleted, so our core applications will persist indefinitely in free storage.

Code management will be handled through SVN or Git repositories, which track every change made to the source and allow multiple developers to work together efficiently. We will use SourceForge and/or Trac to manage bug reports and feature requests. The HAO wiki will be used for documentation, supplemented by API documentation that the Rails tooling generates automatically from the source code and its comments. We have adopted a “release early, release often” mindset by providing bi-weekly read-only “nightly” snapshots to anyone who wants to see the latest developments in our software. Software releases will occur roughly every 3 to 6 months, depending on the size of the given project (specifics in Table 3). We hope that constant updates to our source will encourage other parties to track our development and provide feedback.

Our project will use several widely adopted standards. For our core deliverable, the HAO, we are targeting the draft OBO flat file format, version 1.2. Our preliminary data management application (mx, as of version 0.2.606) already exports a minimally functional, syntactically valid OBO file; however, much development is still required to comply fully with this standard. All web services will be exposed as RESTful or XML-based APIs. We will use OpenID for all publicly creatable accounts. Multilingual support will be handled through Unicode with UTF-8 character encoding.
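For readers unfamiliar with OBO 1.2, the sketch below shows the basic shape of an export: a header carrying format-version, then one [Term] stanza per term with id, name, def (a quoted definition followed by a bracketed dbxref list), and is_a tags. The dictionary layout, the EX: identifiers, and the write_obo function are hypothetical placeholders, not the mx exporter, and only a small subset of the format's tags is emitted.

```python
# Minimal sketch of an OBO 1.2 flat-file writer. The term dictionaries, the
# EX: identifiers, and write_obo itself are hypothetical; only a few OBO tags
# (format-version, default-namespace, id, name, def, is_a) are emitted.
from io import StringIO


def escape_obo(text: str) -> str:
    """Escape backslashes and double quotes for a quoted OBO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def write_obo(terms: list, namespace: str, out) -> None:
    """Write a header and one [Term] stanza per term to a text stream."""
    names = {t["id"]: t["name"] for t in terms}
    out.write("format-version: 1.2\n")
    out.write(f"default-namespace: {namespace}\n")
    for term in terms:
        out.write("\n[Term]\n")
        out.write(f"id: {term['id']}\n")
        out.write(f"name: {term['name']}\n")
        if term.get("def"):
            # def: "definition text" [dbxref, dbxref, ...]
            xrefs = ", ".join(term.get("xrefs", []))
            out.write(f'def: "{escape_obo(term["def"])}" [{xrefs}]\n')
        for parent in term.get("parents", []):
            # Text after "!" is a human-readable comment ignored by parsers.
            label = names.get(parent)
            comment = f" ! {label}" if label else ""
            out.write(f"is_a: {parent}{comment}\n")


terms = [
    {"id": "EX:0000001", "name": "example parent"},
    {
        "id": "EX:0000002",
        "name": "example child",
        "def": 'A "placeholder" definition.',
        "xrefs": ["EX:curator"],
        "parents": ["EX:0000001"],
    },
]
buf = StringIO()
write_obo(terms, "example", buf)
print(buf.getvalue())
```

To write a real file, pass a handle opened with open("hao.obo", "w", encoding="utf-8") (a hypothetical file name) instead of the StringIO buffer; the explicit UTF-8 encoding keeps non-ASCII labels intact.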

Because the core lexicons contain relatively few data points in terms of overall storage (i.e., thousands of words), we do not foresee storage capacity issues. All images and image-related metadata are stored at Morphbank, so we incur no overhead for the large volume of data generated there. Optical character recognition (OCR) output will be stored as plain text, with its metadata stored in the mx application. All PDFs and text files containing OCRed text from copyrighted material will be accessible only to the HO team. Our project is not aimed at archiving these documents, but we will maintain them and deposit them with the HO DB where applicable. Metadata for these documents will be archived within the OBO file itself as dbxrefs. We will run cron jobs to create nightly off-site backups of the HAO DB.
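Nightly backups driven by cron need a naming and retention policy. The sketch below is one possibility, assuming (none of this is specified in the plan) that dumps are named hao-YYYY-MM-DD.sql.gz, that the newest 14 are kept, and that a hypothetical script hao_backup.sh performs the dump and the off-site copy. The Python part only decides file names and which old dumps to delete, so it can be tested without a database.

```python
# Sketch of a nightly backup naming/retention policy. All names here are
# hypothetical assumptions: dump files "hao-YYYY-MM-DD.sql.gz", and a script
# hao_backup.sh (not shown) that dumps the database, copies it off-site, and
# deletes whatever backups_to_prune() returns. A crontab entry such as
#     30 2 * * * /usr/local/bin/hao_backup.sh
# would run it every night at 02:30.
from datetime import date, datetime

PREFIX, SUFFIX = "hao-", ".sql.gz"


def backup_name(day: date) -> str:
    """File name for the dump taken on `day`."""
    return f"{PREFIX}{day.isoformat()}{SUFFIX}"


def backups_to_prune(filenames, keep=14):
    """Return dumps older than the newest `keep`, oldest first."""
    dated = []
    for name in filenames:
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue  # unrelated file
        stamp = name[len(PREFIX):-len(SUFFIX)]
        try:
            dated.append((datetime.strptime(stamp, "%Y-%m-%d").date(), name))
        except ValueError:
            continue  # matches the pattern but has no valid date
    dated.sort()
    excess = len(dated) - keep
    return [name for _, name in dated[:excess]] if excess > 0 else []


# Example: 20 nightly dumps from January 2024; with keep=14 the oldest 6 go.
names = [backup_name(date(2024, 1, d)) for d in range(1, 21)]
print(backups_to_prune(names + ["notes.txt"]))
```

Parsing the date out of each file name, rather than relying on file modification times, keeps the policy correct even after dumps are copied to the off-site host.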
