- Automated species identification
The automated identification of biological objects such as insects (individuals) and/or groups (e.g., species, guilds, characters) has been a dream among systematists for centuries. The goal of some of the first multivariate biometric methods was to address the perennial problem of group discrimination and inter-group characterization. Despite much preliminary work in the 1950s and '60s, progress in designing and implementing practical systems for fully automated biological object identification has proven frustratingly slow. As recently as 2004, Dan Janzen updated the dream for a new audience: "The spaceship lands. He steps out. He points it around. It says ‘friendly–unfriendly–edible–poisonous–safe–dangerous–living–inanimate’. On the next sweep it says ‘Quercus oleoides–Homo sapiens–Spondias mombin–Solanum nigrum–Crotalus durissus–Morpho peleides–serpentine’. This has been in my head since reading science fiction in ninth grade half a century ago."
The species identification problem
Janzen’s preferred solution to this classic problem involved building machines to identify species from their DNA. His predicted budget and proposed research team were “US$1 million and five bright people.” However, recent developments in computer architectures, as well as innovations in software design, have placed the tools needed to realize Janzen’s vision in the hands of the systematics community not several years hence, but now; and not just for DNA barcodes, but for digital images of organisms too. A recent survey of accuracy results for small-scale trials (<50 taxa) obtained by such systems (Gaston and O’Neill 2004) shows an average reproducible accuracy of over 85 percent, with no significant correlation between accuracy and the number of included taxa or the type of group being assessed (e.g., butterflies, moths, bees, pollen, spores, foraminifera, dinoflagellates, vertebrates). Moreover, these identifications, often involving thousands of individual specimens, can be made in a fraction of the time required by human experts and can be done on site, on demand, anywhere in the world.
These developments could not have come at a better time. As the taxonomic community already knows, the world is running out of specialists who can identify the very biodiversity whose preservation has become a global concern. In commenting on this problem in palaeontology as long ago as 1993, Roger Kaesler recognized: “… we are running out of systematic paleontologists who have anything approaching synoptic knowledge of a major group of organisms ... Paleontologists of the next century are unlikely to have the luxury of dealing at length with taxonomic problems … [Paleontology] will have to sustain its level of excitement without the aid of systematists, who have contributed so much to its success.”
This expertise deficiency cuts as deeply into those commercial industries that rely on accurate identifications (e.g., agriculture, biostratigraphy) as it does into a wide range of pure and applied research programmes (e.g., conservation, biological oceanography, climatology, ecology). It is also commonly, though informally, acknowledged that the technical, taxonomic literature of all organismal groups is littered with examples of inconsistent and incorrect identifications. This is due to a variety of factors, including taxonomists being insufficiently trained and skilled in making identifications (e.g., using different rules-of-thumb in recognizing the boundaries between similar groups), insufficiently detailed original group descriptions and/or illustrations, inadequate access to current monographs and well-curated collections and, of course, taxonomists having different opinions regarding group concepts. Peer review only weeds out the most obvious errors of commission or omission in this area, and then only when an author provides adequate representations (e.g., illustrations, recordings, gene sequences) of the specimens in question.
Systematics too has much to gain, both practically and theoretically, from the further development and use of automated identification systems. It is now widely recognized that the days of systematics as a field populated by mildly eccentric individuals pursuing knowledge in splendid isolation from funding priorities and economic imperatives are rapidly drawing to a close. In order to attract both personnel and resources, systematics must transform itself into a “large, coordinated, international scientific enterprise” (Wheeler 2003). Many have identified use of the Internet, especially via the World Wide Web, as the medium through which this transformation can be made.
While establishment of a virtual, GenBank-like system for accessing morphological data, audio clips, video files and so forth would be a significant step in the right direction, improved access to observational information and/or text-based descriptions alone will not successfully address either the taxonomic impediment or the problem of low identification reproducibility. Instead, the inevitable subjectivity associated with making critical decisions on the basis of qualitative criteria must be reduced or, at the very least, embedded within a more formally analytic context.
Properly designed, flexible, and robust automated identification systems, organized around distributed computing architectures and referenced to authoritatively identified collections of training set data (e.g., images, gene sequences), can, in principle, provide all systematists with access to the electronic data archives and the necessary analytic tools to handle routine identifications of common taxa. Properly designed systems can also recognize when their algorithms cannot make a reliable identification and refer that image to a specialist (whose address can be accessed from another database). Such systems can also include elements of artificial intelligence and so improve their performance the more they are used. Most tantalizingly, once morphological (or molecular) models of a species have been developed and demonstrated to be accurate, these models can be queried to determine which aspects of the observed patterns of variation and variation limits are being used to achieve the identification, thus opening the way for the discovery of new and (potentially) more reliable taxonomic characters.
References cited
* Gaston, K. J., and M. A. O'Neill. 2004. "Automated species identification—why not?" Philosophical Transactions of the Royal Society of London, Series B 359:655–667 (see [http://www.journals.royalsoc.ac.uk/(kqmnfs2d54r4owreisnhkg45)/app/home/contribution.asp?referrer=parent&backto=issue,8,19;journal,37,229;linkingpublicationresults,1:102022,1] ).
* Janzen, D. H. 2004. "Now is the time". Philosophical Transactions of the Royal Society of London, Series B 359:731–732 (see [http://www.ucalgary.ca/~dsikes/zool575/readings/Janzen%20(2004).pdf] ).
* Kaesler, R. L. 1993. "A window of opportunity: peering into a new century of paleontology". Journal of Paleontology 67:329–333.
* Wheeler, Q. D. 2003. "Transforming taxonomy". The Systematist No. 22:3–5.
External links
Here are some links to the home pages of three mature species identification systems. While all were initially designed to identify speciose invertebrate groups, the SPIDA and DAISY systems are essentially generic and capable of classifying any image material presented. The ABIS system is restricted to insects with membranous wings, as it operates by matching a specific set of characters based on wing venation.
* [http://research.amnh.org/invertzoo/spida/common/index.htm The SPIDA system]
* [http://www.informatik.uni-bonn.de/projects/ABIS ABIS]
* [http://www.tumblingdice.co.uk/daisy DAISY]
Wikimedia Foundation. 2010.