Our team of researchers, the Species File Group, develops and uses digital tools for biodiversity informaticians, the scientists who study the Earth's species. One of our focuses is locating information about the Earth's species via their scientific names, a project called GlobalNames. The idea is straightforward: find a biological name like _Homo sapiens_ (humans), _Apis mellifera_ (the Western honey bee), or _Anopheles gambiae_ (a mosquito that transmits malaria), and you may discover information important to scientists "nearby". In the context of the GlobalNames project, finding a name means parsing digitized literature or datasets, small or large.

Thanks to funding from the National Science Foundation (NSF ABI 1645959, 2015), the initial tools, created by Dmitry Mozzherin and Alex Myltsev, were developed and hardened against the large, free corpus of scientific publications in the Biodiversity Heritage Library (BHL). The diversity of the BHL's data (e.g., different languages, publication types, and varying quality of the parsed text), together with its structure, let us find and resolve many edge cases in the name-detection algorithms.

While finding specially formatted latinized names is challenging, the results of this work are fairly simple: at their core, they are an index indicating that "_this_ name was found _there_". Many downstream features and explorations emerge from these simple data; for example, the list of names found on any given page of the BHL (e.g., "Scientific Names on this Page") is derived from our tools.
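An index of the form "this name was found there" is essentially an inverted index from names to locations. The following Python sketch illustrates that idea only, under loudly stated assumptions: the regular expression, the `KNOWN_GENERA` set, the page-ID input shape, and the functions `find_names`, `build_index`, and `names_on_page` are all hypothetical. They are not the GlobalNames tools or their API, and real name-finding must also cope with abbreviated genera, authorship strings, multiple languages, and noisy text, none of which this naive matcher handles (it would still accept a phrase like "Homo and", for instance).

```python
import re
from collections import defaultdict

# Deliberately naive pattern (an assumption, NOT the GlobalNames algorithm):
# a capitalized word followed by a lowercase word of 3+ letters, e.g.
# "Apis mellifera". On its own it also matches ordinary phrases such as
# "The idea", so candidates are filtered against a known-genus list below.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Hypothetical stand-in for a real dictionary of genus names.
KNOWN_GENERA = {"Homo", "Apis", "Anopheles"}


def find_names(text):
    """Return candidate binomial names found in one page's text, in order."""
    return [
        f"{genus} {epithet}"
        for genus, epithet in BINOMIAL.findall(text)
        if genus in KNOWN_GENERA
    ]


def build_index(pages):
    """Build a "this name was found there" index.

    `pages` maps a page identifier to that page's text (a hypothetical
    input shape). Returns a dict of name -> sorted list of page identifiers.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for name in find_names(text):
            index[name].add(page_id)
    return {name: sorted(ids) for name, ids in index.items()}


def names_on_page(index, page_id):
    """Invert the index: every name recorded for a single page, sorted."""
    return sorted(name for name, ids in index.items() if page_id in ids)


if __name__ == "__main__":
    pages = {
        "p1": "The Western honey bee, Apis mellifera, pollinates crops.",
        "p2": "Anopheles gambiae transmits malaria; Homo sapiens is its host.",
        "p3": "Apis mellifera was observed again.",
    }
    index = build_index(pages)
    print(index["Apis mellifera"])    # -> ['p1', 'p3']
    print(names_on_page(index, "p2"))  # -> ['Anopheles gambiae', 'Homo sapiens']
```

Storing page identifiers per name makes "where was this name found?" a direct lookup, while a per-page list of names is just the inversion, computed here by a linear scan; a larger system would more likely keep both directions indexed.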