This project will create an index of all the scientific names of the Earth’s species found within the HathiTrust corpus. The index, which will likely contain hundreds of millions to billions of entries, will consist of a simple link between each scientific name and the volume and page where that name occurs in HathiTrust. The index will help identify volumes that may be medically relevant, for example by finding every volume that contains the scientific name of the mosquito that carries illnesses such as Zika virus (‘Aedes aegypti’). It will also allow volumes to be grouped into clusters based on the scientific names they contain, showing which taxa (e.g. “mammals”) are most common. The research team has completed similar work on the data of the Biodiversity Heritage Library, and their ACS project will allow them to make cross-corpora comparisons.
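As a rough illustration of the "simple link" described above, the index can be thought of as a mapping from each scientific name to the volume and page locations where it occurs. The sketch below is not the project's implementation; the `Location` type, the function names, the volume identifiers, and the sample rows are all hypothetical and chosen only to show the shape of the data.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    volume_id: str  # hypothetical volume identifier, not a real HathiTrust ID
    page: int       # page number within that volume


def build_index(occurrences):
    """Map each scientific name to a sorted, de-duplicated list of locations."""
    index = defaultdict(set)
    for name, volume_id, page in occurrences:
        index[name].add(Location(volume_id, page))
    return {name: sorted(locations) for name, locations in index.items()}


def volumes_containing(index, name):
    """Return the set of volume identifiers in which the given name occurs."""
    return {location.volume_id for location in index.get(name, [])}


# Invented sample occurrences, for illustration only.
sample = [
    ("Aedes aegypti", "vol-001", 12),
    ("Aedes aegypti", "vol-002", 305),
    ("Homo sapiens", "vol-001", 40),
    ("Aedes aegypti", "vol-001", 12),  # repeated occurrence, stored once
]

index = build_index(sample)
print(volumes_containing(index, "Aedes aegypti"))
```

A lookup such as `volumes_containing(index, "Aedes aegypti")` corresponds to the medical-relevance use case, and the same mapping could be inverted (volume to names) as a starting point for clustering volumes by the names they contain.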

Project report: Global Names and the HathiTrust: Towards comprehensive indexing of taxon names in real time
