The HathiTrust Research Center (HTRC) has reached a development milestone with the release of its production infrastructure v1.0, which supports data mining and textual analysis of volumes in HathiTrust.
The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight), and access to SEASR analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development “Sandbox”. The Sandbox runs against non-Google-scanned content (about 260,000 volumes) and provides a test bed for researchers who want to experiment with writing their own algorithms for use in the HTRC infrastructure.
The production release concludes the first six months of Phase 2 (October 2012 to March 2014) of HTRC development. Phase 2 will also include the development of the HTRC-Sloan-Cloud (infrastructure that will include additional mechanisms to allow secure, non-consumptive access to the entire HathiTrust corpus) and systems to accommodate the full 10.6 million HathiTrust volumes in the HTRC. For more information on HTRC services and testing of the production infrastructure, please join our HTRC-usergroup-l listserv at https://list.indiana.edu/sympa/subscribe/htrc-usergroup-l. A Getting Started FAQ is available at http://bit.ly/XkZKev.