The HathiTrust Research Center (HTRC) reached a development milestone with the release of production infrastructure v1.0, which supports data mining and textual analysis of volumes in HathiTrust.
The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight), and access to SEASR analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development “Sandbox”. The Sandbox runs against non-Google-scanned content (about 260,000 volumes) and provides a test bed for researchers who want to experiment with writing their own algorithms for use in the HTRC infrastructure.
The production release concludes the first six-month period of Phase 2 of HTRC development (Oct 2012-March 2014). Phase 2 will also include the development of the HTRC-Sloan-Cloud (infrastructure that will include additional mechanisms to allow secure, non-consumptive access to the entire HathiTrust corpus) and of systems to accommodate the full 10.6 million HathiTrust volumes in the HTRC. For more information on HTRC services and testing of the production infrastructure, please join our HTRC-usergroup-l listserv at https://list.indiana.edu/sympa/subscribe/htrc-usergroup-l. A Getting Started FAQ is available at http://bit.ly/XkZKev
HTRC is planning a release of its software infrastructure on March 31, 2013. The release comprises two separate cyberinfrastructure stacks, both loaded with the latest service components and tools. The “HTRC sandbox” is an open testbed for community experimentation, loaded with non-Google-scanned public domain volumes. The “HTRC production stack” hosts the full public domain corpus. Both the sandbox and the production stack offer the suite of services and tools that debuted at the September 2012 UnCamp, with bug fixes and many other improvements made since. Looking ahead, the June 2013 release of the HTRC production stack will include the HTRC-Sloan-Cloud in early support of non-consumptive research.
If you're a technical user or developer of HTRC, or interested in becoming one, please join htrc-usergroup-l@list.indiana.edu, an open forum for raising problems and questions as you develop.