A: The HTRC is the research arm of the HathiTrust. It facilitates scholarly research using the large-scale HathiTrust Digital Library by providing mechanisms for researchers to access content in the HathiTrust and study it using computational tools for text analysis.
The HTRC is a partnership between Indiana University (IU) Libraries, the Pervasive Technology Institute, and the School of Informatics and Computing at IU, as well as the University of Illinois at Urbana-Champaign (UIUC) Libraries and the Graduate School of Library and Information Science at UIUC.
A: The HTRC has created a suite of tools that allow researchers to perform text analysis on content in the HathiTrust Digital Library. These tools include the Portal and Workset Builder, HathiTrust+Bookworm, and the HTRC Data Capsule. They are intended to meet the varied needs of HTRC researchers.
A: You use the HTRC by interacting with our tools and services. Please refer to the documentation for each tool or service for more specific how-to guides.
A: Most of the HTRC services require an account to log in and interact with the tools, though HathiTrust+Bookworm is available without an account.
Register for an account by going to the main page of the Portal and choosing "Sign up" from the menu. Anyone possessing an email address from an institution of higher education is allowed to register, including those whose institutions are not HathiTrust members.
A: Using the search bar on the hathitrust.org site allows you to find digitized items in the HathiTrust Digital Library and to read them if they are in the public domain. With the HTRC tools you can instead work with material in the HathiTrust Digital Library at scale, using computational methods to analyze subcollections of content relevant to your research.
A: The HTRC currently provides access to the OCR text of the public domain corpus in the HathiTrust, as well as to each volume’s MARC bibliographic metadata and METS metadata.
The HTRC also makes two datasets available: the HTRC Extracted Features Dataset and a dataset of Word Frequencies in English Language Literature, 1700-1922.
A: This table outlines the differences between the HTRC Data API and HathiTrust Data API. The HTRC Data API currently functions within the HTRC Data Capsule.
| | HTRC Data API | HT Data API |
|---|---|---|
| Purpose | To serve high-performance, large-scale algorithms and programs | To provide public users with some volume retrieval capabilities |
| Bulk retrieval of volumes | Yes | No |
| Metadata available | METS | METS, MARC |
A: The HTRC Sandbox is a resource for testing and exploration, made available to members of our user community who want to try things out on a smaller scale. It provides access to approximately 250,000 volumes from the non-Google-scanned public domain subset of the HathiTrust Digital Library. The following table lists the endpoints for the HTRC Sandbox services.
| Service | Endpoint | Description |
|---|---|---|
| Portal | https://sandbox.htrc.illinois.edu/HTRC-UI-Portal2 | The Portal allows you to browse volume lists and algorithms, execute algorithms, and view results. |
| Workset Builder | https://sandbox.htrc.illinois.edu/blacklight | The Blacklight search interface allows you to search for volumes and create volume lists that can be used by algorithms. It provides a graphical interface to our Solr index. |
| Data API | https://sandbox.htrc.illinois.edu/data-api | The HTRC Data API provides access to the corpus data and METS XML via a RESTful web service. |
| Solr Proxy | http://sandbox.htrc.illinois.edu/solr | The HTRC Solr Proxy provides access to the Solr index. A sample query is http://sandbox.htrc.illinois.edu/solr/ocr/select?q=shakespeare. Please refer to the Solr Guide for more details on queries. |
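For readers who want to script against the Solr Proxy rather than paste URLs into a browser, the following is a minimal sketch, assuming the Sandbox endpoint is still reachable and passes standard Solr parameters through unchanged. It only reproduces the sample query from the table, with proper URL encoding. The `rows`, `start`, and `wt` parameters are standard Solr query parameters, but whether the HTRC proxy honours them is an assumption, and the helper names `build_solr_query` and `count_hits` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Sample-query base URL for the Sandbox Solr Proxy's "ocr" core, as listed in the table.
SOLR_OCR_SELECT = "http://sandbox.htrc.illinois.edu/solr/ocr/select"


def build_solr_query(query, rows=10, start=0, wt=None):
    """Hypothetical helper: build an encoded select URL for the OCR core.

    `rows` and `start` are standard Solr paging parameters and `wt` selects
    the response writer (e.g. "json"); proxy support for them is assumed.
    """
    params = {"q": query, "rows": rows, "start": start}
    if wt is not None:
        params["wt"] = wt
    # urlencode escapes reserved characters and turns spaces into "+".
    return f"{SOLR_OCR_SELECT}?{urlencode(params)}"


def count_hits(query, timeout=30.0):
    """Hypothetical helper: return Solr's numFound for `query` (needs network).

    Assumes the proxy returns Solr's standard JSON layout when wt=json.
    """
    url = build_solr_query(query, rows=0, wt="json")
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["response"]["numFound"]


if __name__ == "__main__":
    # Prints the encoded equivalent of the sample query in the table.
    print(build_solr_query("shakespeare"))
```

Building the URL separately from fetching it keeps the query logic testable offline. Only `count_hits` touches the network, and it sets `rows=0` so that Solr reports the match count without returning any documents.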
A: Please join the HTRC User Group mailing list.
A: We welcome your feedback! Under the “Help” tab on the menu bar in the Portal, you’ll find an option to contact us. Please use it to send your thoughts or questions.
You can also report a bug on our JIRA instance. You will need to create an account to log into JIRA if you have not done so already.
A: If you have not found what you are looking for in our documentation, you might find the material posted to our Publications and Presentations page useful for further reading.
You might also consider attending a workshop. You can find information on future workshops on our calendar.
Or you can ask for further assistance on our mailing lists. See below for more information about signing up.