  • HTRC algorithms and HTRC Data Capsules are capable of analyzing the entire HathiTrust corpus and can also make use of each volume's MARC bibliographic and METS metadata. Both the HTRC algorithms and the Capsule environments draw from the HTRC Data API described below.

  • The HTRC also makes two datasets available: the HTRC Extracted Features Dataset and a dataset of Word Frequencies in English Language Literature, 1700-1922. HTRC Extracted Features includes metadata and extracted page-level data (words and word counts) for 15.7 million volumes.

  • HathiTrust+Bookworm visualizes data for 15.7 million volumes.