
With the virtual machine you created in HTRC Data Capsule in maintenance mode, run the following command:

pip install htrc

Then switch to secure mode to fetch content through the Data API in Data Capsule. (The Data API does not work in maintenance mode, in order to prevent data leaks.)

Once in secure mode, run the following command to download OCR data: 

htrc download -o output.zip htrc-id
  • htrc-id is a file listing the IDs of the volumes you're interested in, with one ID per line.
  • output.zip is the zip archive that will hold the fetched OCR content.
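
The two files above can be prepared and checked with a few lines of Python. This is only a minimal sketch using the standard library, assuming the ID file holds one volume ID per line; the helper names `write_id_file` and `list_zip_members` are made up for illustration and are not part of the htrc package.

```python
import zipfile
from pathlib import Path


def write_id_file(volume_ids, path):
    """Write volume IDs to a text file, one ID per line.

    Blank entries and duplicates are skipped; the original order is kept.
    Returns the number of IDs written.
    """
    seen = set()
    cleaned = []
    for vid in volume_ids:
        vid = vid.strip()
        if vid and vid not in seen:
            seen.add(vid)
            cleaned.append(vid)
    # Each ID gets its own line, including a trailing newline on the last one.
    Path(path).write_text("".join(v + "\n" for v in cleaned), encoding="utf-8")
    return len(cleaned)


def list_zip_members(zip_path):
    """Return the names of all entries stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

For example, `write_id_file(my_ids, "htrc-id")` (where `my_ids` is your own list of volume IDs) could be run before the download command, and `list_zip_members("output.zip")` afterwards to see what was fetched.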

Note: to build your own htrc-id list, search HathiTrust or use other metadata sources, such as HathiFiles.
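
As a rough illustration of building the list from a metadata file, the sketch below pulls one column out of a tab-delimited file. It assumes the volume ID sits in the first column and that the file may be gzip-compressed; check both against the description of the metadata file you actually use. The function name `ids_from_tsv` and the optional `keep` filter are hypothetical.

```python
import csv
import gzip


def ids_from_tsv(tsv_path, id_column=0, keep=None):
    """Collect volume IDs from a tab-delimited metadata file.

    id_column: index of the column holding the volume ID (assumed to be 0).
    keep: optional function taking a row (list of strings) and returning
          True for rows whose ID should be included.
    """
    # Open transparently whether or not the file is gzip-compressed.
    opener = gzip.open if str(tsv_path).endswith(".gz") else open
    ids = []
    with opener(tsv_path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) <= id_column:
                continue  # skip empty or short rows
            if keep is None or keep(row):
                ids.append(row[id_column])
    return ids
```

The returned IDs can then be written to the htrc-id file, one per line, for use with the download command.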
