Fetching Volume OCR Content in HTRC Data Capsule (Secure Mode)

In the virtual machine you created in HTRC Data Capsule, in maintenance mode, run the command: 

pip install htrc

Then switch to secure mode to fetch content through the Data API in Data Capsule. The Data API does not work in maintenance mode, which prevents data leaks.

Once in secure mode, run the following command to download OCR data: 

htrc download -o output htrc-id
  • htrc-id is a file listing the volume IDs that you're interested in, with one ID per line.
  • output is the folder for the fetched OCR content

Note: to build your own htrc-id list, you will need to search HathiTrust or use other metadata sources, such as HathiFiles.
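
As a minimal sketch of preparing the ID list, the commands below write a file named htrc-id with one volume ID per line, then remove blank lines and duplicates. The two IDs are placeholders made up for illustration, not real volumes, and the temporary file name htrc-id.clean is likewise an assumption; substitute the IDs you collected from HathiTrust or HathiFiles.

```shell
# Write the volume ID list, one ID per line.
# NOTE: both IDs below are hypothetical placeholders, not real volumes.
cat > htrc-id <<'EOF'
mdp.39015000000001
uc1.b000000001
mdp.39015000000001

EOF

# Drop blank lines and duplicate IDs so each volume is listed once.
grep -v '^[[:space:]]*$' htrc-id | sort -u > htrc-id.clean
mv htrc-id.clean htrc-id

# Print the number of IDs in the list as a quick sanity check.
wc -l < htrc-id
```

The resulting htrc-id file can then be passed to the htrc download command shown above.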
