In the virtual machine you created in HTRC Data Capsule, while in maintenance mode, download the file DataAPI_SampleCode.zip and place it somewhere in the capsule's file system.
Then switch to secure mode to fetch content through the Data API in Data Capsule (the Data API does not work in maintenance mode, to prevent data leaks).
Unzip the archive, then run this command to fetch some books
- htrc-id is the list of IDs of the volumes you are interested in.
- output.zip is the zipped folder that holds the fetched OCR content.
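Once the fetch has finished, it can help to check what actually arrived in output.zip before working with it. The following is a minimal sketch that uses only Python's standard zipfile module; the helper name list_fetched_files is hypothetical and is not part of the sample code, and the internal layout of the archive (one folder per volume, one file per page, and so on) is not assumed here.

```python
import os
import zipfile


def list_fetched_files(zip_path):
    """Return the sorted names of all files (not directories) in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if not name.endswith("/")
        )


if __name__ == "__main__":
    # "output.zip" is the output name used in the steps above.
    if os.path.exists("output.zip"):
        for name in list_fetched_files("output.zip"):
            print(name)
```

Run it from the directory that contains output.zip, while still in secure mode.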
In the DownloadVolumes.py script, go to the "Data API volume request parameters" section and change the parameters there. Use 'concat':'true' to concatenate the pages of a volume, and 'mets':'true' to return the METS file together with the volume content. Several examples follow.
For example, to concatenate the pages of a book, uncomment this line
To get the METS file, uncomment this line (note: a METS file is a metadata file that comes with each volume; it records MARC and archive information for the book)
If you only need the volume content returned, without concatenating the pages, use this line
If you want the METS record returned along with the volume content, with all pages concatenated into a single text file per volume, use this line
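The four cases above differ only in which of the two flags are switched on. As a rough illustration, the sketch below builds the corresponding parameter dictionaries. The helper build_volume_params is hypothetical and does not appear in DownloadVolumes.py; it also assumes that leaving a flag out is equivalent to turning it off, which the script itself may express differently. Only the 'concat':'true' and 'mets':'true' entries are taken from the text.

```python
def build_volume_params(concat=False, mets=False):
    """Build a dict of request parameters; flags are the strings 'true'.

    Assumption: a flag that is absent from the dict is treated as off.
    """
    params = {}
    if concat:
        params['concat'] = 'true'   # concatenate the pages of each volume
    if mets:
        params['mets'] = 'true'     # include the METS file with the content
    return params


if __name__ == "__main__":
    # The four combinations described above.
    print(build_volume_params(concat=True))             # pages concatenated
    print(build_volume_params(mets=True))               # METS file included
    print(build_volume_params())                        # content only
    print(build_volume_params(concat=True, mets=True))  # both
```

Whichever form the script uses, only one of the four parameter lines should be active at a time.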