This document is a starting point for a user interested in downloading the json-format Extracted Features (EF) data files corresponding to the specific HathiTrust Digital Library volumes that constitute the user's custom workset (that the user has built with the HTRC Workset Builder).

that with our non-consumptive virtual machine (VM). A non-consumptive VM has two modes: maintenance mode and secure mode. In maintenance mode, the user can access the network freely, except for the HTRC corpus repository, and can install whatever software is needed. In secure mode, network access is restricted: users can reach only a few network addresses, such as the HTRC corpus repository. In addition, any changes the user makes to the OS in secure mode are not persisted. To save data, the user must write to a designated storage area called the secure volume. The secure volume is invisible in maintenance mode.
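The difference between the two modes can be summarized as a small lookup table. The following is only an illustrative sketch: the names `Mode`, `ModePolicy`, `POLICIES`, and `can_reach` are hypothetical and not part of any HTRC software. It also simplifies secure mode to "corpus only" (the text says a few addresses are allowed), and it assumes that OS changes made in maintenance mode persist, which the text implies but does not state outright.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MAINTENANCE = "maintenance"
    SECURE = "secure"


@dataclass(frozen=True)
class ModePolicy:
    corpus_access: bool          # can the VM reach the HTRC corpus repository?
    general_network: bool        # can the VM reach arbitrary network addresses?
    os_changes_persist: bool     # do changes to the OS survive? (assumed for maintenance)
    secure_volume_visible: bool  # is the secure volume visible to the user?


# Hypothetical summary of the rules described above.
POLICIES = {
    Mode.MAINTENANCE: ModePolicy(
        corpus_access=False,
        general_network=True,
        os_changes_persist=True,
        secure_volume_visible=False,
    ),
    Mode.SECURE: ModePolicy(
        corpus_access=True,
        general_network=False,   # simplification: only a few addresses are allowed
        os_changes_persist=False,
        secure_volume_visible=True,
    ),
}


def can_reach(mode: Mode, destination: str) -> bool:
    """Return whether `destination` ("corpus" or "internet") is reachable in `mode`."""
    policy = POLICIES[mode]
    if destination == "corpus":
        return policy.corpus_access
    return policy.general_network
```

Read this way, results produced in secure mode survive only if they are written to the secure volume, since that is the one place where `os_changes_persist` being false does not apply.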

We provide a user-friendly web interface through which users can manage VMs running on our back-end infrastructure. The following VM operations are available:

  • Create a VM. The virtual machine is created but remains powered off.

  • Start a VM. The virtual machine boots into maintenance mode.

  • Stop a VM. The virtual machine shuts down.

  • Delete a VM. The virtual machine is deleted, along with everything related to it.

  • Switch a VM from maintenance mode to secure mode or vice versa.

  • Query VM status.

Once a VM is started, users can log in to it through a VNC client. To run an analysis against the HTRC OCR repository, users need to switch the VM to secure mode. Below is a typical workflow for a new user.

  1. Create a new VM;

  2. Start the new VM;

  3. Log into the VM;

  4. Configure the software environment as needed. Upload the analysis program to the VM;

  5. Switch the VM to secure mode through the web interface;

  6. Run the analysis program against the HTRC corpus repository;

  7. Switch the VM back to maintenance mode to regain normal network access.
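The operations and the workflow above can be read as a small state machine. The sketch below is a hypothetical model for illustration only, not the web interface's actual implementation: the names `State`, `TRANSITIONS`, `apply_operation`, and `run` are invented here. It assumes that a VM can be stopped from either mode and deleted only while powered off; the text does not specify either of these.

```python
from enum import Enum


class State(Enum):
    NONE = "none"                # hypothetical label: no VM exists yet
    OFF = "off"                  # created, but powered off
    MAINTENANCE = "maintenance"
    SECURE = "secure"


# (operation, current state) -> next state
TRANSITIONS = {
    ("create", State.NONE): State.OFF,
    ("start", State.OFF): State.MAINTENANCE,       # a VM starts in maintenance mode
    ("switch", State.MAINTENANCE): State.SECURE,
    ("switch", State.SECURE): State.MAINTENANCE,
    ("stop", State.MAINTENANCE): State.OFF,        # assumption: stop works in either mode
    ("stop", State.SECURE): State.OFF,
    ("delete", State.OFF): State.NONE,             # assumption: delete only when powered off
}


def apply_operation(state: State, operation: str) -> State:
    """Return the state reached by applying `operation`, or raise ValueError."""
    try:
        return TRANSITIONS[(operation, state)]
    except KeyError:
        raise ValueError(
            f"cannot {operation} a VM in state {state.value!r}"
        ) from None


def run(operations, state: State = State.NONE) -> State:
    """Apply a sequence of operations and return the final state."""
    for operation in operations:
        state = apply_operation(state, operation)
    return state


# Steps 1, 2, 5 and 7 of the workflow change the VM state; logging in,
# configuring the environment and running the analysis (steps 3, 4, 6) do not.
workflow = ["create", "start", "switch", "switch"]
print(run(workflow).value)  # maintenance
```

In this model, attempting an operation that makes no sense in the current state, such as starting a VM that has not been created, raises a `ValueError` instead of silently doing nothing.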


We will show, step by step, how you can create a workset consisting (for the sake of simplicity) of a single volume from the HathiTrust Digital Library's public domain collection, a published-in-1920 

2. Create a custom workset

This section covers all the operations you can perform on the virtual machine. You must log in to the HTRC portal before you can perform these operations.