The HTRC Data Capsule is a secure computing environment developed to facilitate non-consumptive text analysis research. Each capsule is a virtual machine (VM) that provides researchers a desktop they can use to perform their investigation of volumes in the HathiTrust Digital Library. 

Using an HTRC Data Capsule

Administering a Capsule

Use the HTRC site to handle administrative tasks for your capsule:

Maintenance vs. Secure Mode

The capsules are configured with special security settings that allow you to interact with them in two modes: maintenance mode and secure mode.

Interacting with the Capsule

Access your capsule in-browser from HTRC Analytics through either the Remote Desktop view (available in both modes) or the Terminal command-line interface (Maintenance Mode only). Earlier versions of the capsule environment required a VNC viewer and passwords for both the VNC and the capsule's operating system; the web-based version implemented in August 2018 removed those requirements. You can also SSH into your capsule, in Maintenance Mode only, if you've followed the directions under "Advanced Features" to set up a public key.

To operate your capsule, click on the capsule ID from the capsule list page. Then choose to either view the remote desktop or the terminal. The terminal will work in Maintenance Mode only. 

If you've established a key for SSH access, you can also SSH into your capsule when it's in Maintenance Mode by using the command viewable under "Advanced Features" on an individual capsule's status page.
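
The access rules described above can be summarized in a short lookup. This is only an illustrative sketch: the names `ACCESS_BY_MODE` and `can_use` are hypothetical and are not part of any HTRC tool or API.

```python
# Access methods available in each capsule mode, as described in the text.
# All names here are hypothetical and exist only for illustration.
ACCESS_BY_MODE = {
    "maintenance": {"remote_desktop", "terminal", "ssh"},
    "secure": {"remote_desktop"},
}


def can_use(method, mode, ssh_key_registered=False):
    """Return True if the access method is available in the given mode."""
    if method not in ACCESS_BY_MODE[mode]:
        return False
    # SSH additionally requires a public key set up under "Advanced Features".
    if method == "ssh":
        return ssh_key_registered
    return True
```

For example, `can_use("terminal", "secure")` is false, while `can_use("ssh", "maintenance", ssh_key_registered=True)` is true.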


If you use the included HTRC Workset Toolkit in your capsule to import data, you will be prompted for your HTRC Analytics username and password.

Generic Research Workflow

  1. Create and start a capsule in HTRC.

  2. View your capsule using the Remote Desktop view or Terminal view.

  3. Configure the software environment of the capsule as needed. Download the scripts or programs you plan to use in your analysis.

  4. Switch the capsule to secure mode through HTRC.

  5. Run your analysis against the secure HTRC corpus repository.

  6. Move your results to the secure volume storage on the capsule.

  7. Switch the capsule back to maintenance mode to regain normal network access.
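
As a rough sketch, the workflow above can be written as a list of steps paired with the mode the capsule is in while each step is carried out. It assumes that a newly started capsule begins in maintenance mode, which the steps imply but do not state outright. The names `WORKFLOW` and `mode_switches` are hypothetical.

```python
# Each workflow step paired with the capsule mode during that step.
# Assumption: a newly created capsule starts in maintenance mode.
WORKFLOW = [
    ("create and start the capsule", "maintenance"),
    ("open the Remote Desktop or Terminal view", "maintenance"),
    ("configure software and download scripts", "maintenance"),
    ("switch to secure mode", "secure"),
    ("run the analysis against the corpus", "secure"),
    ("move results to secure volume storage", "secure"),
    ("switch back to maintenance mode", "maintenance"),
]


def mode_switches(workflow):
    """Return the 1-based step numbers at which the capsule mode changes."""
    switches = []
    for i in range(1, len(workflow)):
        if workflow[i][1] != workflow[i - 1][1]:
            switches.append(i + 1)
    return switches
```

Calling `mode_switches(WORKFLOW)` returns the two steps at which you change modes through HTRC (steps 4 and 7). Everything that needs normal network access happens before the first switch.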

HTRC Data Capsule Configurations

Capsule Technical Specifications

You can set several parameters for your capsule during the creation process.

User Quotas

There is an overall disk quota, a memory quota, and a CPU quota for each user in the Data Capsule environment. Each user can consume up to 100 GB of disk space, about 20 GB of memory, and 10 CPUs in total. If you attempt to create an additional capsule that would exceed your quota in any of these areas, you will encounter an error.
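
To plan capsule sizes before creating them, the quota arithmetic can be sketched as below. This is a planning aid only, not how HTRC enforces quotas. It assumes the limits apply to the sum across all of a user's capsules, and it treats the approximate memory limit as exactly 20 GB. The `Capsule` class and `exceeded_quotas` function are hypothetical.

```python
from dataclasses import dataclass

# Per-user limits quoted in the text (the memory figure is approximate).
LIMITS = {"disk_gb": 100, "memory_gb": 20, "cpus": 10}


@dataclass
class Capsule:
    """Hypothetical record of the resources requested for one capsule."""
    disk_gb: int
    memory_gb: int
    cpus: int


def exceeded_quotas(capsules):
    """Return the names of the quotas that the capsules exceed in total."""
    totals = {
        "disk_gb": sum(c.disk_gb for c in capsules),
        "memory_gb": sum(c.memory_gb for c in capsules),
        "cpus": sum(c.cpus for c in capsules),
    }
    return [name for name, limit in LIMITS.items() if totals[name] > limit]
```

For instance, two capsules that each request 60 GB of disk total 120 GB, so `exceeded_quotas` would report `disk_gb` for that pair even though each capsule fits on its own.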

Pre-installed Software, Libraries, and Data

Each capsule comes pre-loaded with the following software, libraries, and data. For details about installed packages, consult the ReadMe file on the desktop of your capsule.




Anaconda 3 4.2.0 (https://www.continuum.io/anaconda-overview): Supports both Python 2.X and 3.X. See the list below for the pre-installed Python libraries (some installed via Anaconda).















Python Libraries 

Sample data and programs

Non-consumptive Exports from an HTRC Data Capsule

Data and tools can easily enter a user's capsule, but anything leaving a capsule must undergo review prior to release to the user. The guidelines used during review of the outputs of a capsule are as follows: