

  • Create - a capsule is created, but it is not yet running

  • Start - turn the capsule on in maintenance mode

  • Stop - shut down a capsule

  • Delete - the capsule is deleted (including its data and settings) 

  • Switch modes - change the capsule from maintenance to secure mode, or vice-versa (see below)

  • See status - view your capsules and their statuses

  • Interact - use your capsule either through a desktop view or a terminal (command line) view
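
The operations above can be read as a small state machine. The following is a minimal sketch of that reading, not HTRC's implementation: the `State` names, the `TRANSITIONS` table, and `apply_operation` are hypothetical, and the choice of which states allow Stop and Delete is an assumption made for illustration.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"          # created, but not running
    MAINTENANCE = "maintenance"  # running in maintenance mode
    SECURE = "secure"            # running in secure mode
    DELETED = "deleted"          # capsule, data, and settings are gone


# Allowed transitions, keyed by (current state, operation).
# Assumption: Stop works from either running mode; Delete is modeled
# only from the stopped state.
TRANSITIONS = {
    (State.STOPPED, "start"): State.MAINTENANCE,   # Start always enters maintenance mode
    (State.MAINTENANCE, "switch"): State.SECURE,
    (State.SECURE, "switch"): State.MAINTENANCE,
    (State.MAINTENANCE, "stop"): State.STOPPED,
    (State.SECURE, "stop"): State.STOPPED,
    (State.STOPPED, "delete"): State.DELETED,
}


def apply_operation(state, operation):
    """Return the state that follows `operation`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(
            f"cannot {operation} a capsule that is {state.value}"
        ) from None


# Usage: a newly created capsule is started, switched to secure mode,
# switched back, and then stopped.
state = State.STOPPED
for op in ("start", "switch", "switch", "stop"):
    state = apply_operation(state, op)
print(state.value)
```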

Maintenance vs. Secure Mode


Interacting with the Capsule

Access and log in to your capsule from your desktop using either a VNC client (Screen Sharing on a Mac, for example) or SSH. You can log in with a VNC client when the capsule is in either maintenance or secure mode. You can SSH into the capsule only when it is in maintenance mode.

Obtain the requisite information for the capsule under Capsule → Show Capsules → <capsule Id> 

  1. Access the capsule using a VNC client:
    • Install a VNC client on your computer if you do not already have one. There are several to choose from, including RealVNC, Chicken, and Screen Sharing (on a Mac).
    • Open your VNC client and enter the VNC URL.
  2. Access the capsule via SSH:
    • From your Linux terminal, use the following command:
Code Block
ssh -p <your capsule port>'s password: dcuser


There are 3 passwords important for using a capsule: 


You can access your capsule in-browser from HTRC Analytics, either by viewing the Remote Desktop (available in both modes) or the Terminal command-line interface (Maintenance Mode only). Earlier versions of the capsule environment required a VNC viewer and passwords for both the VNC and the capsule's operating system; those requirements were removed in the web-based version implemented in August 2018. You can also SSH into your capsule, in Maintenance Mode only, if you have followed the directions under "Advanced Features" to set up a public key.

To operate your capsule, click on the capsule ID from the capsule list page. Then choose to either view the remote desktop or the terminal. The terminal will work in Maintenance Mode only. 

If you've established a key for SSH access, you can also SSH into your capsule when it's in Maintenance Mode by using the command viewable under "Advanced Features" on an individual capsule's status page.


Earlier versions of the capsule environment required passwords for both the VNC and the capsule's operating system; those requirements were removed in the web-based version implemented in August 2018. If you use the included HTRC Workset Toolkit inside your capsule to import data, you will be prompted for your HTRC Analytics username and password.

Generic Research Workflow

  1. Create and start a capsule in the HTRC

  2. View your capsule using the Remote Desktop view or Terminal view

  3. Configure the software environment of the capsule as needed. Download the scripts or programs you plan to use in your analysis

  4. Switch capsule to secure mode through HTRC

  5. Run your analysis against the secure HTRC corpus repository

  6. Move your results to the secure volume storage on the capsule

  7. Switch capsule back to maintenance mode to regain normal network access


  • Data Capsule Image: the researcher chooses between two images (versions) of the standard capsule desktop, one that comes pre-loaded with sample volumes from the HathiTrust and one that does not
  • VNC Login User Name: set by the researcher
  • VNC Login Password: set by the researcher
  • Virtual Machine CPUs (VCPUs): the number of virtual processors for the capsule, from 2 to 4, set by the researcher
  • Memory: displayed in megabytes, between 4096 MB (4 GB) and 16000 MB (16 GB), set by the researcher
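
As an illustration of the ranges listed above, here is a minimal sketch that checks a planned configuration before you create a capsule. The function name `validate_capsule_settings` is hypothetical and is not part of any HTRC tool; only the numeric ranges come from this guide.

```python
def validate_capsule_settings(vcpus, memory_mb):
    """Return a list of problems; an empty list means the settings fit the stated ranges."""
    problems = []
    # VCPUs: 2 to 4 per capsule, as listed above.
    if not 2 <= vcpus <= 4:
        problems.append(f"VCPUs must be between 2 and 4, got {vcpus}")
    # Memory: 4096 MB to 16000 MB per capsule, as listed above.
    if not 4096 <= memory_mb <= 16000:
        problems.append(f"memory must be between 4096 and 16000 MB, got {memory_mb}")
    return problems


# Usage: one acceptable request and one that is out of range on both settings.
print(validate_capsule_settings(vcpus=2, memory_mb=8192))
print(validate_capsule_settings(vcpus=8, memory_mb=2048))
```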

User Quotas

There is an overall disk quota, a memory quota, and a CPU quota for each user in the Data Capsule environment. One user can consume up to 100 GB of disk space, ~20 GB of memory, and 10 CPUs. If you attempt to create a second or third capsule that exceeds your quota in one of the areas above, then you will encounter an error. 
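
To see how the quota applies across several capsules, consider the sketch below. It assumes the quota is checked against the sum of all of a user's capsules and treats the approximate memory quota as exactly 20 GB. The `Capsule` class, the `exceeds_quota` function, and the example sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Capsule:
    disk_gb: float
    memory_gb: float
    vcpus: int


# Per-user quota from the paragraph above (memory is stated as roughly 20 GB).
QUOTA = Capsule(disk_gb=100, memory_gb=20, vcpus=10)


def exceeds_quota(existing, new):
    """Return the resources that would go over quota if `new` were created."""
    capsules = list(existing) + [new]
    over = []
    for field in ("disk_gb", "memory_gb", "vcpus"):
        total = sum(getattr(c, field) for c in capsules)
        if total > getattr(QUOTA, field):
            over.append(field)
    return over


# Usage: one 16 GB capsule already exists; a second capsule with 8 GB of
# memory would push the total memory to 24 GB.
existing = [Capsule(disk_gb=10, memory_gb=16, vcpus=4)]
print(exceeds_quota(existing, Capsule(disk_gb=10, memory_gb=8, vcpus=4)))
```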


  • csvkit
  • dask
  • GenSim (currently running with warning)
  • htrc-feature-reader
  • htrc workset toolkit
  • nltk
  • numpy
  • pandas
  • pytables
  • regex
  • scipy
  • theano
  • toolz
  • ujson
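
If you want to confirm which of these libraries are available in your capsule, a sketch like the following may help. It assumes the list above names Python packages. The import names in `PACKAGES` are assumptions where they differ from the list entries (for example `tables` for pytables, `htrc_features` for htrc-feature-reader, and `htrc` for the workset toolkit), and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Listed name -> assumed top-level import name.
PACKAGES = {
    "csvkit": "csvkit",
    "dask": "dask",
    "GenSim": "gensim",
    "htrc-feature-reader": "htrc_features",
    "htrc workset toolkit": "htrc",
    "nltk": "nltk",
    "numpy": "numpy",
    "pandas": "pandas",
    "pytables": "tables",
    "regex": "regex",
    "scipy": "scipy",
    "theano": "theano",
    "toolz": "toolz",
    "ujson": "ujson",
}


def missing_packages(modules=PACKAGES):
    """Return the listed names whose module cannot be found, without importing them."""
    return [name for name, module in modules.items() if find_spec(module) is None]


# Usage: print whatever is not installed in the current Python environment.
print(missing_packages())
```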