HTRC Data Capsule Specifications and Usage Guide

This page details the operations and specifications of the HTRC Data Capsules. See the HTRC Data Capsule Tutorial for a more detailed, step-by-step guide to using your capsule.


Overview

The HTRC Data Capsule is a secure computing environment developed to facilitate non-consumptive text analysis research. Each capsule is a virtual machine (VM) that provides researchers a desktop they can use to perform their investigation of volumes in the HathiTrust Digital Library. 

The capsules are configured with special security settings that allow you to interact with them in two modes: maintenance mode and secure mode.

  • In maintenance mode, you are allowed to access the network freely and install whatever software you want. 
  • In secure mode, general network access is restricted, but you can access the HTRC corpus repository, which is otherwise blocked. Any changes you make to the capsule in secure mode will not persist. To save data from your analysis, you'll need to save your results in the secure volume storage on your capsule. This storage option is not visible in maintenance mode. 

Use the HTRC Portal interface to set up and interact with your capsule VM. The available operations are:

  • Create - a capsule is created, but it is not yet running

  • Start - turn the capsule on in maintenance mode

  • Stop - shut down a capsule

  • Delete - the capsule is deleted (including its data and settings) 

  • Switch modes - change the capsule from maintenance to secure mode, or vice-versa

  • See status - view your capsules and their statuses

Once a capsule is started via the HTRC Portal interface, you can access your capsule desktop using a VNC client. To run analysis using data from the HTRC corpus repository, you'll need to switch the capsule VM to secure mode. Here is a typical workflow a new user may follow:

  1. Create and start a capsule in the HTRC Portal

  2. Log into the capsule using a VNC client

  3. Configure the software environment of the capsule as needed. Download the scripts or programs you plan to use in your analysis

  4. Switch capsule to secure mode through HTRC Portal

  5. Run your analysis against the secure HTRC corpus repository

  6. Move your results to the secure volume storage on the capsule

  7. Switch capsule back to maintenance mode to regain normal network access
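The operations and workflow above can be read as a small state machine. The following is only an illustrative sketch, not part of the HTRC Portal: the names `State`, `TRANSITIONS`, `apply_operation`, and `run_workflow` are hypothetical, and the rule that a capsule can be stopped or deleted from any non-deleted state is an assumption rather than documented behavior.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"          # created but not running, or shut down
    MAINTENANCE = "maintenance"  # running with free network access
    SECURE = "secure"            # running with corpus access, restricted network
    DELETED = "deleted"


# (current state, operation) -> next state.
# Stop/delete from every listed state is an assumption made for this sketch.
TRANSITIONS = {
    (State.STOPPED, "start"): State.MAINTENANCE,  # capsules start in maintenance mode
    (State.MAINTENANCE, "switch"): State.SECURE,
    (State.SECURE, "switch"): State.MAINTENANCE,
    (State.MAINTENANCE, "stop"): State.STOPPED,
    (State.SECURE, "stop"): State.STOPPED,
    (State.STOPPED, "delete"): State.DELETED,
    (State.MAINTENANCE, "delete"): State.DELETED,
    (State.SECURE, "delete"): State.DELETED,
}


def apply_operation(state, operation):
    """Return the state reached by applying one operation, or raise ValueError."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation!r} a capsule that is {state.value}")


def run_workflow(operations, state=State.STOPPED):
    """Apply operations in order, starting from a freshly created capsule."""
    for operation in operations:
        state = apply_operation(state, operation)
    return state


if __name__ == "__main__":
    # Typical workflow: start, switch to secure mode, switch back to maintenance.
    print(run_workflow(["start", "switch", "switch"]))
```

Modeling the modes this way makes one constraint explicit: a capsule has to be started before it can be switched to secure mode.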

HTRC Data Capsule Configurations

Each capsule comes pre-loaded with the following libraries, packages, and data. For more details about the installed packages, consult the ReadMe file on the desktop of your capsule.

Python Libraries 

  • csvkit
  • dask
  • GenSim (currently running with warning)
  • htrc-feature-reader
  • nltk
  • numpy
  • pandas
  • pytables
  • regex
  • scipy
  • theano
  • toolz
  • ujson
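To confirm which of these libraries are importable in a given capsule, a short check along the following lines may help. It is a sketch only: the import names in `PRELOADED` are the commonly used module names for each package (for example, `tables` for pytables and `htrc_features` for htrc-feature-reader) and should be treated as assumptions, and `check_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Package label from the list above -> assumed top-level import name.
PRELOADED = {
    "csvkit": "csvkit",
    "dask": "dask",
    "GenSim": "gensim",
    "htrc-feature-reader": "htrc_features",
    "nltk": "nltk",
    "numpy": "numpy",
    "pandas": "pandas",
    "pytables": "tables",
    "regex": "regex",
    "scipy": "scipy",
    "theano": "theano",
    "toolz": "toolz",
    "ujson": "ujson",
}


def check_modules(modules):
    """Map each label to True if its module can be located, without importing it."""
    return {label: find_spec(name) is not None for label, name in modules.items()}


if __name__ == "__main__":
    for label, available in check_modules(PRELOADED).items():
        print(f"{label}: {'found' if available else 'missing'}")
```

Using `find_spec` only locates each module, so the check stays fast and does not trigger import-time warnings such as the one noted for GenSim.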

System-level Packages

  • Ant
  • curl
  • Git
  • GNU Parallel
  • grep
  • htop
  • iotop
  • Java 8
  • jq
  • less
  • Maven
  • pcregrep
  • R
  • rsync
  • Scala 2.11.6
  • SBT
  • Spark
  • vim
  • zsh
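A similar check can report which of these tools are on the `PATH`. This is a sketch under assumptions: the executable names in `COMMANDS` (for example, `mvn` for Maven, `parallel` for GNU Parallel, and `spark-shell` for Spark) are the usual ones but are not confirmed by this page, and `find_commands` is a hypothetical helper.

```python
import shutil

# Package label from the list above -> assumed executable name.
COMMANDS = {
    "Ant": "ant",
    "curl": "curl",
    "Git": "git",
    "GNU Parallel": "parallel",
    "grep": "grep",
    "htop": "htop",
    "iotop": "iotop",
    "Java 8": "java",
    "jq": "jq",
    "less": "less",
    "Maven": "mvn",
    "pcregrep": "pcregrep",
    "R": "R",
    "rsync": "rsync",
    "Scala 2.11.6": "scala",
    "SBT": "sbt",
    "Spark": "spark-shell",
    "vim": "vim",
    "zsh": "zsh",
}


def find_commands(commands):
    """Map each label to the executable's full path, or None if it is not on PATH."""
    return {label: shutil.which(name) for label, name in commands.items()}


if __name__ == "__main__":
    for label, path in find_commands(COMMANDS).items():
        print(f"{label}: {path or 'not found'}")
```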

Sample data and programs

  • 3 sample HTRC worksets of 1000 volumes each: U.S. Government Documents, German language volumes, 19th Century English Literature. 
  • Topic Explorer: http://inphodata.cogs.indiana.edu/

HTRC Data Capsule operations

Please note: You are required to log in to the HTRC Portal before you can perform these operations.

Create a capsule virtual machine (VM)

  • Navigate to the “Create Virtual Machine” tab and fill in the form.
    • Choose an image from the drop down list
    • Provide username and password for the VNC session
    • Choose the number of CPUs and the amount of memory you want your virtual machine to have
  • Click the "Create VM" button. The VM creation procedure usually takes about 1 minute to finish. You can refresh your screen to see if it has completed.

Show capsule VM status

  • Navigate to the “Virtual Machines” page
  • You can see all the capsule VMs associated with your account and the actions you can perform on them

  • Click on the VM id link to see more details about the capsule VM.
  • The “VM Initial Logging User ID” and “VM Initial Logging Password” are the username and password you use to log into the capsule VM 
    • NOTE: These are different from the ones you use to open your VNC session to the VM
  • The “Public IP” and “VNC port” are what you need to open a VNC session
  • You can also use the “Public IP” and “SSH port” to log into the VM through SSH, but this is only allowed in maintenance mode.
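If a VNC or SSH client cannot connect, it can help to first check whether the port shown on the VM details page accepts TCP connections at all. The sketch below uses only the Python standard library; `port_is_open` is a hypothetical helper, and the address and port in the usage example are placeholders, not real capsule values.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute the "Public IP" and "VNC port" of your capsule.
    print(port_is_open("203.0.113.10", 5901))
```

A successful connection only shows that the port is reachable; the VNC or SSH login itself still requires the credentials described above.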

 

Start a capsule VM

  • Start a capsule VM on the “Virtual Machines” page. This operation usually takes 2-3 minutes, and you can refresh your screen to see if it has finished.
  • Once the capsule VM starts successfully, you can see more available operations, including switch, stop, and delete.

Log into a capsule VM

  • Use your preferred VNC client, e.g. Google VNC Client, to connect to the capsule VM at its “Public IP” and “VNC port”. Enter the VNC password to open the session, then log into the VM with the VM login username and password.

Switch modes of a capsule VM

  • By default, the capsule starts in maintenance mode, in which you have network access. 
  • To switch to secure mode, click the “Switch to Security” button. 
  • Once you click to switch modes, the screen in your capsule will be frozen for a short time. Once the switch is complete, you'll be able to resume your work.
  • Before switching from secure mode to maintenance mode, eject/unmount the secure volume so that any changes made to it are saved permanently.
  • In the HTRC Portal, go to "HTRC Data Capsule -> Show Virtual Machines" to find the page for switching between modes. In maintenance mode, click the "Switch To Secure Mode" button in the portal to switch to secure mode.
  • In secure mode, click the "Switch To Maintenance Mode" button in the HTRC Portal to switch to maintenance mode.




Stop a Virtual Machine

  • Stop a capsule VM by clicking the “Stop VM” button. The capsule VM then shuts down, and everything inside the VM is preserved.


Restart a Virtual Machine

  • Although we do not provide a reset button for restarting the VM directly, you can always stop the VM and then start it again. This has the same effect as pressing the reset button on a physical machine.

Delete a Virtual Machine

  • You can delete a capsule VM by clicking the “Delete VM” button. The capsule is then cleared, including its settings and any data on it.


 

 
