*HTRC Data Capsule is funded in part by a grant from the Alfred P. Sloan Foundation. Here we introduce HTRC Capsule v1.0 and its use for non-consumptive analysis of the HathiTrust repository.
This document is a starting point for users working with our non-consumptive virtual machine (VM). A non-consumptive VM has two modes: maintenance mode and secure mode. In maintenance mode, the user can access the network freely (except for the HTRC corpus repository) and install any software she wants. In secure mode, network access is restricted: the user can reach only a few network addresses, such as the HTRC corpus repository. In addition, any changes the user makes to the OS in secure mode are not persisted. To save data, the user must write to a designated storage area called the secure volume. The secure volume is not visible in maintenance mode.
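The access and persistence rules of the two modes can be summarized as a small policy check. The Python sketch below only restates the rules above for illustration; it is not the capsule's actual enforcement mechanism. The host name `corpus.htrc.example` is a hypothetical placeholder, and treating ordinary OS changes in maintenance mode as persistent is our reading of the text rather than a stated guarantee.

```python
from enum import Enum


class Mode(Enum):
    MAINTENANCE = "maintenance"
    SECURE = "secure"


# Hypothetical placeholder; the real repository address is not given here.
HTRC_CORPUS_HOST = "corpus.htrc.example"
# Secure mode allows only "a few network addresses"; this set is illustrative.
SECURE_MODE_ALLOWLIST = frozenset({HTRC_CORPUS_HOST})


def network_access_allowed(mode: Mode, host: str) -> bool:
    """Return True if the VM may contact `host` while in `mode`."""
    if mode is Mode.MAINTENANCE:
        # Open network access, except to the HTRC corpus repository.
        return host != HTRC_CORPUS_HOST
    # Secure mode: only allowlisted addresses are reachable.
    return host in SECURE_MODE_ALLOWLIST


def write_persists(mode: Mode, on_secure_volume: bool) -> bool:
    """Return True if a write made in `mode` survives leaving that mode."""
    if mode is Mode.MAINTENANCE:
        if on_secure_volume:
            raise ValueError("the secure volume is not visible in maintenance mode")
        return True  # assumption: ordinary OS changes persist in maintenance mode
    # Secure mode: OS changes are discarded; only the secure volume keeps data.
    return on_secure_volume
```

In short, results produced in secure mode survive only if the analysis writes them to the secure volume.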
We provide a user-friendly web interface through which users can manage VMs running on our back-end infrastructure. Users can perform the following VM operations:
Create a VM. A virtual machine is created in the powered-off state.
Start a VM. The virtual machine starts up in maintenance mode.
Stop a VM. The virtual machine shuts down.
Delete a VM. The virtual machine is deleted. Everything related to this virtual machine is gone.
Switch a VM from maintenance mode to secure mode or vice versa.
Query VM status.
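The operations above form a simple life cycle. The sketch below models it as a Python state machine; `CapsuleVM` and its method names are hypothetical and do not correspond to the actual web service API. Where the text does not say which states permit an operation (for example, whether a VM can be stopped while in secure mode, or deleted while stopped), the sketch assumes it is allowed.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"          # created or shut down; power is off
    MAINTENANCE = "maintenance"  # running, open network access
    SECURE = "secure"            # running, restricted network access
    DELETED = "deleted"


class CapsuleVM:
    """Toy model of the VM operations (not the real service)."""

    def __init__(self) -> None:
        self.state = State.STOPPED  # "Create": the VM exists but is powered off

    def _require(self, *allowed: State) -> None:
        if self.state not in allowed:
            raise RuntimeError(f"operation not allowed in state {self.state.value}")

    def start(self) -> None:
        self._require(State.STOPPED)
        self.state = State.MAINTENANCE  # a started VM always boots into maintenance mode

    def stop(self) -> None:
        self._require(State.MAINTENANCE, State.SECURE)
        self.state = State.STOPPED

    def switch(self) -> None:
        # Toggle between maintenance and secure mode on a running VM.
        self._require(State.MAINTENANCE, State.SECURE)
        if self.state is State.MAINTENANCE:
            self.state = State.SECURE
        else:
            self.state = State.MAINTENANCE

    def delete(self) -> None:
        self._require(State.STOPPED, State.MAINTENANCE, State.SECURE)
        self.state = State.DELETED

    def status(self) -> str:
        return self.state.value


if __name__ == "__main__":
    vm = CapsuleVM()    # create
    vm.start()          # boots into maintenance mode
    vm.switch()         # to secure mode for the analysis
    print(vm.status())  # prints: secure
    vm.switch()         # back to maintenance mode
    vm.stop()
    print(vm.status())  # prints: stopped
```

The usage at the bottom mirrors the typical workflow: the VM boots into maintenance mode, moves to secure mode for the analysis, and returns to maintenance mode.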
Once a VM is started, users can log into it through a VNC client. To run an analysis against the HTRC OCR repository, users need to switch the VM to secure mode. Below is a typical workflow a new user may follow:
Create a new VM;
Start the new VM;
Log into the VM;
Configure the software environment as needed and upload the analysis program to the VM;
Switch the VM to secure mode through the web interface;
Run the analysis program against the HTRC corpus repository;
Switch the VM back to maintenance mode to regain normal network access.
This section covers all the operations you can perform on the virtual machine. You must log in to the web page before you can perform these operations.
Navigate to the “Create Virtual Machine” tab and fill in the form. Choose an image from the drop-down list, provide a username and password for the VNC session, and choose how many CPUs and how much memory your virtual machine should have. Finally, click the create VM button. VM creation usually takes about 1 minute.
Navigate to the “Virtual Machines” page to see all your VMs and the operations available for each of them.
Click the VM ID link to see more details about a VM. The “VM Initial Logging User ID” and “VM Initial Logging Password” are the username and password you use to log into the VM itself; they are different from the credentials you use to open the VNC session. The “Public IP” and “VNC port” fields give the information you need to open a VNC session. You can also use the “Public IP” and “SSH port” to log into the VM through SSH, but this is only allowed in maintenance mode.
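As an illustration, the details page provides everything needed to build the connection commands. The Python sketch below assembles command lines for a command-line VNC viewer such as TigerVNC's `vncviewer`, which accepts `host::port` to select an explicit TCP port, and for OpenSSH's `ssh -p`. The IP address, ports, and user name are placeholders, and the mode check only restates the rule that SSH is allowed in maintenance mode alone.

```python
import shlex


def vnc_command(public_ip: str, vnc_port: int) -> list[str]:
    """Command line for a TigerVNC-style viewer; "host::port" selects a TCP port."""
    return ["vncviewer", f"{public_ip}::{vnc_port}"]


def ssh_command(user: str, public_ip: str, ssh_port: int, mode: str) -> list[str]:
    """OpenSSH command line; refuses outside maintenance mode, mirroring the rule above."""
    if mode != "maintenance":
        raise PermissionError("SSH login is only allowed in maintenance mode")
    return ["ssh", "-p", str(ssh_port), f"{user}@{public_ip}"]


if __name__ == "__main__":
    # Placeholder values; copy the real ones from the VM details page.
    print(shlex.join(vnc_command("203.0.113.10", 16000)))
    # prints: vncviewer 203.0.113.10::16000
    print(shlex.join(ssh_command("vmuser", "203.0.113.10", 16022, "maintenance")))
    # prints: ssh -p 16022 vmuser@203.0.113.10
```

The returned lists can be passed directly to `subprocess.run`; the VNC password and the VM login credentials are then entered interactively.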
You can start a virtual machine from the “Virtual Machines” page. This operation usually takes 2 to 3 minutes. Once the VM starts successfully, more operations become available, e.g., switch, stop, and delete.
Use your favorite VNC client to connect to the VM, providing the VNC password as well as the VM login username and password.
By default, the VM starts in maintenance mode, where you have network access. To switch to secure mode, click the “Switch to Security” button. After the switch, your screen in the VNC session freezes for a short time; once it responds again, you can resume your work. Before switching from secure mode back to maintenance mode, make sure you eject/unmount the secure volume so that any changes made to it are made permanent.
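Before clicking the switch button to leave secure mode, you can confirm from inside the VM that the secure volume is no longer mounted. The Python sketch below assumes a Linux guest and reads `/proc/mounts`; the default mount point `/media/secure_volume` is hypothetical, so substitute the path where the secure volume actually appears in your VM. To unmount it, eject it from the desktop or run `umount` on that path (this may require root privileges).

```python
def is_mounted(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 2 is the mount point; /proc/mounts encodes spaces as "\040".
        if len(fields) >= 2 and fields[1].replace("\\040", " ") == mount_point:
            return True
    return False


def safe_to_leave_secure_mode(mount_point: str = "/media/secure_volume") -> bool:
    """True if the (hypothetical) secure-volume mount point is not mounted."""
    with open("/proc/mounts", encoding="utf-8") as f:
        return not is_mounted(mount_point, f.read())
```

For example, call `safe_to_leave_secure_mode("/path/to/your/secure/volume")` just before switching; a `False` result means the volume is still mounted and recent changes may not be saved.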
You can stop a VM by clicking the “Stop VM” button. The VM then shuts down, and everything inside it is preserved.
Although we do not provide a reset button to restart the VM directly, you can always stop the VM and then start it again. This has the same effect as pressing the reset button on a physical machine.
You can delete a VM by clicking the “Delete VM” button. The VM is then wiped out, and everything inside it is permanently lost.
1) The source code
The code base has three parts: a web GUI, a web service, and backend scripts.
You can download the code for the web GUI from http://sourceforge.net/p/htrc/code/HEAD/tree/HTRC-UI-Portal2/.
You can download the code repository for the web service and backend scripts from https://github.com/htrc/HTRC-Data-Capsules.
2) The web interface
You can find the web front end at http://htrc5.pti.indiana.edu:9443/login.