
This tutorial was first developed for an Indiana University Scholar Commons event (see announcement), hosted on 9/15/2014 at IU Wells Library. It now serves as a general tutorial for hands-on sessions with the HTRC Data Capsule tool.

Short link for this page: http://bit.ly/1whzT6H


Preparation

First, register an account on the HTRC portal on the production stack, from which you will access the HTRC Data Capsule.


Install a VNC client on your computer to enable communication between your computer and the virtual machine (VM) you will create. You can choose any VNC client you prefer.

This tutorial uses VNC Viewer for Google Chrome, and we recommend that you install the same client. Install and launch the app.


Getting Familiar with the VM

Log in to the HTRC portal with the account you just created. Create a VM (virtual machine) by clicking "HTRC Data Capsule" -> "Create Virtual Machine" at the top of the page. You will be assigned a VM after submitting the VM Creation page.


Once you have your capsule running, you may find it useful to open this guide in an internet browser in your capsule so you can copy and paste commands. The short link for this page is: https://wiki.htrc.illinois.edu/x/TQFRAQ

Register for an HTRC account if you do not already have one. Read the guide for extended directions, or follow the steps in the link below.

Sign up for and sign in to the HTRC: see the "Sign Up and Sign In" page.


From the Analytics homepage, create a capsule by clicking Capsule in the top menu. You will be asked to provide information about the capsule you would like to create. The directions for this step also explain how to create a capsule with access to the full HathiTrust corpus, or convert an existing capsule to one; this option is for HathiTrust members only.

Create a capsule: see the "Create or convert a Capsule" page.


Start the capsule you created by clicking the Start Capsule button on the Capsules page.

Start the capsule: see the "Start the VM" page.

Start the VM you were assigned by clicking the "Start VM" button on the Virtual Machines list page (make sure you are logged in to the portal in order to see the page).


After starting the VM, you can connect to it and operate it through the VNC client you just installed. Use the VM's "Host Name" and "VNC port" fields as input to the VNC client: enter them in the "Address" field of the VNC Viewer, separated by a semicolon (;).


The VM has two modes: maintenance mode and secure mode. On the "Virtual Machines" page, click the "Switch to Secure Mode" or "Switch to Maintenance Mode" button to switch between them.

In maintenance mode, the user can access the network freely, except for the HTRC corpus repository, and can install any software. In secure mode, network access is restricted: the user can reach only a few network addresses, such as the HTRC corpus repository and the search service.


Run text analysis experiments in the VM. The four use cases below demonstrate how to conduct experiments. Users who want to export results from the VM can release them while the VM is in secure mode.


Use Cases

We walk participants through four use cases for text analytics on the HTRC corpus within the HTRC Data Capsule VM. For the convenience of demo participants, the VM you just requested has been pre-loaded with the required R packages and the IPython tool, along with a volume ID list for the English Short Title collection. All of these use cases are carried out within the VM.

Because the work is done inside the VM, it helps to open a browser in the VM (e.g. Firefox) and go to http://wiki.htrc.illinois.edu/pages/viewpage.action?pageId=22085965 or http://bit.ly/1whzT6H. You can then easily copy and paste the hyperlinks and commands from the wiki.

HTRC provides a search API, the Solr API, that lets scholars search for volumes of interest. Scholars can search the full text or the MARC catalog fields. An example query is http://chinkapin.pti.indiana.edu:9994/solr/meta/select/?q=title:war which returns all volumes whose titles contain "war".
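As a minimal sketch of how such a query URL can be assembled programmatically, the Python snippet below builds the example request from a field name and a search term. The endpoint and the `title` field come from the example above; the helper name `build_query_url` is hypothetical, and `rows` is the standard Solr parameter for limiting the number of results. The snippet only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint taken from the example query in this tutorial.
SOLR_SELECT = "http://chinkapin.pti.indiana.edu:9994/solr/meta/select/"


def build_query_url(field, term, rows=None):
    """Return a Solr select URL for a `field:term` query (hypothetical helper)."""
    params = {"q": f"{field}:{term}"}
    if rows is not None:
        # `rows` is the standard Solr parameter for the number of results returned.
        params["rows"] = rows
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(build_query_url("title", "war"))
```

The colon in `title:war` is percent-encoded as `%3A` in the generated URL, which Solr decodes back to the same query.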


Given a list of volume IDs supplied by the user, the HTRC Feature API returns a Term-Document-Matrix (TDM) for the volumes. The matrix contains the term frequency counts for each volume, which can be used for further statistical analysis. In this example, we use the English Short Title Catalog's volume ID list to request its Term-Document-Matrix from the API.
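To make the shape of a term-document matrix concrete, here is a small Python sketch that builds one from toy text. It is not the HTRC Feature API and does not reproduce its output format; the function name, the whitespace tokenization, and the volume IDs `vol1` and `vol2` are all assumptions made for illustration.

```python
from collections import Counter


def term_document_matrix(docs):
    """Build a toy term-document matrix.

    docs maps a volume ID to its text. Returns the sorted vocabulary and,
    for each volume ID, a list of term counts aligned with that vocabulary.
    """
    # Count lowercase, whitespace-separated tokens per volume.
    counts = {vid: Counter(text.lower().split()) for vid, text in docs.items()}
    # The vocabulary is the union of all terms seen in any volume.
    terms = sorted(set().union(*counts.values()))
    matrix = {vid: [c[t] for t in terms] for vid, c in counts.items()}
    return terms, matrix


terms, matrix = term_document_matrix({"vol1": "war and peace", "vol2": "war war"})
print(terms)
print(matrix)
```

Here each volume gets one row of counts, with one column per term; transposing the matrix so that terms are rows is an equally valid layout.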


Using the returned Term-Document-Matrix, we run an analysis in R and visualize some patterns in the collection (English Short Title Catalog).


Use the IPython interactive interface to fetch volume content, and then run a vector space model and topic modeling on the volumes' OCR content.


Finishing Up

Upon completion of the hands-on session, please perform these steps to back up your results, exit the VM, and shut down the VM. The next time you sign in to the portal, you can restart the VM and continue working within the same environment.

Resources

See the Data Capsule User's Guide for more details about interacting with the HTRC Data Capsule VM.


Interact with the capsule via either the Remote Desktop viewer or the Terminal viewer.

Use your capsule: see the "Interact with the capsule" page.

Alternatively, you can SSH into your capsule, but only when it is in maintenance mode.

SSH access in maintenance mode

First, you will need a public key. To set one up, click "Advanced Features" in the blue box at the bottom of your Capsules page.



You will be prompted for a key. If you do not yet have a public key set up, entering one will establish your key. If you already have a key, submitting a new one in this box will change your key.


You'll find the command to SSH into your capsule in the blue "Advanced Features" box on each capsule's status page.



Switch between maintenance and secure mode.

Switch capsule modes: see the "Switch capsule modes" page.


Share your Research Data Capsule with up to 5 other researchers.

Share capsule

From your capsule listing page, click on the Data Capsule ID for the capsule you would like to share.



Then, click the button that says Manage Collaborators.



You will be taken to a new page, where you can input the email address for the user you would like to add.



The email address must be the one associated with their HTRC Analytics account or you will get an error.



When you successfully add a collaborator, that user's information will appear in the table of collaborators. By default, they will have the role of Contributor. Contributors can access the capsule and interact with it in its current state. You will have the role of Owner-Controller.



Before the new Contributor can access the capsule, they will need to agree to the Data Capsules Terms of Use. You will also be unable to delegate control of the capsule to them until they have agreed.

Once they have agreed to the Terms of Use, you can choose to make them a Controller of the capsule by clicking the "delegate control" button.




Once complete, you'll find that their role has changed to Controller. Only the Controller can start, stop, and switch the modes of the capsule. (The Owner-Controller likewise can do these tasks.)



Your own role has changed to Owner. The Owner can delete the capsule and revoke control from the Controller. Click the "revoke control" button to resume Owner-Controller status.




Now the collaborator again has the role of Contributor and you are Owner-Controller.



If you no longer want to share your capsule with a user, click the red 'X' button.




When they are removed, you'll see the collaborators table has returned to displaying only you as associated with this capsule.
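The role changes described in this section can be summarized as a small state model. The Python sketch below is only an illustration of the workflow as described here, not HTRC code: the class and method names are hypothetical, and two behaviors are assumptions rather than documented facts (control can be delegated to only one collaborator at a time, and removing the current Controller returns control to the owner).

```python
MAX_COLLABORATORS = 5  # a capsule can be shared with up to 5 other researchers


class Capsule:
    """Toy model of capsule sharing roles (hypothetical, for illustration)."""

    def __init__(self, owner_email):
        self.owner_email = owner_email
        self.owner_role = "Owner-Controller"
        # Maps collaborator email -> {"role": str, "agreed_to_terms": bool}
        self.collaborators = {}

    def add_collaborator(self, email):
        if len(self.collaborators) >= MAX_COLLABORATORS:
            raise ValueError("capsule is already shared with 5 researchers")
        # New collaborators start as Contributors who have not yet agreed
        # to the Data Capsules Terms of Use.
        self.collaborators[email] = {"role": "Contributor", "agreed_to_terms": False}

    def agree_to_terms(self, email):
        self.collaborators[email]["agreed_to_terms"] = True

    def delegate_control(self, email):
        collaborator = self.collaborators[email]
        if not collaborator["agreed_to_terms"]:
            raise PermissionError("collaborator has not agreed to the Terms of Use")
        if self.owner_role != "Owner-Controller":
            # Assumption: only one collaborator can hold control at a time.
            raise PermissionError("control is already delegated")
        collaborator["role"] = "Controller"
        self.owner_role = "Owner"

    def revoke_control(self, email):
        if self.collaborators[email]["role"] != "Controller":
            raise ValueError("this collaborator is not the Controller")
        self.collaborators[email]["role"] = "Contributor"
        self.owner_role = "Owner-Controller"

    def remove_collaborator(self, email):
        removed = self.collaborators.pop(email)
        if removed["role"] == "Controller":
            # Assumption: control returns to the owner if the Controller is removed.
            self.owner_role = "Owner-Controller"
```

Walking one collaborator through add, agree, delegate, revoke, and remove reproduces the sequence of role tables shown in the steps above.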




Bring text data into your capsule.

Get data: see the "Fetching Volume OCR Content in HTRC Data Capsule (Secure Mode)" page.


Perform your analysis. You can follow the Use Case guides for examples of how to perform text analysis in the capsule. 

If you need more than one session to complete your research, save your interim data to the Secure Volume.


Save data to the Secure Volume

Make sure your capsule is in secure mode (see directions above if needed).

Open a terminal window in the capsule and navigate to the secure volume by typing:

cd /media/secure_volume
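If your analysis runs in Python, interim results can also be copied to the secure volume from a script. The sketch below is a minimal example under stated assumptions: the path `/media/secure_volume` comes from the command above, while the helper name `save_to_secure_volume` and the example file name are hypothetical. It requires Python 3.8 or later because of the `dirs_exist_ok` argument.

```python
import shutil
from pathlib import Path

# Mount point of the secure volume, as used in the command above.
SECURE_VOLUME = Path("/media/secure_volume")


def save_to_secure_volume(src, dest_dir=SECURE_VOLUME):
    """Copy a file or directory into the secure volume (hypothetical helper)."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    if not dest_dir.is_dir():
        raise FileNotFoundError(
            f"{dest_dir} is not available; is the capsule in secure mode?"
        )
    target = dest_dir / src.name
    if src.is_dir():
        # Copy a whole results directory, merging into an existing copy.
        shutil.copytree(src, target, dirs_exist_ok=True)
    else:
        # Copy a single file, preserving its timestamps.
        shutil.copy2(src, target)
    return target


# Example call (hypothetical file name):
# save_to_secure_volume("interim_results.csv")
```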



Between sessions, stop the capsule via the HTRC using the web browser on your personal desktop. The next time you log in, you can restart the same capsule and continue your work.

When you are finished with your research, request to export your non-consumptive results. 

Export non-consumptive results: see the "Finishing Up" page.

When you no longer need it, delete your capsule via the HTRC. 


Questions?