
Once you have your capsule running, you may find it useful to open this guide in an internet browser in your capsule so you can copy and paste commands. The short link for this page is:

  1. Register for an HTRC account if you do not already have one. Read the guide for extended directions, or follow the steps in the link below.

     Sign up for and sign in to the HTRC

    Sign Up

    Go to HTRC Analytics.
    On the top right of the webpage, click on the "Sign Up" button. 

    On the Sign Up page, enter the requested information, including the username you intend to use and a password. The password must meet these requirements:
    • Password must be more than 15 characters long.
    • Password must contain characters from three of the following five categories:
      • Uppercase characters of European languages (A through Z, with diacritic marks, Greek and Cyrillic characters)
      • Lowercase characters of European languages (a through z, sharp-s, with diacritic marks, Greek and Cyrillic characters)
      • Base 10 digits (0 through 9)
      • Nonalphanumeric characters
      • Any Unicode character that is categorized as an alphabetic character but is not uppercase or lowercase. This includes Unicode characters from Asian languages.
    • Password must not contain any white spaces.
    • Password must not contain your user ID.
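    The rules above can be checked before you submit the form. The following is a minimal sketch, not HTRC's actual validator: the function name password_problems is hypothetical, Python's general upper/lower case tests stand in for the "European languages" wording, and the user ID comparison is assumed to be case-insensitive.

```python
def password_problems(password: str, user_id: str) -> list[str]:
    """Return a list of rule violations; an empty list means the password passes."""
    problems = []
    if len(password) <= 15:
        problems.append("must be more than 15 characters long")
    if any(ch.isspace() for ch in password):
        problems.append("must not contain white space")
    # Assumption: the user ID check ignores letter case.
    if user_id and user_id.lower() in password.lower():
        problems.append("must not contain your user ID")
    # The five character categories from the list above (approximated).
    categories = {
        "uppercase": any(ch.isupper() for ch in password),
        "lowercase": any(ch.islower() for ch in password),
        "digit": any(ch in "0123456789" for ch in password),
        "nonalphanumeric": any(
            not ch.isalnum() and not ch.isspace() for ch in password
        ),
        "other alphabetic": any(
            ch.isalpha() and not ch.isupper() and not ch.islower()
            for ch in password
        ),
    }
    if sum(categories.values()) < 3:
        problems.append("must use at least three of the five character categories")
    return problems


if __name__ == "__main__":
    print(password_problems("CorrectHorse-Battery9", "alice"))  # prints []
```

    An empty list means none of the listed rules were violated; the sign-up page remains the final authority.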

    Troubleshooting: if you can't successfully sign up

    You need an academic email address to sign up. We maintain a growing list of allowed email domains, e.g. emails ending with .edu. If you can't sign up with your academic email, your email domain is probably not on the list. In this situation, please submit an account request by clicking on the request an account button within the error message on the page.

    Sign In

    You will receive an email at the address you registered. Go to your inbox and follow the activation link in the email to activate your account.
    Now that you have created an HTRC Analytics account, you can sign in. On the top right of the page, sign in with your username and password.


  2. Install a VNC client on your computer to enable communication between your computer and the capsule you will access remotely. You can choose any VNC client you prefer. We use VNC Viewer for Google Chrome in the screenshots below. You can also use Screen Sharing on a Mac.

     Install software to your computer

    Install a VNC Client on your computer to enable the communication between your computer and the capsule, which takes the form of a virtual machine (VM), to be created. You can choose any VNC client you prefer.

    We use VNC Viewer for Google Chrome in this tutorial, so we recommend installing the same. Install and launch the app.


  3. Make sure you are still logged in to HTRC Analytics where you just created an account. 

  4. From the Analytics homepage, create a capsule by clicking on Capsule on the top menu. You will be asked to provide information about the capsule you would like to create.

     Create a capsule

    Navigate to the Capsule Creation Page

    Navigate to "Data Capsules" on the top menu of HTRC Analytics.

    Submit Capsule Creation Request

    You will be directed to the capsule creation page where you will fill in the form.

    Choose a capsule image, either the image with preloaded sample volume, or without. 

    Provide a username and password for VNC login, each between 2 and 8 characters. The password here is NOT your HTRC Analytics account password, and you will be prompted to supply it when you enter your Capsule.

    Choose the number of VCPUs, between 2 and 4, and the amount of memory, between 4 and 16 GB (specified in megabytes).

    Finally, click the Create Capsule button. The capsule creation procedure usually takes about 1 minute to complete. Refresh your screen to see if it has finished.
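    Because the memory range is given in gigabytes but entered in megabytes, a quick conversion can help. This is a minimal sketch under stated assumptions: the function name capsule_resources is hypothetical, and 1 GB is taken to be 1024 MB, which may not match how the form counts.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes; the form may count differently


def capsule_resources(vcpus: int, memory_gb: int) -> dict:
    """Validate the ranges stated in this guide and convert memory to megabytes."""
    if not 2 <= vcpus <= 4:
        raise ValueError("VCPUs must be between 2 and 4")
    if not 4 <= memory_gb <= 16:
        raise ValueError("memory must be between 4 and 16 GB")
    return {"vcpus": vcpus, "memory_mb": memory_gb * MB_PER_GB}


if __name__ == "__main__":
    print(capsule_resources(2, 8))  # prints {'vcpus': 2, 'memory_mb': 8192}
```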

    Check Capsule Status

    After creating a capsule, you will be taken back to the Capsules page. By default the capsule you just created is not running. 



  5. Start the capsule you created by clicking the Start Capsule button on the Capsules page.

     Start the capsule

    On the Capsules page (found under Capsule on the top menu), click on the Start Capsule button.


  6. Interact with the capsule via the VNC client. You will need to enter both your VNC password (the one you set when you created your capsule) and the capsule's operating system password (dcuser) to log in.

     Use your capsule

    To get the details of your capsule so that you can log in to it, click on Data Capsule ID link on the Capsules page.

    Then, locate the Hostname and VNC port information on the Capsule Status page where you will be directed. The details on this page are used to gain remote access to your capsule via the VNC client you installed earlier in the tutorial. Note: If you haven't installed the VNC Viewer, please refer to Software Installation for details.

    Open your VNC client of choice. This example uses VNC Viewer for Chrome.

    Copy and paste the VNC URL into the Address field and click Connect.

    You will be prompted to input Password. Input the password you chose for the capsule VNC when you created it, NOT the one for your Portal account.

    Then you will be asked to input another password, the Ubuntu operating system user's password. Input dcuser for password. Both username and password are dcuser, as can be seen on the Capsule Status page.



  7. Alternatively, you can SSH into your capsule, but only when it is in maintenance mode.

     SSH access in maintenance mode

    Key information for accessing your capsule is located on the Capsule Status page.

    On a Linux terminal, enter the following command, inserting your capsule's port number and hostname as shown on the Capsule Status page. Then enter the password dcuser when prompted.

    ssh -p <your capsule's port> dcuser@<your capsule's hostname>

    Note that if you want to interact with the Ubuntu desktop of your capsule, or if you want to use your capsule in secure mode, you will need to use a VNC client as described above.
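    For reference, the SSH command can be assembled from the values on the Capsule Status page. This is a minimal sketch, assuming the destination takes the usual user@hostname form; the function name ssh_command, the hostname capsule.example.org, and the port 12345 are placeholders, not real capsule details.

```python
import shlex

OS_USER = "dcuser"  # the capsule's operating system user, per this guide


def ssh_command(hostname: str, port: int) -> list[str]:
    """Build the argument list for an SSH login to a capsule in maintenance mode."""
    return ["ssh", "-p", str(port), f"{OS_USER}@{hostname}"]


if __name__ == "__main__":
    # Placeholder values; substitute the ones from your Capsule Status page.
    print(shlex.join(ssh_command("capsule.example.org", 12345)))
```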

  8. Switch between maintenance and secure mode.

     Switch capsule modes

    Each capsule is designed to have 2 modes: maintenance mode and secure mode.
    • In maintenance mode, the user can access the network freely, except for the HTRC corpus repository, and can install whatever software they want.
    • In secure mode, network access is restricted. The user is only allowed to access a few network addresses, e.g., the HTRC corpus repository and search service.

    Any changes a user makes to their capsule in secure mode, such as data they download or create during their analysis, will not persist. To keep your data, save it to a special storage area on your capsule called the secure volume. The secure volume is invisible in maintenance mode. Follow the further steps in this tutorial to learn how to preserve your capsule between research sessions.
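    Several steps in this guide only work in one of the two modes. The lookup below is a small sketch that collects those requirements in one place; the task labels and the function name can_run are hypothetical, and the mode assignments are taken from the statements in this guide.

```python
# Mode each task needs, as stated elsewhere in this guide.
REQUIRED_MODE = {
    "install software": "maintenance",
    "ssh login": "maintenance",
    "htrc download": "secure",
    "releaseresults": "secure",
}


def can_run(task: str, current_mode: str) -> bool:
    """Return True if the task is expected to work in the capsule's current mode."""
    return REQUIRED_MODE[task] == current_mode


if __name__ == "__main__":
    for task, mode in REQUIRED_MODE.items():
        print(f"{task}: switch to {mode} mode first")
```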

    To switch modes, on the Capsules page, click on the Switch to Secure Mode or Switch to Maintenance Mode buttons to switch to the other mode, as shown below. You can practice switching modes, but you'll need your capsule in maintenance mode to follow the rest of the tutorial.


  9. Bring text data into your capsule.

     Get data

    Optional method

    Researchers can use the HTRC Data API to bring text data into their capsule, and can refer to the HTRC Data API guide for more details.

    Preferred method

    HTRC has also developed a Python library for loading volumes into the Data Capsule environment that may be of use: HTRC Workset Toolkit. The Toolkit is standard in all capsules created after March 18, 2018. If you have an earlier-created capsule then you will need to install or update the Toolkit. 

    Make sure you are in secure mode in order to prepare to fetch content into your Data Capsule; it won't work in maintenance mode for security reasons.

    You can use the Workset Toolkit's "htrc download" command to transfer the volumes of interest. For example, running the command below will transfer the OCR text data for the volumes in the generic htrc-id list that comes with the Workset Toolkit to a directory called "output".

    htrc download -o output htrc-id

    To customize the volumes you transfer to your capsule, create a file containing the IDs of the volumes you're interested in, with one ID per line. To build your volume ID list, you will need to search in HathiTrust or use other metadata sources, including HathiFiles. Then run the above command, replacing htrc-id with your file. For example, if you had a file called myvolumes.txt, you would run the following command.

    htrc download -o output myvolumes.txt

    In the above examples, output is the destination folder for the fetched OCR content. If you do not provide an output, by omitting both the -o and the directory name, the files will go to the default directory (/media/secure_volume/workset). You can call the destination folder anything you like by replacing "output" with the name of your choice.
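    If you assemble your volume IDs programmatically, the list file can be written with a few lines of Python. This is a minimal sketch: the function name write_volume_list is hypothetical, and the IDs shown are made-up placeholders rather than real HathiTrust volumes.

```python
from pathlib import Path


def write_volume_list(volume_ids, path):
    """Write one volume ID per line, skipping blanks and duplicates."""
    seen = set()
    cleaned = []
    for raw in volume_ids:
        vid = raw.strip()
        if vid and vid not in seen:
            seen.add(vid)
            cleaned.append(vid)
    Path(path).write_text("".join(f"{vid}\n" for vid in cleaned), encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    ids = ["xyz.0001", " xyz.0002 ", "", "xyz.0001"]
    print(write_volume_list(ids, "myvolumes.txt"))  # prints ['xyz.0001', 'xyz.0002']
```

    The resulting myvolumes.txt can then be passed to htrc download as described above.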

    Other options

    You can also use a volume ID, collection URL, or catalog record ID to import volumes. Additionally, you have the option to concatenate files and to remove folders. 

    For more examples, see the detailed guide.

    For the technical documentation, see:

  10. Import OCR text data and perform your analysis. Follow the Use Case guides for examples of how to perform text analysis in the capsule. 

  11. If you will need more than one session to complete your research, save your interim data.

     Save data to Secure Volume


    Make sure your capsule is in secure mode (see directions above if needed).

    Open a terminal window in the capsule and navigate to the secure volume by typing:

    cd /media/secure_volume

    Suppose the file you'd like to release is at /home/demouser/demo/r/Rplots.pdf

    You can prepare the result data for release by adding it, which is done by typing the command:   

    releaseresults add /home/demouser/demo/r/Rplots.pdf

    Repeat using this command if you have other files to add.

  12. Between sessions, stop the capsule via the HTRC using the web browser on your personal desktop. The next time you log in, you can restart the same capsule and continue your work.

  13. When you are finished with your research, request to export your non-consumptive data.

     Export non-consumptive results

    Upon completing the hands-on session, please perform these steps to back up your results, exit the VM, and shut down the VM. The next time you sign in to the portal, you can restart the VM and continue working within the same environment.

    Back up your results

    This is the same as Release results

    If you'd like to export results out of the capsule, you must release them from your virtual machine.

    First, switch the VM to secure mode in the Portal interface. 

    Second, open a terminal in the capsule, navigate to the secure volume by typing:

    cd /media/secure_volume

    Suppose the file you'd like to release is at /home/demouser/demo/r/Rplots.pdf

    You can prepare the result data for release by first adding it to the release list:

    releaseresults add /home/demouser/demo/r/Rplots.pdf

    Repeat using this command if you have other files to add.

    Finally, to complete the release of your data, type: 

    releaseresults done

    The files will be delivered via email, to the address you registered for the portal account.
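    When several files need to be released, the add-then-done sequence above can be generated rather than typed by hand. This is a minimal dry-run sketch: the function name release_commands is hypothetical, and it only prints the commands described in this guide instead of running them.

```python
import shlex


def release_commands(paths):
    """Return one 'releaseresults add' command per file, then 'releaseresults done'."""
    if not paths:
        raise ValueError("nothing to release")
    commands = [["releaseresults", "add", str(p)] for p in paths]
    commands.append(["releaseresults", "done"])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before typing them.
    for cmd in release_commands(["/home/demouser/demo/r/Rplots.pdf"]):
        print(shlex.join(cmd))
```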

    Close the VNC Viewer

    Since we do not need to interact with the capsule as a virtual machine anymore, we can close the VNC Viewer.

    Shut Down the VM

    Go back to HTRC Analytics, go to "Data Capsules", and click on the "Stop Capsule" button next to the VM you'd like to shut down.

    Delete the VM

    If you do not need the capsule any more, you can delete it. On HTRC Analytics, navigate to "Data Capsules" and click on the "Delete Capsule" button next to the virtual machine you want to delete.



  14. When you no longer need it, delete your capsule via the HTRC. 

