Here are answers to frequently asked questions. If you cannot find answers here, please contact HTRC support by writing to firstname.lastname@example.org
What is HTRC Data Capsule and what can it be used for?
HTRC Data Capsule provides a secure computing environment to access content in the HathiTrust Digital Library. Users are provisioned virtual machines called capsules from which they can interact with HathiTrust volumes. Users can only perform computational analysis within the secure Data Capsule environment and then export the results of their analysis. Users cannot export volume content outside the HTRC Data Capsule. This system is a way of allowing for computational access to restricted texts without violating copyright law.
Do I have computational access to the HathiTrust Digital Library's copyrighted content in Data Capsule?
Not at present. The current implementation of the HTRC Data Capsule is a prototype for non-consumptive analysis. For the time being, users have computational access only to public domain texts, though computational access to in-copyright texts is forthcoming. To stay up to date on progress toward in-copyright access, please subscribe to our user group list htrc-usergroup-l @ list.indiana.edu or check the news announcements from time to time.
Who can use the HTRC Data Capsule?
Scholars from non-profit institutions of higher education are eligible to use the HTRC Data Capsule. First, make an account for the HTRC Portal, and once logged in, you can set up a capsule for yourself.
I am from the private sector. Can I sign up for an account to use the Data Capsule?
Unfortunately not. Our use policy limits access to scholars from non-profit institutions of higher education.
Is there a tutorial for Data Capsule?
Yes, we have written a step-by-step HTRC Data Capsule tutorial you might find useful.
Where can I use Data Capsule?
You'll use the HTRC Portal to set up your HTRC Data Capsule; start, stop, and delete your machine; and switch between maintenance and secure modes. Once you have created and started your Data Capsule, you'll access your capsule using a VNC client, such as Google VNC viewer, and operate your capsule from the desktop on your computer.
What is the threat model and what is in place to protect the restricted data?
The threat model can be found in the public version of the project's final report: http://hdl.handle.net/2022/19277
What is the usage model and how do I interact with the HT data through the HTRC Data Capsules?
The usage model is also described in the public version of the project's final report: http://hdl.handle.net/2022/19277
Setting up your HTRC Data Capsule
The HTRC Portal showed an error message when I tried to create a Data Capsule. What went wrong?
Most likely the number of capsules on our server has reached its limit. Please contact HTRC support to solve the issue: email@example.com
How many total Data Capsules can the HTRC support at one time? How many can I check out?
We offer 15 capsules for scholars to check out. In other words, 15 Data Capsules can exist in our system at any one time. While there is no limit on how many capsules a single user can check out, we strongly suggest each user check out only one capsule at a time out of consideration for the wider user community.
I want to create a Virtual Machine (capsule) with several Virtual CPUs (VCPUs). Is there an upper limit to the number of Virtual CPUs that I can create?
Each user is allowed up to 10 VCPUs in total. If you tried to create a capsule with fewer than 10 VCPUs and the attempt failed, you may have already used up part or all of your quota of 10 in an existing capsule.
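The quota arithmetic can be sketched as follows. This is only an illustration of the rule described above, not how the HTRC Portal checks requests; the function name `can_create_capsule` and its parameters are hypothetical.

```python
VCPU_QUOTA = 10  # per-user limit stated in this FAQ


def can_create_capsule(requested_vcpus, vcpus_in_use):
    """Return True if the request fits within the remaining per-user quota."""
    if requested_vcpus < 1:
        raise ValueError("a capsule needs at least one VCPU")
    return vcpus_in_use + requested_vcpus <= VCPU_QUOTA


# An existing capsule with 8 VCPUs leaves room for only 2 more,
# so a request for 4 fails even though 4 is well under 10.
print(can_create_capsule(4, vcpus_in_use=8))  # False
print(can_create_capsule(2, vcpus_in_use=8))  # True
```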
Using your HTRC Data Capsule
Can I ssh ("secure-shell") into an HTRC Data Capsule?
You can ssh into your capsule when it is in maintenance mode only. (See below to learn more about Maintenance and Secure Modes.)
What are Maintenance Mode and Secure Mode in Data Capsule?
Each HTRC Data Capsule has two modes that you can switch between: maintenance mode and secure mode. In maintenance mode, you have access to the web but not to HTRC data services. In maintenance mode you can install software needed to complete your analysis and download other necessary resources. In secure mode all web access is blocked and users can only access the HTRC data services via an API. The purpose of these two modes is to ensure the security of the data.
|Secure mode|Maintenance mode|
|HTRC Data API|Internet|
|HTRC Solr API|HTRC Solr API|
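For scripts that need to know what they can reach in each mode, the table can be written down as a small lookup. This is a sketch transcribed from the table, not an HTRC API; the names `ACCESS` and `is_reachable` are hypothetical.

```python
# Services reachable in each mode, transcribed from the table above.
ACCESS = {
    "secure": {"HTRC Data API", "HTRC Solr API"},
    "maintenance": {"Internet", "HTRC Solr API"},
}


def is_reachable(mode, service):
    """Return True if the given service is listed for the given mode."""
    return service in ACCESS[mode]


def modes_for(service):
    """Return the modes, in sorted order, in which a service is listed."""
    return sorted(mode for mode, services in ACCESS.items() if service in services)


print(is_reachable("secure", "Internet"))  # False
print(modes_for("HTRC Solr API"))          # ['maintenance', 'secure']
```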
In a typical workflow:
- Install software and download necessary resources for text analysis in the maintenance mode
- Switch to secure mode and fetch volume content via the HTRC Data API (an HTRC data service)
- Analyze the volume data
- After the analysis, switch back to maintenance mode and release results
- Once HTRC staff has reviewed the results, access the results files from a link provided via email
How much time do I have to download results I released from my Data Capsule that have been cleared by the HTRC?
The link for downloading results, which is sent to the user by email, will be good for 12 hours after the email is sent. After that, the results can no longer be downloaded.
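As a worked example of the 12-hour window, the sketch below computes when a link stops working from the time the email was sent. The timestamp is made up for illustration, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

LINK_LIFETIME = timedelta(hours=12)  # window stated in this FAQ


def link_expiry(email_sent_at):
    """Return the moment the download link stops working."""
    return email_sent_at + LINK_LIFETIME


def is_link_valid(email_sent_at, now=None):
    """Return True if the link is still within its 12-hour window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now <= link_expiry(email_sent_at)


# Illustrative timestamp only: an email sent at 09:30 UTC expires at 21:30 UTC.
sent = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
print(link_expiry(sent).isoformat())  # 2024-01-15T21:30:00+00:00
```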
Can I import the workset that I created in the HTRC Portal into the HTRC Data Capsule?
Currently, the best way to do this is to download the workset from the Portal, which will export a list of the volume IDs for that workset, and then use the HTRC Data API in the Data Capsule to access the content in those volumes. It is not presently possible to export a workset from the Portal directly into the HTRC Data Capsule, but we expect to integrate this functionality into future versions.
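Once the exported list is inside your capsule, a short helper can turn it into a clean list of volume IDs for your scripts. The sketch below assumes the export is plain text with one volume ID per line, which may not match the actual file layout, and the sample IDs are made up.

```python
import io


def read_volume_ids(lines):
    """Collect unique volume IDs from an iterable of text lines, keeping order."""
    seen = set()
    ids = []
    for line in lines:
        vol_id = line.strip()
        if not vol_id:
            continue  # skip blank lines
        if vol_id not in seen:
            seen.add(vol_id)
            ids.append(vol_id)
    return ids


# Made-up IDs standing in for an exported workset (assumed one ID per line).
sample = io.StringIO(
    "mdp.39015012345678\n\nuc2.ark:/13960/t0000000\nmdp.39015012345678\n"
)
print(read_volume_ids(sample))
# ['mdp.39015012345678', 'uc2.ark:/13960/t0000000']
```

With a real export you would pass an open file object instead of the `io.StringIO` sample.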
I have some Python scripts that I want to use in my analysis within the HTRC Data Capsule. How should I start?
- First, store your Python scripts somewhere on the Internet.
- Start your capsule from within the Portal, and make sure your machine is in maintenance mode.
- Log into your capsule from a VNC client.
- Download the Python scripts from the Internet onto your capsule.
- Switch to secure mode.
- If you know the IDs of the volumes you are interested in, you can fetch their content using the sample Python script in Fetching Volume OCR Content in HTRC Data Capsule (Secure Mode).
- Run your Python scripts against the content.
- If you do not have the volume IDs, you can search for volumes and their IDs via the HTRC Solr search engine. You can search by subject, topic, author, year, etc., identify the volumes of interest, and record their volume IDs from the Solr search results. The HTRC Solr search API is a RESTful web service that you can call in either secure or maintenance mode. Instructions for using the HTRC Solr API can be found in the Solr Proxy API User Guide.
- Alternatively, you can build a workset with the HTRC Portal and Workset Builder and obtain the volume IDs of your workset from the Portal.
- Once you have the volume IDs, you can fetch the volume content in Data Capsule secure mode and run your Python scripts on it as described above.
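The Solr search step might look roughly like the sketch below, which builds a query URL and pulls volume IDs out of a JSON response. The endpoint URL is a hypothetical placeholder, and the field names `id`, `title`, and `author` are assumptions about the index schema; take the real values from the Solr Proxy API User Guide. The `q`, `rows`, `fl`, and `wt` parameters and the `response`/`docs` layout are standard Solr conventions.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real endpoint is given in the Solr Proxy API User Guide.
SOLR_SELECT_URL = "https://solr.example.org/solr/select"


def build_search_url(query, rows=10, fields=("id", "title")):
    """Build a Solr select URL using standard Solr query parameters."""
    params = {"q": query, "rows": rows, "fl": ",".join(fields), "wt": "json"}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


def volume_ids_from_response(response):
    """Extract volume IDs from a parsed Solr JSON response (assumes an 'id' field)."""
    return [doc["id"] for doc in response["response"]["docs"]]


print(build_search_url("author:Dickens", rows=5))

# A hand-written stand-in for a parsed response, with made-up IDs.
fake_response = {"response": {"numFound": 2, "docs": [{"id": "vol.1"}, {"id": "vol.2"}]}}
print(volume_ids_from_response(fake_response))  # ['vol.1', 'vol.2']
```

The resulting list of IDs can then be passed to whichever script you use to fetch volume content in secure mode.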
Where can I receive announcements and updates about HTRC Data Capsule?
- Check the HTRC Data Capsule documentation pages from time to time.
- Subscribe to our user group list htrc-usergroup-l @ list.indiana.edu to receive the most recent announcements and updates about the HTRC Data Capsule and other services.
Still have questions?
Please contact HTRC support at firstname.lastname@example.org, and one of our team members will reply to your questions.