

Once you have your capsule running, you may find it useful to open this guide in an internet browser in your capsule so you can copy and paste commands. The short link for this page is: https://wiki.htrc.illinois.edu/x/TQFRAQ

  1. Register for an HTRC account if you do not already have one. Read the guide for extended directions, or follow the steps in the link below.

     Sign up for and sign in to the HTRC

    Sign Up




    • On the Sign Up page, enter the requested information, including the username and password you intend to use. The password must meet these requirements:
      • Password must be more than 15 characters long.
      • Password must contain characters from three of the following five categories:
        • Uppercase characters of European languages (A through Z, with diacritic marks, Greek and Cyrillic characters)
        • Lowercase characters of European languages (a through z, sharp-s, with diacritic marks, Greek and Cyrillic characters)
        • Base 10 digits (0 through 9)
        • Nonalphanumeric characters
        • Any Unicode character that is categorized as an alphabetic character but is not uppercase or lowercase. This includes Unicode characters from Asian languages.
      • Password must not contain any white spaces.
      • Password must not contain your user ID.
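    The rules above can be expressed as a small check, for example to vet a candidate password before submitting the form. This is a sketch in Python, not HTRC's actual validation: the function name password_problems is hypothetical, Unicode general categories stand in for the "European languages" wording, and the case-insensitive user ID comparison is an assumption.

```python
import unicodedata


def password_problems(password, user_id):
    """Return a list of rule violations; an empty list means the password passes.

    Approximates the published rules; the real sign-up form may differ.
    """
    problems = []
    if len(password) <= 15:
        problems.append("must be more than 15 characters long")
    if any(ch.isspace() for ch in password):
        problems.append("must not contain white space")
    # Assumption: the user ID check ignores letter case.
    if user_id and user_id.lower() in password.lower():
        problems.append("must not contain your user ID")

    # The five character categories; at least three must be present.
    categories = {
        "uppercase": any(unicodedata.category(ch) == "Lu" for ch in password),
        "lowercase": any(unicodedata.category(ch) == "Ll" for ch in password),
        "digit": any(ch in "0123456789" for ch in password),
        "nonalphanumeric": any(
            not ch.isalnum() and not ch.isspace() for ch in password
        ),
        # Letters that are neither uppercase nor lowercase (e.g. CJK characters).
        "other alphabetic": any(
            unicodedata.category(ch) in ("Lo", "Lm", "Lt") for ch in password
        ),
    }
    if sum(categories.values()) < 3:
        problems.append("must use at least three of the five character categories")
    return problems
```

    Calling password_problems with a candidate password and your username returns the rules it breaks, so an empty list means it should be accepted under these assumptions.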

    Troubleshooting: if you can't successfully sign up

    You need an academic email address to sign up. We maintain a growing list of allowed email domains, e.g. addresses ending with .edu, edu.ac, or edu.tw. If you can't sign up with your academic email address, your email domain is probably not on the list. In that case, submit an account request by clicking the request an account button in the error message on the page.

    Sign In


    • An activation email will be sent to the address you registered. Open it and follow the activation link to activate your account.
    • Once your HTRC Analytics account is activated, you can sign in. At the top right of the page, sign in with your username and password.

     



  2. From the Analytics homepage, create a capsule by clicking Capsule on the top menu. You will be asked to provide information about the capsule you would like to create. This step also explains how HathiTrust members can create a Capsule with access to the full HathiTrust corpus, or convert an existing Capsule to one.

     Create a capsule

    Navigate to the Capsule Creation Page

    Navigate to "Data Capsules" on the top menu of HTRC Analytics.

    Create a Capsule

    Click Create A Capsule toward the top right.


    You will be prompted to create either a Demo Capsule or a Research Capsule.

     

    Create Demo Capsule

    Note:

    • Demo Capsules are not configurable and can access public domain content only.
    • You cannot request to export derived data from a Demo Capsule.

    Hit the Create Capsule button. The capsule creation procedure usually takes about 1 minute to complete. Refresh your screen to see if it has finished.

    You will be prompted to agree to the HTRC Data Capsules Terms of Use. Please review this document as it outlines policy for acceptable in-Capsule behavior. 

    Create Research Capsule

    Note:

    • Research Capsules are configurable and by default can access public domain content only.
    • You can request to export derived data from a Research Capsule.
    • Additional information is required to create your Capsule. 
    • During creation or after it's created, researchers from HathiTrust member institutions can request for their Research Capsule to be converted to one with computational access to the full HathiTrust corpus, including in-copyright content. 

    Fill out the form with the title of your research project, and choose the specs for your Capsule. 

    There are 2 images you can choose from, one that includes sample public domain volumes from HathiTrust for you to test with, and one that does not. 

    Capsule size can range from 2-4 VCPUs and from 4-16 GB of memory. The VCPUs and memory allocation you choose will affect the processing speed of your Capsule. 

    Add a description of your research project. Your answers will be used to aid in reviewing requests to export results from your Capsule. The more information you can provide, the more easily we can assess your results for adherence to the HTRC's Non-consumptive Use Research Policy.

    Affiliates of HathiTrust member institutions can check the box to request a Capsule with access to the full HathiTrust corpus. 

    Checking that box will prompt you to fill out additional information about your project. Note: Creation requests from users who check this box are routed for human review, which verifies that you are affiliated with a HathiTrust member institution and that your request demonstrates serious research intentions in compliance with the HTRC's Non-consumptive Use Research Policy and HTRC Data Capsules Terms of Use.


    Include more information about your anticipated results to further assist in the human review of your data export requests.


    If you like, you can choose to allow HTRC to communicate anonymized information about your research project. You must also agree that you will not share your log-in information for HTRC Analytics with anyone. 

    You will be prompted to agree to the HTRC Data Capsules Terms of Use. Please review this document as it outlines policy for acceptable in-Capsule behavior. You will be reminded of these terms regularly while using your Capsule. 

    Check Capsule Status 

    After creating a capsule, you will be taken back to the Capsules page. By default the capsule you just created is not running. 


      

    Convert a Research Capsule

    HathiTrust member-affiliated individuals can request to convert an existing Research Capsule into one with access to the full HathiTrust corpus.

    From your Capsules page, click on the ID of the Capsule you would like to convert. Then, click the button to Request access to Full HathiTrust Corpus.

    You will be taken to the Capsule creation form. If you submitted answers when creating your Capsule, they will appear for you to review and, if desired, edit. You will also be asked to fill in additional information about your research use case. Your request will be reviewed to verify that you are affiliated with a HathiTrust member institution and that your request demonstrates serious research intentions in compliance with the HTRC's Non-consumptive Use Research Policy and HTRC Data Capsules Terms of Use.

     

  3. Start the capsule you created by clicking the Start Capsule button on the Capsules page.

     Start the capsule

    On the Capsules page (found under Capsule on the top menu), click on the Start Capsule button.



     

  4. Interact with the capsule either via Remote Desktop viewer or Terminal viewer. 

     Use your capsule

    To get the details of your capsule so that you can log in to it, click on Data Capsule ID link on the Capsules page.


    You will see the details for your capsule. From this page, you can start, stop, or delete your capsule. You can also click to connect via Terminal (command line interface) or Remote Desktop (to see your capsule's Ubuntu desktop).


    If you choose to connect via Terminal, you will be taken to a page showing a command line interface for interacting with your capsule. (Note: This option is available in Maintenance Mode only.)



    If you choose to connect via Remote Desktop, you will be taken to a page from which you can interact with your capsule's desktop. (Note: This option is available in either Maintenance or Secure Mode.)

    If you want to interact with your capsule via SSH from your personal machine, you can follow the directions to set up that access.


     

     

  5. Alternatively, you can SSH into your capsule when it is in maintenance mode only. 

     SSH access in maintenance mode

    First, you will need a public key. Click "Advanced Features" in the blue box to establish your public key at the bottom of your Capsules page.

    You will be prompted for a key. If you do not yet have a public key set up, then entering one will establish your key. If you already have a key, resubmitting a response in this box will change your key.

    You'll find the command to SSH into your capsule in the blue "Advanced Features" box on each capsule's status page.

  6. Switch between maintenance and secure mode.

     Switch capsule modes

    HTRC Data Capsules have two modes: Maintenance Mode and Secure Mode. In Maintenance Mode, the capsule can access the network (i.e. the internet) so that you can set up your capsule as you like, such as installing software or importing additional, non-HathiTrust data. In Secure Mode, the capsule can access HathiTrust Data. 

    HathiTrust data you import and/or work with in Secure Mode must be stored on the capsule's Secure Volume, a storage location available in Secure Mode only, in order to persist in the capsule when its modes are switched or when it is turned off and back on. For security reasons, data transferred or generated in the capsule in Secure Mode that is not saved to the Secure Volume will be deleted when the capsule switches modes or is turned off and on.
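    As an illustration of the persistence rule, the sketch below checks whether a file path lies on the Secure Volume before you rely on it surviving a mode switch. It assumes the Secure Volume is mounted at /media/secure_volume, the path used in the commands later in this guide. The helper name is_on_secure_volume is hypothetical, and the check is purely textual: it does not resolve symlinks or ".." segments.

```python
from pathlib import PurePosixPath

# Assumed mount point, taken from the "cd /media/secure_volume" step in this guide.
SECURE_VOLUME = PurePosixPath("/media/secure_volume")


def is_on_secure_volume(path):
    """Return True if the absolute path is the Secure Volume or lies beneath it."""
    p = PurePosixPath(path)
    return p == SECURE_VOLUME or SECURE_VOLUME in p.parents
```

    Under these assumptions, a Secure Mode result saved under /media/secure_volume is expected to persist, while one saved under /home/demouser is not.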

    When viewing your capsule, you will see a blue button to either "Switch to Secure Mode" or "Switch to Maintenance Mode." 



    Click the button to switch. You'll see the capsule's state change.


    Once it has switched, you'll see that you can click the blue button again to switch modes back. 

     

  7. Bring text data into your capsule.

     Get data

    Optional method

    Researchers can use the HTRC Data API to bring text data into their capsule, and can refer to the HTRC Data API guide for more details.

    Preferred method

    HTRC has also developed a Python library for loading volumes into the Data Capsule environment that may be of use: HTRC Workset Toolkit. The Toolkit is standard in all capsules created after March 18, 2018. If you have an earlier-created capsule then you will need to install or update the Toolkit. 

    Make sure your capsule is in secure mode before fetching content into it; for security reasons, fetching won't work in maintenance mode.

    You can use the Workset Toolkit's "htrc download" command to transfer the volumes of interest. For example, running the command below will transfer the OCR text data for the volumes in the generic htrc-id list that comes with the Workset Toolkit to a directory called "output." 

    htrc download -o output htrc-id


    To customize the volumes you transfer to your capsule, create a file listing the IDs of the volumes you're interested in, with one ID per line. Run the above command, replacing htrc-id with the name of your file.

    • To build your volume ID list, search HathiTrust or other metadata sources, including HathiFiles.

    For example, if you had a file called myvolumes.txt, you would run the following command.


    htrc download -o output myvolumes.txt


    In the above examples, output is the destination folder for the fetched OCR content. If you do not provide an output - by omitting both the -o and directory name - then the files will go to the default directory (/media/secure_volume/workset). You can call the destination folder anything you like by replacing "output" with the name of your choice. 
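    The ID file and the download call can also be prepared with a short script. The sketch below is one possible approach, not part of the Workset Toolkit: write_volume_list and download_command are hypothetical helper names, and the only toolkit syntax used is the htrc download command with its -o option as described above. It composes the command but does not run it.

```python
from pathlib import Path


def write_volume_list(volume_ids, path="myvolumes.txt"):
    """Write one volume ID per line, dropping blank entries and duplicates."""
    unique_ids = []
    for volume_id in volume_ids:
        volume_id = volume_id.strip()
        if volume_id and volume_id not in unique_ids:
            unique_ids.append(volume_id)
    Path(path).write_text(
        "".join(volume_id + "\n" for volume_id in unique_ids), encoding="utf-8"
    )
    return unique_ids


def download_command(id_file, output_dir=None):
    """Compose the htrc download call as an argument list.

    When output_dir is None, -o is omitted and the toolkit's default
    destination is used.
    """
    command = ["htrc", "download"]
    if output_dir is not None:
        command += ["-o", str(output_dir)]
    command.append(str(id_file))
    return command
```

    Inside a capsule in secure mode, the returned list could be passed to subprocess.run; outside a capsule it is only useful for checking what would be executed.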

    Other options

    You can also use a volume ID, collection URL, or catalog record ID to import volumes. Additionally, you have the option to concatenate files and to remove folders. 

    For more examples, see the detailed guide.

    For the technical documentation, see: https://htrc.github.io/HTRC-WorksetToolkit/cli.html



  8. Perform your analysis. You can follow the Use Case guides for examples of how to perform text analysis in the capsule. 

  9. If you will need more than one session to complete your research, save your interim data.

     Save data to Secure Volume

    Save data to the Secure Volume

    Make sure your capsule is in secure mode (see directions above if needed).

    Open a terminal window in the capsule and navigate to the secure volume by typing:

    cd /media/secure_volume

    Suppose the file you'd like to release is at /home/demouser/demo/r/Rplots.pdf

    You can prepare the result data for release by adding it, which is done by typing the command:   

    releaseresults add /home/demouser/demo/r/Rplots.pdf

    Repeat using this command if you have other files to add.

  10. Between sessions, stop the capsule via the HTRC using the web browser on your personal desktop. The next time you log in, you can restart the same capsule and continue your work.

  11. When you are finished with your research, request to export your non-consumptive results. 

     Export non-consumptive results

    (This is the same as Release results)

    If you'd like to export results out of the capsule, you must release them from your virtual machine.

    First, switch the VM to secure mode in the Portal interface. 

    Second, open a terminal in the capsule and navigate to the secure volume by typing:

    cd /media/secure_volume

    Suppose the file you'd like to release is at /home/demouser/demo/r/Rplots.pdf

    You can prepare the result data for release by first adding it to the release list:

    releaseresults add /home/demouser/demo/r/Rplots.pdf

    Repeat using this command if you have other files to add.

    Finally, to complete the release of your data, type: 

    releaseresults done
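    When several files need to be released, the add and done steps above can be listed programmatically. The sketch below only composes the commands in order; release_commands is a hypothetical helper, and the releaseresults add and releaseresults done invocations are the ones shown in this guide. Nothing is executed, so it can be tried outside a capsule.

```python
def release_commands(files):
    """Return the releaseresults calls for the given files, ending with 'done'."""
    paths = [str(f) for f in files]
    if not paths:
        raise ValueError("no files to release")
    # One 'add' per file, then a single 'done' to complete the release.
    commands = [["releaseresults", "add", path] for path in paths]
    commands.append(["releaseresults", "done"])
    return commands
```

    Each inner list is one command line; run them in the given order in a terminal inside the capsule while it is in secure mode.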


    The files will be delivered via email to the address you registered for your HTRC Analytics account. The link in the email will be live for 12 hours.

    Shut Down the VM

    Go back to HTRC Analytics, go to "Data Capsules", and click the "Stop Capsule" button next to the VM you'd like to shut down.

    Delete the VM

    If you no longer need the capsule, you can delete it. On HTRC Analytics, navigate to "Data Capsules" and click the "Delete Capsule" button next to the virtual machine you want to delete.


     

     

  12. When you no longer need it, delete your capsule via the HTRC. 

Questions?
