Detailed tutorial for using the HTRC Data Capsule System
For convenience, your capsule has been pre-loaded with the packages required to follow these examples.
Because the work is performed inside the capsule's virtual machine, it helps to open a browser in the capsule (e.g. Firefox) and go to http://wiki.htrc.illinois.edu/pages/viewpage.action?pageId=22085965 or http://bit.ly/1whzT6H. You can then copy and paste the hyperlinks and commands directly from the wiki.
HTRC provides a Solr-based search API that scholars can use to find volumes of interest. Searches can run against the full text or against MARC catalog fields. An example query is
chinkapin.pti.indiana.edu:9994/solr/meta/select/?q=title:war which returns all volumes whose titles contain "war".
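The same kind of query URL can be assembled programmatically. The following is a minimal sketch, not part of the HTRC tooling: the helper name `build_solr_query` is hypothetical, the `http://` scheme is an assumption (the example above omits it), and `rows` is the standard Solr parameter for limiting the number of results. The sketch only builds the URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint taken from the example query above; the http:// scheme is assumed.
SOLR_SELECT_URL = "http://chinkapin.pti.indiana.edu:9994/solr/meta/select/"


def build_solr_query(field, term, rows=None):
    """Return a Solr select URL that searches `field` for `term`.

    Hypothetical helper for illustration only.
    """
    params = {"q": f"{field}:{term}"}
    if rows is not None:
        # Standard Solr parameter: maximum number of results to return.
        params["rows"] = rows
    # Keep ':' unescaped so the URL reads like the hand-written example.
    return SOLR_SELECT_URL + "?" + urlencode(params, safe=":")


url = build_solr_query("title", "war")
print(url)
```

Pasting the printed URL into the capsule's browser should be equivalent to typing the example query by hand.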
Use the IPython interactive interface to fetch volume content, then apply a vector space model and topic modeling to the volumes' OCR text.
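To give a sense of what the vector space step involves, here is a minimal sketch that uses only the Python standard library. It is not the tutorial's own code: the function names are hypothetical, and the short strings are made-up stand-ins for the OCR text of fetched volumes. Each text becomes a bag-of-words term-count vector, and two texts are compared by the cosine of the angle between their vectors.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn a text into a term-count vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up placeholder texts standing in for volumes' OCR content.
volumes = {
    "vol_a": "the war began in the spring",
    "vol_b": "the war ended in the autumn",
    "vol_c": "recipes for bread and soup",
}

vectors = {vol_id: term_vector(text) for vol_id, text in volumes.items()}
print(cosine_similarity(vectors["vol_a"], vectors["vol_b"]))
print(cosine_similarity(vectors["vol_a"], vectors["vol_c"]))
```

Raw term counts are the simplest weighting; in practice a scheme such as TF-IDF is often preferred so that very common words contribute less to the similarity.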