Use the IPython interactive interface to fetch volume content, then run vector space modeling and topic modeling on the volumes' OCR content. This use case relies on inpho/vsm, a Python package for textual semantics developed by Dr. Colin Allen and his team at IU.
VM Mode
This use case can be run only when the VM is in secure mode. To export experiment results from the VM, release the result files while in secure mode; the results are then delivered to you by email.
Example Use
First, switch the VM mode to secure mode.
In the VM, open a Terminal and change directory to the htrc-data folder:
cd ~/demo/htrc-data
List the files in this folder:
ls
Run the topic modeling analysis:
./htrc-demo.sh
You will see something like this in the console.
After the topic modeling process finishes, you can view the results in the browser, which opens automatically.
The scripts will fail with errors if the VM is in maintenance mode, so make sure the VM is in secure mode before running them.
This demo code:
- loads data from the HTRC Data API
- builds an LDA model from the corpus
- saves the trained LDA model
- displays the topics
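The four steps above can be illustrated with a minimal, self-contained sketch. It is not the demo's actual code and does not use the inpho/vsm API or the HTRC Data API: the toy `docs` list stands in for volume OCR text, and `train_lda`, `top_words`, `save_model`, and `load_model` are hypothetical helper names introduced here for illustration. The model is trained with a plain collapsed Gibbs sampler, a common way to fit LDA, which may differ from what the demo script does internally.

```python
import os
import pickle
import random
import tempfile
from collections import Counter

# Toy stand-in for OCR text (assumption: the real demo loads volumes
# from the HTRC Data API instead).
docs = [
    "whale ship sea captain whale sea",
    "sea ship sail whale captain",
    "garden flower rose garden bloom",
    "rose bloom flower garden rose",
]

def train_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on whitespace-tokenized docs with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    tokens = [d.split() for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in tokens]       # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # token counts per topic
    assign = []                                        # topic of every token
    # Random initial topic assignment for every token.
    for d, doc in enumerate(tokens):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return {
        "vocab": vocab,
        "doc_topic": doc_topic,
        "topic_word": [dict(c) for c in topic_word],
    }

def top_words(model, n=3):
    """Return the n most frequent words of each topic."""
    result = []
    for counts in model["topic_word"]:
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append([w for w, c in ranked[:n] if c > 0])
    return result

def save_model(model, path):
    """Persist the trained model with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)

def load_model(path):
    """Load a model written by save_model."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

if __name__ == "__main__":
    model = train_lda(docs)                      # build the LDA model
    out = os.path.join(tempfile.mkdtemp(), "lda_model.pkl")
    save_model(model, out)                       # save the trained model
    for k, words in enumerate(top_words(load_model(out))):
        print("topic", k, words)                 # view the topics
```

On a real corpus you would replace the toy list with the fetched volume text and use a library implementation such as the one in inpho/vsm, since this pure-Python sampler is meant only to make the workflow concrete.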