
Use the IPython interactive interface to fetch volume content, and then build vector space models and topic models from the volumes' OCR content. This use case relies on the inpho/vsm Python package, a textual semantics package developed by Dr. Colin Allen and his team at Indiana University (IU).
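A vector space model represents each document as a vector of term weights and compares documents by the angle between their vectors. The following minimal sketch uses raw term counts and cosine similarity in plain Python. It is not the inpho/vsm API, and the volume snippets and function names (`VOLUMES`, `tf_vector`, `cosine`) are hypothetical illustrations.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for volume OCR text (not real HTRC data).
VOLUMES = {
    "vol_a": "the whale and the sea",
    "vol_b": "the sea and the ship",
    "vol_c": "the court and the judge",
}


def tf_vector(text):
    """Represent a document as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = {name: tf_vector(text) for name, text in VOLUMES.items()}
for x, y in [("vol_a", "vol_b"), ("vol_a", "vol_c")]:
    print(x, y, round(cosine(vectors[x], vectors[y]), 3))
```

Here `vol_a` scores higher against `vol_b` than against `vol_c` because the two share the content word "sea". Real pipelines usually also remove stop words such as "the" and reweight terms (for example with TF-IDF), since raw counts let common words dominate the similarity.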

VM Mode

This use case can be run only in secure mode. To export experiment results from the VM, release the result files while the VM is in secure mode; you will then receive the results by email.

Example Use

First, switch the VM to secure mode.

In the VM, open a terminal and change to the htrc-data directory:

cd ~/demo/htrc-data

List the files in this directory:

ls

Run the topic modeling analysis:

./htrc-demo.sh

The script's output appears in the console as it runs.

When the topic modeling process finishes, a browser window opens automatically so you can view the results.

 

Note that the scripts fail with errors if the VM is in maintenance mode.

The demo script:

  • loads data from the HTRC Data API
  • builds an LDA model from the corpus
  • saves the trained LDA model
  • displays the topics
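The four steps above can be illustrated with a small, self-contained sketch. It does not use inpho/vsm and does not reproduce what `htrc-demo.sh` does internally: it swaps in a plain-Python LDA fitted by collapsed Gibbs sampling, replaces the HTRC Data API load with an in-memory toy corpus, and the names `DOCS`, `train_lda`, `topic_word_probs`, `top_words`, the hyperparameter values, and the output file name are all hypothetical.

```python
import json
import os
import random
import tempfile

# Hypothetical toy corpus: stands in for volume OCR text that the demo
# would load from the HTRC Data API. Not real HTRC data.
DOCS = [
    "whale ship sea captain voyage sea",
    "ship sea whale harpoon sailor sea",
    "court judge law trial verdict law",
    "law judge court jury trial judge",
]


def train_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the count tables."""
    rng = random.Random(seed)
    tokens = [doc.split() for doc in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab, n_topics = len(vocab), num_topics

    doc_topic = [[0] * n_topics for _ in tokens]           # tokens of doc d in topic k
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # times word w is in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(tokens):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return {"vocab": vocab, "alpha": alpha, "beta": beta,
            "doc_topic": doc_topic, "topic_word": topic_word}


def topic_word_probs(model):
    """Smoothed p(word | topic) for every topic."""
    beta, n_vocab = model["beta"], len(model["vocab"])
    return [[(c + beta) / (sum(row) + n_vocab * beta) for c in row]
            for row in model["topic_word"]]


def top_words(model, n=3):
    """The n most probable words of each topic."""
    result = []
    for row in topic_word_probs(model):
        best = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:n]
        result.append([model["vocab"][i] for i in best])
    return result


if __name__ == "__main__":
    model = train_lda(DOCS)
    path = os.path.join(tempfile.gettempdir(), "lda_model.json")  # hypothetical path
    with open(path, "w") as f:  # save the trained model
        json.dump(model, f)
    with open(path) as f:  # reload it before viewing topics
        restored = json.load(f)
    for k, words in enumerate(top_words(restored)):
        print(f"topic {k}: {' '.join(words)}")
```

Running the file prints the three most probable words of each topic. Collapsed Gibbs sampling is used here only because it fits in a few dozen lines of standard-library Python; a corpus of full volumes would call for an optimized topic modeling package.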
