The demo code in HTRC_vsm_corpus.ipynb takes one HTRC volume and:

  • cleans up the content by handling page headers, line breaks, and hyphens
  • builds a Corpus object, excluding words whose frequency is less than 3
  • saves the Corpus object so it can be revisited later
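The three steps can be illustrated with a minimal, standard-library-only sketch. It is not the notebook's code and does not use its Corpus class. The helper names (clean_pages, build_corpus, save_corpus) are hypothetical. The sketch also assumes the volume arrives as a list of page strings whose first line is a running header, and it saves the filtered tokens as JSON instead of a Corpus object.

```python
import json
import re
from collections import Counter

MIN_FREQ = 3  # words that occur fewer than 3 times are excluded


def clean_pages(pages):
    """Hypothetical cleaner: drop each page's header, rejoin hyphenated
    words, and replace the remaining line breaks with spaces."""
    cleaned = []
    for page in pages:
        lines = page.split("\n")
        body = "\n".join(lines[1:])  # assumption: line 0 is the page header
        body = re.sub(r"(\w)-\n(\w)", r"\1\2", body)  # "exam-\nple" -> "example"
        body = body.replace("\n", " ")  # remaining line breaks become spaces
        cleaned.append(body)
    return " ".join(cleaned)


def build_corpus(text, min_freq=MIN_FREQ):
    """Tokenize into lowercase words and keep only the frequent ones."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] >= min_freq]


def save_corpus(tokens, path):
    """Persist the filtered token list (JSON stands in for a Corpus file)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tokens, f)


def load_corpus(path):
    """Reload a token list written by save_corpus."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With a two-page input, the headers disappear, "exam-" and "ple" are rejoined, and only "the" and "cat" reach the frequency threshold:

```python
pages = ["HEADER 1\nthe cat sat on the mat\nthe exam-\nple",
         "HEADER 2\nthe cat and the cat"]
tokens = build_corpus(clean_pages(pages))
# tokens == ["the", "cat", "the", "the", "the", "cat", "the", "cat"]
```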

Next, open another IPython notebook, HTRC_vsm_model.ipynb (the list of IPython notebooks can be found in the VM).