Overview
This project deploys improved infrastructure for robust corpus-building and modeling tools within the HathiTrust Research Center (HTRC) Data Capsule framework, in order to answer research questions that require large-scale computational experiments on the HathiTrust Digital Library (HTDL). Our research questions depend on the ability to randomly sample full-text data from large worksets extracted from the HTDL and to train semantic models on those samples. This project prototypes a system for testing and visualizing topic models built from worksets selected according to the Library of Congress Subject Headings (LCSH) hierarchy.
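This overview does not describe how worksets are selected or sampled. The following is only a minimal Python sketch of the two steps it names: choosing a workset from an LCSH subtree and drawing a reproducible random sample from it. It rests on several assumptions. The `catalog` mapping from volume IDs to subject-heading strings is hypothetical, and so are the helper names `select_workset` and `sample_workset`. The "hierarchy" is approximated by matching the leading `--` subdivisions of a heading. The real LCSH hierarchy is defined by broader/narrower term relations in the authority records. Nothing here is an HTRC or Data Capsule API.

```python
import random
from typing import Dict, List


def split_heading(heading: str) -> List[str]:
    # LCSH subdivisions are conventionally written with "--" separators.
    return [part.strip() for part in heading.split("--")]


def in_subtree(heading: str, root: str) -> bool:
    # Simplifying assumption: a heading falls "under" root when its leading
    # subdivisions match all of root's subdivisions exactly.
    h, r = split_heading(heading), split_heading(root)
    return h[:len(r)] == r


def select_workset(catalog: Dict[str, List[str]], root: str) -> List[str]:
    # Keep every volume that has at least one heading in root's subtree.
    # Sorting makes later sampling independent of dict insertion order.
    return sorted(
        vid for vid, headings in catalog.items()
        if any(in_subtree(h, root) for h in headings)
    )


def sample_workset(workset: List[str], k: int, seed: int = 0) -> List[str]:
    # Uniform random sample without replacement, capped at the workset size.
    # A fixed seed makes the sample reproducible across runs.
    rng = random.Random(seed)
    return rng.sample(workset, min(k, len(workset)))
```

For example, with a toy catalog in which `"vol1"` carries `"Philosophy -- History"` and `"vol3"` carries `"Philosophy -- History -- 18th century"`, selecting the root `"Philosophy -- History"` yields `["vol1", "vol3"]`. Calling `sample_workset` on that list with a fixed seed then returns the same subset every time. That repeatability matters when models trained on different samples need to be compared.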
The project report is available at http://arxiv.org/abs/1512.05004
Personnel
Colin Allen, Jaimie Murdock (Indiana University)
Jiaan Zeng (HTRC)