


This project deploys improved infrastructure for robust corpus-building and modeling tools within the HTRC Data Capsule framework, in order to answer research questions that require large-scale computational experiments on the HathiTrust Digital Library (HTDL). Our research questions depend on the ability to randomly sample full-text data from large worksets extracted from the HTDL and to train semantic models on those samples. The project prototypes a system for testing and visualizing topic models using worksets selected according to the Library of Congress Subject Headings (LCSH) hierarchy.

Please refer to the project report for technical, administrative, and community impact details.


Colin Allen, Jaimie Murdock (Indiana University)