

The Trace of Theory (TracT) project looked at the question “Can we find and track theory, especially literary theory, in texts using computers?” We proposed to do this on the large collections of the HathiTrust, using a variety of techniques, with the support of the HathiTrust Research Center (HTRC). The project brought together researchers who are part of the Text Mining the Novel project, led by Dr. Andrew Piper at McGill University.


The project takes a two-step approach to tracking theory through its textual traces:


1. Subsetting: We propose to experiment with two methods for identifying “theoretical” subsets of texts from large collections like the Google-digitized dataset (GDD) of the HathiTrust. The goal would be to identify subsets of the full GDD that are theoretical in different ways.


2. Mining: We would then experiment with large-scale text-mining and clustering methods on these subsets. In particular, we propose to try topic modelling and other forms of clustering.
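As a rough illustration of the mining step, the sketch below groups a handful of texts by the similarity of their token unigram counts. It is not the project's method: it swaps in a simple single-pass "leader" clustering with cosine similarity as a stand-in for topic modelling, and the function names, the 0.5 threshold, and the sample texts are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def unigram_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.5):
    """Assign each document to the first cluster whose seed document is
    similar enough; otherwise start a new cluster (hypothetical threshold)."""
    clusters = []  # list of (seed_vector, [doc_ids])
    for doc_id, text in docs.items():
        vec = unigram_counts(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((vec, [doc_id]))
    return [members for _, members in clusters]


# Invented sample texts, for illustration only.
sample = {
    "a": "theory of knowledge and reason",
    "b": "reason and knowledge in theory",
    "c": "the ship sailed across the sea",
    "d": "a ship on the stormy sea",
}
clusters = leader_cluster(sample)
```

On a real subset, the seed-based grouping would likely be replaced by a topic model or another clustering method; the point here is only the shape of the pipeline, from texts to unigram vectors to groups.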


Geoffrey Rockwell (Univ of Alberta), Laura Mandell (Texas A&M Univ), Stefan Sinclair (McGill Univ), Matthew Wilkens (Notre Dame), Susan Brown (Univ of Guelph)

Boris Capitanu (HTRC), Kahyun Choi (HTRC)

Can we find and track theory, especially literary theory, in texts using computers? This project uses subsetting of the HathiTrust corpus and text mining to track theory through its textual traces, and develops tools and computational methods for tracking the concept of “theory.”



  1. Using keyword lists to identify philosophical and literary critical texts

  2. Using machine learning to identify subsets
    1. Supervised.
    2. Unsupervised. Our approach mixed token unigram features with metadata and formal features in ways that may be portable to other text clustering and classification tasks.
  3. Adapting the Galaxy Viewer.
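The first two approaches in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: the keyword list, the 0.1 cut-off, the feature-name prefixes, and the metadata field are all hypothetical. It shows (a) scoring a text by the share of its tokens that appear in a keyword list and (b) combining token unigram counts with metadata into one feature dictionary that a classifier or clustering routine could consume.

```python
import re
from collections import Counter

# Hypothetical keyword list; the project's actual lists are not reproduced here.
PHILOSOPHY_KEYWORDS = {
    "metaphysics", "epistemology", "ontology", "dialectic", "aesthetics",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(text, keywords=PHILOSOPHY_KEYWORDS):
    """Fraction of tokens that appear in the keyword list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in keywords)
    return hits / len(tokens)


def select_subset(docs, min_score=0.1):
    """Keep the ids of documents whose keyword score meets the cut-off
    (the 0.1 default is an assumed value)."""
    return [doc_id for doc_id, text in docs.items()
            if keyword_score(text) >= min_score]


def mixed_features(text, metadata):
    """Merge token unigram counts and metadata fields into one feature dict."""
    feats = {f"tok={t}": n for t, n in Counter(tokenize(text)).items()}
    feats.update({f"meta:{k}={v}": 1 for k, v in metadata.items()})
    return feats
```

For example, `select_subset({"x": "Ontology and epistemology", "y": "a ship at sea"})` keeps only `"x"`, because two of its three tokens are on the keyword list. Prefixing feature names (`tok=`, `meta:`) keeps text and metadata features from colliding when they are merged.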



HathiTrust Reader with the Robert Browning Text Seen in Galaxy Viewer





Resources (Classifying philosophical texts) (Unsupervised classification of philosophical genres)


