
Overview

The Trace of Theory (TracT) project asked: “Can we find and track theory, especially literary theory, in texts using computers?” We proposed to address this question on the large collections of the HathiTrust, using a variety of techniques with the support of the HathiTrust Research Center (HTRC). The project brought together researchers who are part of the Text Mining the Novel project (http://novel-tm.ca/), led by Dr. Andrew Piper at McGill University.


The project takes a two-step approach to tracking theory through its textual traces:

 

1. Subsetting: We propose to experiment with two methods for identifying “theoretical” subsets of texts from large collections like the Google-digitized dataset (GDD) of the HathiTrust. The goal would be to identify subsets of the full GDD that are theoretical in different ways.

 

2. Mining: We would then experiment with large-scale text-mining and clustering methods on these subsets. In particular we propose to try topic modelling and other forms of clustering.
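As a rough illustration of the subsetting step, the sketch below scores each text by the share of its tokens that appear in a keyword list and keeps the texts above a cutoff. It is a minimal sketch, not the project's implementation: the keyword list, the `threshold` value, the function names, and the sample texts are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical keyword list; the project's actual lists are not reproduced here.
THEORY_KEYWORDS = {"theory", "critique", "aesthetic", "dialectic", "hermeneutics"}


def keyword_density(text, keywords=THEORY_KEYWORDS):
    """Return the fraction of word tokens in `text` that appear in `keywords`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[k] for k in keywords)
    return hits / len(tokens)


def select_subset(texts, threshold=0.05):
    """Return the ids of texts whose keyword density is at least `threshold`.

    `threshold` is an assumed cutoff chosen for illustration only.
    """
    return [doc_id for doc_id, text in texts.items()
            if keyword_density(text) >= threshold]


# Invented sample texts standing in for volumes from a large collection.
sample = {
    "vol1": "A critique of aesthetic theory and the dialectic of form.",
    "vol2": "The ship sailed at dawn and the crew sang as they worked.",
}
print(select_subset(sample))
```

On a real collection the cutoff would need tuning against hand-checked examples, since a density score of this kind favors short texts that happen to repeat a keyword.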

Personnel

Geoffrey Rockwell (Univ of Alberta), Laura Mandell (Texas A&M Univ), Stefan Sinclair (McGill Univ), Matthew Wilkens (Notre Dame), Susan Brown (Univ of Guelph)

 Boris Capitanu (HTRC), Kahyun Choi (HTRC)

Can we find and track theory, especially literary theory, in texts using computers? This project uses subsetting of the HathiTrust (HT) corpus and text mining to track theory through its textual traces, and develops tools and computational methods for tracking the concept of “theory.”

 

Workflow

  1. Using keyword lists to identify philosophical and literary critical texts

  2. Using machine learning to identify subsets
    1. Supervised
    2. Unsupervised. Our approach mixed token unigram features with metadata and formal features in ways that may be portable to other text clustering and classification tasks.
  3. Adapting the Galaxy Viewer
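To make the idea of mixing token unigram features with metadata and formal features concrete, here is a minimal sketch that concatenates the three kinds of feature into one vector and classifies with a nearest-centroid rule. The nearest-centroid classifier is a stand-in chosen for brevity, not the method used in the project's notebooks. The vocabulary, the publication-year scaling, the mean-word-length feature, and the training examples are all hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical unigram vocabulary, chosen only for this example.
VOCABULARY = ["reason", "theory", "ship", "sea"]


def featurize(text, year):
    """Concatenate unigram, metadata, and formal features into one vector."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens) or 1
    # Token unigram features: relative frequency of each vocabulary word.
    unigrams = [tokens.count(w) / total for w in VOCABULARY]
    # Metadata feature (assumed): publication year scaled to roughly [0, 1].
    year_feature = (year - 1700) / 300
    # Formal feature (assumed): mean word length, scaled down by 10.
    mean_word_length = sum(len(t) for t in tokens) / total / 10
    return unigrams + [year_feature, mean_word_length]


def train_centroids(examples):
    """Average the feature vectors of each label into a single centroid."""
    grouped = defaultdict(list)
    for text, year, label in examples:
        grouped[label].append(featurize(text, year))
    return {label: [sum(col) / len(col) for col in zip(*vectors)]
            for label, vectors in grouped.items()}


def classify(text, year, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    vector = featurize(text, year)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Invented labeled examples: (text, publication year, label).
training = [
    ("reason and theory of reason", 1790, "philosophy"),
    ("theory of pure reason", 1800, "philosophy"),
    ("the ship on the sea", 1850, "other"),
    ("sea and ship and sea", 1860, "other"),
]
centroids = train_centroids(training)
print(classify("a theory of reason", 1795, centroids))
print(classify("the sea took the ship", 1855, centroids))
```

Scaling each feature to a comparable range matters here: without it, a raw publication year would dominate the distance and the unigram features would have almost no effect.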

 

 

HathiTrust Reader with the Robert Browning Text Seen in Galaxy Viewer


Visualizations

 


Findings

 

Resources

https://github.com/htrc/ACS-TT/blob/master/tools/notebooks/ClassifyingPhilosophicalText.ipynb (Classifying philosophical texts)

http://nbviewer.ipython.org/github/htrc/ACS-TT/blob/master/tools/notebooks/Unsupervised%20Clustering%20Philosophy.ipynb (Unsupervised classification of philosophical genres)

 

 
