Share your work
Do you have a project or tool using the HTRC Extracted Features Dataset? Let us know at firstname.lastname@example.org
Word Similarity Tool, David Mimno
A web-based tool for viewing the words most similar to a query term, for each year from 1800 to 1923.
An interactive, faceted visualization of terms across the HathiTrust collection, built on the EF dataset.
Within-Book Topic Modeling, Peter Organisciak
An approach for visualizing thematic trends within a book.
A Python library that scaffolds the use of EF data with Pandas, with example scripts.
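The library's own API is not reproduced here. As a hedged illustration of the general idea, the sketch below flattens page-level token counts from an EF-style JSON record into a long Pandas DataFrame and sums them per token. It assumes the EF JSON layout in which each page carries a `body.tokenPosCount` mapping from token to part-of-speech tag counts; check this against the schema of the EF release you are using. The sample record and the helper name `tokens_to_frame` are hypothetical.

```python
import pandas as pd

# Hypothetical, heavily trimmed EF-style record.
# Assumed layout: features.pages[].body.tokenPosCount -> {token: {pos_tag: count}}
sample_volume = {
    "id": "example.001",
    "features": {
        "pages": [
            {"seq": "00000001", "body": {"tokenPosCount": {"The": {"DT": 2}, "prison": {"NN": 3}}}},
            {"seq": "00000002", "body": {"tokenPosCount": {"prison": {"NN": 1}, "dream": {"NN": 2, "VB": 1}}}},
        ]
    },
}


def tokens_to_frame(volume: dict) -> pd.DataFrame:
    """Flatten page-level token/POS counts into one row per (page, token, pos) (hypothetical helper)."""
    rows = []
    for page in volume["features"]["pages"]:
        token_pos = page.get("body", {}).get("tokenPosCount", {})
        for token, pos_counts in token_pos.items():
            for pos, count in pos_counts.items():
                rows.append({"page": page["seq"], "token": token, "pos": pos, "count": count})
    return pd.DataFrame(rows, columns=["page", "token", "pos", "count"])


df = tokens_to_frame(sample_volume)

# Volume-level totals per token, ignoring part of speech
print(df.groupby("token")["count"].sum().sort_values(ascending=False))
```

Keeping the data in this long (tidy) form lets ordinary Pandas operations such as `groupby`, `pivot_table`, or filtering by part-of-speech tag do the aggregation, instead of custom loops over nested JSON.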
Send us your lessons or tutorials related to the EF Dataset.
Python code for some simple examples of "literary sleuthing":
- Estimating the ratio of poetry to prose in a volume, based on the proportion of capitalized letters (Coleridge wrote a lot more prose than Keats did!)
- Using how often a word occurs to draw inferences (Little Dorrit by Charles Dickens mentions "prison" a lot more than his Bleak House does!)
- Identifying the volume in a workset in which a specified word occurs most often (Which of the English romantic poets was the greatest "dream"-er of them all?)
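The original example scripts are not reproduced here. The following is a minimal sketch of the three ideas in plain Python, assuming each volume has already been reduced to a dictionary of token → count (for instance by summing EF page-level counts). The volume names, the counts, and the helper functions are hypothetical and invented for illustration; they are not results from the dataset. Normalizing by volume length in `rate_per_10k` is an added assumption so that books of different sizes compare fairly.

```python
from typing import Dict

# Hypothetical toy data: each volume reduced to token -> count.
# These numbers are invented for illustration only.
Counts = Dict[str, int]

workset: Dict[str, Counts] = {
    "poems_a": {"The": 40, "And": 35, "dream": 12, "night": 20, "of": 30},
    "essays_b": {"The": 10, "and": 60, "dream": 2, "reason": 25, "of": 80},
}


def capital_letter_share(counts: Counts) -> float:
    """Fraction of alphabetic characters that are uppercase, weighted by token count.

    Verse usually capitalizes the first word of every line, so a higher share
    is treated here as a rough (assumed) signal of more poetry relative to prose.
    """
    upper = sum(sum(ch.isupper() for ch in tok) * n for tok, n in counts.items())
    letters = sum(sum(ch.isalpha() for ch in tok) * n for tok, n in counts.items())
    return upper / letters if letters else 0.0


def word_count(counts: Counts, word: str) -> int:
    """Case-insensitive total number of occurrences of `word` in one volume."""
    return sum(n for tok, n in counts.items() if tok.lower() == word.lower())


def rate_per_10k(counts: Counts, word: str) -> float:
    """Occurrences of `word` per 10,000 tokens, so volumes of different lengths compare fairly."""
    total = sum(counts.values())
    return 10_000 * word_count(counts, word) / total if total else 0.0


def volume_with_most(volumes: Dict[str, Counts], word: str) -> str:
    """Name of the volume in which `word` occurs the most times (raw count)."""
    return max(volumes, key=lambda name: word_count(volumes[name], word))


for name, counts in workset.items():
    print(name, round(capital_letter_share(counts), 3), round(rate_per_10k(counts, "dream"), 1))
print("most 'dream':", volume_with_most(workset, "dream"))
```

Swapping raw counts for the per-10,000 rate in `volume_with_most` changes the question from "which book says it most" to "which book says it most densely"; which one fits depends on the inference being drawn.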
Underwood, Ted. June 3, 2014. "A window on the twentieth century may be about to open." The Stone and the Shell. Blog. http://tedunderwood.com/2014/06/03/a-window-on-the-twentieth-century-may-be-about-to-open/
Mimno, David. 2014. "Word counting, squared." David Mimno. Blog. http://www.mimno.org/articles/wordsim/