Projects funded by the Andrew W. Mellon Foundation through the Scholar-Curated Worksets for Analysis, Reuse & Dissemination (SCWAReD) grant project.
Mining the Native American Authored Works in HathiTrust for Insights
Kun Lu, Raina Heaton, and Raymond Orr (University of Oklahoma)
This project seeks to compile a collection of Native American authored works in HathiTrust and apply various text mining methods to the collection to reveal the coverage, subjects, perspectives, and writing styles of Native authors. A list of Native authors and their works will be compiled from an existing database created by a member of the project team and from other online resources. This list will be aligned with the HathiTrust digital library to create a workset of Native American authored works in HathiTrust for further analysis. Then, a variety of text mining methods will be used to analyze the subjects, topics, language use, and writing styles of Native American authors. Comparative analysis will be carried out to understand the characteristics of this textual community. The project is expected to develop a database of Native American authors and the bibliographic information of their works, create a reusable workset of Native American authored works in HathiTrust, identify potential gaps in the HathiTrust corpus on this textual community, and provide insights into the characteristics of the community by text mining their works.
The Black Fantastic: Curated Vocabularies, Artifact Analysis and Identification
Clarissa West-White (Bethune-Cookman University) and Seretha Williams (Augusta University)
This project focuses on identifying Black Fantastic texts in the HathiTrust Digital Library. The project proposes that characteristics of the Black Fantastic—the cultural production of African Diasporic artists and creators who engage with the intersections of race and technology in their work—exist in historical and current cultural artifacts, including those created by and about future-forward personalities, such as Dr. Mary McLeod Bethune. It builds on previous and ongoing work to create a bibliography of the Black Fantastic that is featured in Third Stone Journal. Works in HathiTrust will be analyzed along with Black Fantastic artifacts from other collections, such as the Dr. Mary McLeod Bethune collection in the Bethune-Cookman University archives. By working across collections, the project will test methods for locating Black Fantastic texts and lives.
Creating Period-Specific Worksets for Latin American Fiction
José Eduardo González (University of Nebraska, Lincoln)
This project seeks to create large datasets to research the history of Latin American fiction and to question the traditional periodization of this literature by attempting to detect the boundaries between literary periods and subgenre distinctions in Latin American fiction. It will look critically at the techniques for detecting genre distinctions that have developed over the last few years and evaluate how they apply to the particular development of the Latin American literary system. While many of the subgenres in the English-speaking literary market, such as detective fiction, the Gothic novel, and speculative fiction, have followers in Latin America, the genres that have traditionally been considered important for the changes in the literary history of the region are less formulaic and more closely linked to national and regional historical and/or social developments. Instead of attempting to identify canonical documents that typify a genre, this project will examine how documents diverge from a particular canon in order to explore the social and cultural reasons an author might accept or deviate from a dominant style.
The National Negro Health Digital Project: Recovering and Restoring a Black Public Health Corpus
Kim Gallon (Purdue University)
This project draws on HathiTrust’s collection of public health documents on Black health to explore how early twentieth-century Black public health officials communicated and addressed health disparities that impacted African American communities. The major goal of the project is to create a series of worksets and visualizations that scholars and students of African American health and medicine, along with public health experts and physicians, can use to deepen historical narratives about Black health that might offer insight into the development of contemporary health communications targeted toward African American communities. The project also establishes some of the research for Technologies of Recovery: Black DH Theory and Praxis, a book in progress. Finally, the work will fill a gap in the history of African American public health.
Surveying Applicability of Energy Recovery Technology for Waste Treatment
This project leverages HathiTrust’s U.S. Federal Documents Collection to investigate how materials produced by the U.S. federal government document shifts in terminologies of ethnoracial difference. The project will focus on the documents and materials published by the Department of Education (formerly United States Department of Health, Education, and Welfare) and related congressional documents from hearings in specialized subcommittees from 1958 until the present. It will explore how the rhetorics of ethnoracial difference overlapped with the growing allocation of federal resources to postsecondary institutions, particularly Minority Serving Institutions, in the latter half of the 20th century. The start of the National Defense Education Act in 1958 was a watershed moment that signaled the greater engagement of the federal government in higher education. The subsequent passing of the Higher Education Act in 1965, alongside amendments through the 1990s and 2000s, allocated specific federal appropriations to support colleges and universities, including Historically Black Colleges & Universities, Tribal Colleges & Universities, Hispanic Serving Institutions, and Asian American & Native American Pacific Islander Serving Institutions. The project contributes to current work focusing on the history of federal responses to higher education in the United States and on the growing visibility of Minority Serving Institutions as a valuable part of the country’s postsecondary sector.
Building Large-Scale Collections of Genre Fiction
Laure Thompson and David Mimno (Cornell University)
This project will develop methods for automatically constructing large-scale collections of genre fiction from HathiTrust. Even, and especially, in digital libraries as large as HathiTrust, it can prove challenging to understand whether the library contains suitable representations of a chosen genre. The researchers plan to focus on collections of speculative fiction novels as a case study, but they intend their solutions to be generalizable. They will identify robust methods for correlating author-title pairs to matching volume sets in HathiTrust. Using these methods in conjunction with lists of novels that were curated by hand, they will build their collections and investigate which works are (over)represented and which are missing. They expect their project will enable scholars to better understand the suitability of studying genre fiction through HathiTrust and highlight underserved author and genre groups. Moreover, the project will result in collections of genre fiction which can be readily reused and reorganized for different lines of humanistic inquiry.
Project report: Building Large-Scale Collections of Genre Fiction: Final Report
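The step of correlating author-title pairs with matching volumes can be illustrated with a minimal sketch. This is not the researchers' method: it assumes a simple normalized string-similarity comparison using Python's standard library, and the record layout (`id`, `author`, `title`), the identifiers, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_volumes(pair, records, threshold=0.8):
    """Return ids of records whose author and title both resemble the pair.

    The threshold is an assumed cut-off, not a value from the project.
    """
    author, title = pair
    matches = []
    for rec in records:
        if (similarity(author, rec["author"]) >= threshold
                and similarity(title, rec["title"]) >= threshold):
            matches.append(rec["id"])
    return matches


# Hypothetical catalog records; ids and field names are invented for illustration.
records = [
    {"id": "vol-001", "author": "Shelley, Mary",
     "title": "Frankenstein; or, The modern Prometheus"},
    {"id": "vol-002", "author": "Shelley, Mary",
     "title": "Frankenstein, or, the Modern Prometheus."},
    {"id": "vol-003", "author": "Wells, H. G.",
     "title": "The time machine"},
]

query = ("Shelley, Mary", "Frankenstein; or, The Modern Prometheus")
print(match_volumes(query, records))  # → ['vol-001', 'vol-002']
```

Returning a list rather than a single best hit reflects that one work often corresponds to several volumes (editions, reprints, duplicate scans), which is why the abstract speaks of matching volume sets.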
Mapping scientific names to the HathiTrust Digital Library
This project aims to identify all pictorial elements in educational texts from 1800 to 1850 in order to explore the interplay between progressive education and print media in the early nineteenth century. The resulting research will characterize the extent to which wood engravings and other reprographic materials were shared among educational publishers. The researcher will extract specific features from page images, such as illustration location, using advances in machine learning. The project intends to use the process developed for identifying pictorial elements to motivate a new metadata field that describes the location and type of illustrations on the page. An ultimate goal of the project is to move toward “machine-read” texts, where the data generated by classifiers and dimensionality reduction techniques are bundled as metadata with the corresponding volumes and made available for future research. (“Machine-read” is a term borrowed from researcher Ben Schmidt.)
Semantic Phasor Embeddings