Download the dataset
Click one of the links below to download the version of the dataset you want: “volumemeta”, “recordmeta”, or “titlemeta”. If you only need the HTRC volume ids without any metadata, download the corresponding id list instead: “volumeids”, “recordids”, or “titleids”.
You can also download each dataset version with rsync, which runs from the command line. rsync should already be installed on macOS and Linux; on Windows you may need to download and install it first. If you are not comfortable with the command line, use one of the links under “Downloads” instead. The rsync commands for each dataset version and for the lists of volume ids are:
NOTE: The final . is required. It is the destination where the file will be saved ( . means "current directory" in UNIX).
Rsync dataset versions
rsync data.htrc.illinois.edu::textual-geographies/volumemeta_geo.tsv.gz .
rsync data.htrc.illinois.edu::textual-geographies/recordmeta_geo.tsv.gz .
rsync data.htrc.illinois.edu::textual-geographies/titlemeta_geo.tsv.gz .
Rsync ID Lists
rsync data.htrc.illinois.edu::textual-geographies/volumemeta_geo.tsv.ids .
rsync data.htrc.illinois.edu::textual-geographies/recordmeta_geo.tsv.ids .
rsync data.htrc.illinois.edu::textual-geographies/titlemeta_geo.tsv.ids .
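Once a download finishes, it can help to check the files before loading them into another tool. The sketch below is one possible way to do that using standard UNIX utilities. The function names preview_tsv_gz and count_ids are hypothetical helpers written for this illustration, and the count assumes the .ids files contain one volume id per line, which this page does not state.

```shell
# Show the first N lines (default 5) of a gzip-compressed TSV file
# without writing an uncompressed copy to disk.
preview_tsv_gz() {
    gzip -dc "$1" | head -n "${2:-5}"
}

# Count the non-empty lines in an id list.
# Assumption: the .ids files hold one volume id per line.
count_ids() {
    grep -c . "$1"
}

# Example usage, once the files are in the current directory:
#   preview_tsv_gz volumemeta_geo.tsv.gz 3
#   count_ids volumemeta_geo.tsv.ids
```

Streaming through gzip -dc keeps the compressed file as the only copy on disk, which is convenient if the uncompressed data is large.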
Explore the dataset
A Google Colab notebook (an .ipynb file, that is, a Jupyter notebook that runs in Google Colab) is available to help you search and manipulate the dataset. It lets you search the dataset by volume id or by terms of interest, and it includes step-by-step instructions and explanations. The “Dataset search notebook” link below opens the notebook.
An example use case is also available as a Google Colab notebook. Click the “Use case” link below to open it.