

Researchers have several options for creating their workset, including querying the HTRC Solr Proxy API. Researchers who do not yet have a workset and who only want to work with the public domain texts can create a workset in the HTRC Workset Builder.

Download Format


The HTRC Extracted Features files are formatted in JSON. For more information about the fields, see the documentation for each release.


Code Block
rsync -azv {{URL}} .

Using the HTRC Portal Algorithm



Code Block


If your workset contains N volumes with HathiTrust volume IDs V1, V2, V3, ..., VN, then executing the shell script as shown above transfers the feature data files for those volumes to your computer’s hard disk via rsync: V1.json.bz2, V2.json.bz2, V3.json.bz2, ..., VN.json.bz2. See Filepaths above to learn more about the pairtree structure the Extracted Features files follow.
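After a transfer, it can be useful to confirm that every volume in the workset actually arrived. The following is a minimal sketch, not part of the HTRC tooling: the function names `expected_name` and `missing_volumes` and the idea of a plain-text ID list (one volume ID per line) are assumptions made for illustration. It also assumes each file is named exactly as the volume ID followed by `.json.bz2`, as described above; if your downloaded file names use a sanitized form of the ID, adjust `expected_name` accordingly. Because the files may sit inside nested pairtree directories, the check searches the download directory recursively.

```shell
#!/bin/sh
# Hypothetical helper: build the expected file name for one volume ID.
# Assumes the name is the volume ID followed by ".json.bz2".
expected_name() {
    printf '%s.json.bz2\n' "$1"
}

# Hypothetical helper: print every volume ID from an ID list (one ID per
# line) whose feature file is not found anywhere under the download directory.
# Note: find treats the name as a glob pattern, so IDs containing
# characters such as * ? [ would need escaping first.
missing_volumes() {
    id_list=$1
    download_dir=$2
    while IFS= read -r id || [ -n "$id" ]; do
        # Skip blank lines in the ID list.
        [ -n "$id" ] || continue
        name=$(expected_name "$id")
        # Search recursively, since files may be nested in subdirectories.
        if [ -z "$(find "$download_dir" -type f -name "$name" | head -n 1)" ]; then
            printf '%s\n' "$id"
        fi
    done < "$id_list"
}
```

For example, `missing_volumes ids.txt .` would list the IDs in a hypothetical `ids.txt` that have no matching file under the current directory. Empty output means every expected file was found.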