
The HTRC Workset Builder 2.0 Beta for Extracted Features 2.0 is the next iteration of a new interface to the HTRC Extracted Features Dataset. It enables both volume-level metadata search and volume- and page-level unigram (single word) text search of the extracted features in order to build worksets.

As indicated, this interface is currently in beta, and may change.

Quick Guide: How to Build a Workset

To build a workset using HTRC Workset Builder, follow these general steps:

  • Step 0: Decide what to search

  • Step 1: Perform a unigram (single-term) full text or metadata (using field tags) search at the desired level (page or volume)

  • Step 2: Filter through the results and select desired items to add to your workset

  • Step 3: Repeat steps 1 and 2 as necessary until your workset is ready

  • Step 4: "Export as workset" to HTRC Analytics or download workset metadata.

More information about each step in the workset building process is included in the sections below.

Searching

Searches are not case sensitive, and by default, your search will be conducted on pages recognized as English. Click “Search all Languages” if you prefer to search everything. You can also limit your search to specific languages by choosing from those that appear under “Show other languages.” Limit your search to a specific part of speech by using the checkboxes under the language, though be aware that part-of-speech search is not available for every language. Wildcard matching is possible using '?' for a single character and '*' for multiple characters, for example 'canad?' and '*land'.
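The wildcard behaviour described above can be tried out locally. The following is a minimal sketch, assuming that Python's standard fnmatch module is a close enough stand-in for the '?' and '*' wildcards; the helper name wildcard_match is hypothetical, and the real matching is done by the search index, not by this code.

```python
from fnmatch import fnmatchcase


def wildcard_match(term: str, pattern: str) -> bool:
    """Approximate '?' (one character) and '*' (any run of characters).

    Both sides are lower-cased to mimic the case-insensitive search box.
    Note: fnmatch also treats '[...]' specially, which the search box may not.
    """
    return fnmatchcase(term.lower(), pattern.lower())


print(wildcard_match("Canada", "canad?"))    # True: exactly one trailing character
print(wildcard_match("canadian", "canad?"))  # False: '?' stands for a single character
print(wildcard_match("Finland", "*land"))    # True: '*' covers "Fin"
```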

There are four options for searching: text, metadata, combined and advanced. Text search will search the full text of the volume, at the page level, for a unigram or unigrams (e.g. searching all volumes for the word "rose"). Results returned are volume-level metadata, along with page-level metadata and bag-of-words tokens. Since this is a page-based search, you will receive one result for each page that matches your query. To see results grouped by volume (multiple page results under one volume heading and one result), check the box marked "Sort & Group by Volume" under the search bar. Metadata search will search volume-level metadata fields for given unigrams, and return volumes in which the terms appear in a given (or any) field specified in the drop-down menu (e.g. searching all volumes for those with a publication place of "bl", the MARC code for Brazil). A Combined search allows both text and metadata search in a given query (e.g. a search to return all volumes published in Brazil in which the term "rose" appeared on a page). Advanced search allows users familiar with Solr syntax (see below for more information) to construct and execute their own queries.

When searching the page text it is important to realize that every word you enter is treated as a separate term (a unigram) for the purposes of the query that is performed. In effect, phrase searching of the page text is not possible. This is because Workset Builder is a search interface built on the Extracted Features Dataset, where the sequential order of the words has been removed, effectively making it a bag of words. The closest approximation is to use the AND operator: for example, the query lawn AND tennis will return all pages where both words appear somewhere on the page. A hyphenated word is processed as a single term, and so does present as a phrase in terms of indexing: for example, the query "lawn tennis" (in quotes) will find pages where that term appears hyphenated. In the case of volume metadata search, the sequential order of the words is kept. This means phrase searching is possible across metadata.
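To make the bag-of-words limitation concrete, here is a small illustrative sketch. It assumes simple whitespace tokenisation (the real Extracted Features tokeniser is more sophisticated), and the names page_bag and matches_all are hypothetical. Once a page is reduced to token counts, two pages with the same words in a different order are indistinguishable, so only an AND-style test is possible.

```python
from collections import Counter


def page_bag(text: str) -> Counter:
    """Reduce page text to token counts, discarding word order."""
    return Counter(text.lower().split())


def matches_all(bag: Counter, *terms: str) -> bool:
    """Emulate `term1 AND term2`: every term occurs somewhere on the page."""
    return all(bag[term.lower()] > 0 for term in terms)


page_a = page_bag("They played lawn tennis all afternoon")
page_b = page_bag("Tennis was cancelled so we mowed the lawn")

# Word order is gone: a shuffled page yields the same bag.
print(page_a == page_bag("tennis lawn played they afternoon all"))  # True

# Both pages satisfy lawn AND tennis, although only page_a has the phrase.
print(matches_all(page_a, "lawn", "tennis"))  # True
print(matches_all(page_b, "lawn", "tennis"))  # True
```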

Search Text

Text search allows users to query the full text, by page, for unigrams (single terms). By default, text searches will search English-language volumes. If you'd like to search all languages, check the "Search pages in all languages" button underneath the search bar. Currently, part-of-speech information is only available for volumes in English, German, Portuguese, Danish, Dutch and Swedish. While other languages are coded in volume metadata and thus can be retrieved, there will not be part-of-speech data available for those volumes.

Text searches will retrieve volume-level metadata, but the main unit of search and retrieval is the page. Since many pages in a single volume may contain a given unigram, users may wish to check the "Sort & Group by Volume" button directly beneath the search bar, which will present results by the volume, with a list of pages on which the term appears, as compared to multiple volume entries with a single associated page in the results view. 
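The effect of grouping by volume can be pictured with a short sketch. The data shape here, a list of (volume ID, page sequence) pairs, and the function name group_by_volume are assumptions made for illustration; they are not the interface's actual export format.

```python
from collections import defaultdict


def group_by_volume(hits):
    """Collapse (volume_id, page_seq) hits into {volume_id: sorted page list}."""
    grouped = defaultdict(list)
    for volume_id, page_seq in hits:
        grouped[volume_id].append(page_seq)
    return {volume_id: sorted(pages) for volume_id, pages in grouped.items()}


# Ungrouped view: one entry per matching page.
hits = [("vol.A", 12), ("vol.B", 3), ("vol.A", 7)]

# Grouped view: one entry per volume, listing its matching pages.
print(group_by_volume(hits))  # {'vol.A': [7, 12], 'vol.B': [3]}
```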

Search Metadata

Metadata search allows users to query the catalogue metadata associated with each volume in the corpus (aka volume metadata). Enter a single term, multiple terms, phrases (in quotes), or any combination thereof.  By default, "All fields" is selected in the drop-down menu next to the search box.  Click on the drop-down menu to select more specific fields to search by, such as Title.

To search multiple metadata fields, enter your search query in a format called Solr syntax. For example, a search for title_t:hamlet AND contributorName_t:shakespeare will return all volumes with “hamlet” in the title field and “shakespeare” in the contributorName field, the latter being the field used by the cataloguer to record a personal or corporate name associated with the volume. The same search can be used with an “OR” operator to return volumes that satisfy either condition. Note that there is no space between the colon and the search term. For more information, see this Solr query syntax guide.
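If you assemble such queries programmatically before pasting them into the search box, a small helper keeps the syntax consistent. This is only a sketch: the function name field_query is hypothetical, and it handles single-word terms only.

```python
def field_query(clauses, operator="AND"):
    """Join (field, term) pairs into a Solr-style query string."""
    if operator not in ("AND", "OR"):
        # Operators must be upper case in Solr query syntax.
        raise ValueError("operator must be upper-case AND or OR")
    # No space between the colon and the search term.
    return f" {operator} ".join(f"{field}:{term}" for field, term in clauses)


clauses = [("title_t", "hamlet"), ("contributorName_t", "shakespeare")]
print(field_query(clauses))
# title_t:hamlet AND contributorName_t:shakespeare
print(field_query(clauses, operator="OR"))
# title_t:hamlet OR contributorName_t:shakespeare
```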

For information on the volume metadata fields, including possible values for fields with controlled vocabularies, see the Metadata field values section below.

Searching dates

Metadata date fields can be searched using (primarily) 4-digit years, e.g. "1948".  A variety of dates can be associated with a volume, such as its publication date (pubDate_i), and the date the digital form of the record was created (dateCreated_i).  Being numeric, these fields are matched to "_i" fields, where the suffix indicates the indexed value is an integer (as opposed to "_t" for text).

To go beyond searching for a document from a particular year (e.g. pubDate_i:1948) to search over a date range, you need to use more complex Solr syntax (more information in the next section) to specify the numeric start and stop values.  After the field to search by is specified, this takes the form [ <val> TO <val>]. For example, to search for volumes published 1880-1883 enter “pubDate_i:[1880 TO 1883]” in the Advanced search tab.  Solr query syntax is case-sensitive, and so, the value range must be expressed as "TO" (and not, for example, "to").
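The range form can be generated with a short helper, shown here as a sketch; the function name year_range_query is hypothetical. The result is intended to be pasted into the Advanced search tab.

```python
def year_range_query(field: str, start: int, end: int) -> str:
    """Build an inclusive Solr range query such as pubDate_i:[1880 TO 1883]."""
    if start > end:
        raise ValueError("start year must not be later than end year")
    # "TO" must be upper case: Solr query syntax is case-sensitive.
    return f"{field}:[{start} TO {end}]"


print(year_range_query("pubDate_i", 1880, 1883))  # pubDate_i:[1880 TO 1883]
```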

Most recorded dates are numeric; however, sometimes the exact date is not known when catalogued, and an "X", "-" or "?" is substituted for a digit to signify this, such as "194X", "194-" or "194?".  For this reason "_t" versions of date-related fields are also provided to allow more thorough date searches to be expressed, albeit requiring more complex syntax.  Note that all dates are mapped to the "_t" version of the field, including the fully numeric ones.

To continue with the previous example of searching for volumes published 1880-1883, a more expansive form of the search would be: pubDate_t:188X OR pubDate_t:188\? OR pubDate_t:1880 OR pubDate_t:1881 OR pubDate_t:1882 OR pubDate_t:1883. One could then look through the results and decide, case by case, whether to leave a 188?, 188- or 188X date in the result set, or press the 'x' icon in the corner to have it removed.  If the search criterion is publications in a given decade, including the '?', '-' and 'X' ones, then the needed query syntax gets simpler: pubDate_t:188?. Note that this time the "?" is not preceded by a backslash (\), meaning it is interpreted by Solr as the wildcard character for matching purposes, not as the literal question mark symbol.  More about this in the next section!
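Writing the expansive form by hand becomes tedious for longer ranges, so it can be generated. The sketch below reproduces the pattern of the example query above (exact years plus the "X" and escaped "?" decade placeholders); like that query, it does not add a clause for the "-" placeholder. The function name expansive_year_query is hypothetical.

```python
def expansive_year_query(start: int, end: int, field: str = "pubDate_t") -> str:
    """OR together decade placeholders (e.g. 188X, escaped 188?) and exact years."""
    terms = []
    for decade in sorted({year // 10 for year in range(start, end + 1)}):
        terms.append(f"{decade}X")
        # The backslash makes Solr treat '?' literally, not as a wildcard.
        terms.append(f"{decade}\\?")
    terms.extend(str(year) for year in range(start, end + 1))
    return " OR ".join(f"{field}:{term}" for term in terms)


print(expansive_year_query(1880, 1883))
# pubDate_t:188X OR pubDate_t:188\? OR pubDate_t:1880 OR pubDate_t:1881 OR pubDate_t:1882 OR pubDate_t:1883
```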

Volume Metadata

While Solr query syntax terms (met in the earlier sections, above) such as "AND", "OR", and "TO" are case-sensitive, meaning they must be entered exactly as detailed, searching for volume-level metadata is case-insensitive: searching for Shakespeare with the term "SHAKESPEARE", "shakespeare" or even "ShakespEare" returns the same results.  Solr query syntax supports wildcard matching using '?' for a single character and '*' for multiple characters, for example 'canad?' and '*land'. While the HTRC Workset Builder interface is designed for unigram (single-term) searching of page text, it is possible to express phrase-based searches over volume-level metadata.  This is done using quotation marks: "term1 term2".

Via the "Metadata" tab you can enter terms directly into the search box, in which case the terms are matched against a common set of volume metadata fields. The main ones are: Title, Name, Genre, Type of Resource, Place of Publication.  You can click on the "Show full query ..." button on the search results page to see what the complete list of fields is.

You can also search for specific volume metadata fields using the field:term form, such as contributorName_t:Austen, or as a phrase-based search, contributorName_t:"Austen, Jane".
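The difference between a single-term and a phrase search comes down to quoting. The following sketch quotes a value only when it contains whitespace; the function name field_term is hypothetical, and escaping embedded double quotes with a backslash is an assumption based on general Solr/Lucene query syntax rather than anything specific to Workset Builder.

```python
def field_term(field: str, value: str) -> str:
    """Return field:term, quoting the value as a phrase if it contains whitespace."""
    value = value.strip()
    if any(ch.isspace() for ch in value):
        escaped = value.replace('"', '\\"')  # assumed Solr-style escaping
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


print(field_term("contributorName_t", "Austen"))        # contributorName_t:Austen
print(field_term("contributorName_t", "Austen, Jane"))  # contributorName_t:"Austen, Jane"
```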

In alphabetical order, searchable fields are as follows.  In moving from Extracted Features 1.5 to 2.0, some of the indexed metadata fields have been changed.  In the table below, where this has occurred, the old (v1.5) name is listed first and marked "(v1.5)", followed by the current name.

Field name | Field name in Solr syntax | Field description
Access Profile | accessProfile_t: | The code that indicates full-text access level.
Bibliographic Format | bibliographicFormat_t: | The code for the format of a volume (e.g. book, serial, etc.).
Date Created | dateCreated_i:, dateCreated_t: | The time this metadata object was processed.
Genre | genre_t: | The genre of the volume.
Handle URL | handleUrl_t: (v1.5); htid_s: | The persistent identifier for the given volume.
HathiTrust Record Number | hathitrustRecordNumber_t: | The unique record number for the volume in the HathiTrust Digital Library.
HathiTrust Bib URL | htBibUrl_t: (v1.5); mainEntityOfPageCatalogRecord_s: | The HathiTrust Bibliographic RESTful call for the volume metadata record. The provided URL delivers an HTML page. Adding ".json" to the end of the URL delivers the metadata in JSON format.
Imprint | imprint_t: (v1.5); publisherName_t: | The place of publication, publisher, and publication date of the given volume.
Language | language_t: | The primary language of the volume in MARC language code format.
Last Update Date | lastUpdateDate_t: (v1.5); lastRightsUpdateDate_i:, lastRightsUpdateDate_t: | The date this page was last updated.
Names | names_t: (v1.5); contributorName_t: | The personal and corporate names associated with a volume.
OCLC | oclc_t: | The control number(s) assigned to each bibliographic record by the Online Computer Library Center (OCLC).
Publication Date | pubDate_i:, pubDate_t: | The publication year.
Publication Place | pubPlace_t: (v1.5); pubPlaceName_t: | The publication location code in MARC country code format.
Rights Attribute (Access Rights) | rightsAttributes_t: (v1.5); accessRights_t: | The rights attributes for a volume.
Schema Version | schemaVersion_t: (v1.5); schemaVersion_s: | A version identifier for the format and structure of this metadata object.
Source Institution | sourceInstitution_t: | The institution code of the original institution that contributed the volume.
Title | title_t: | Title of the volume.
Type of Resource | typeOfResource_t: (v1.5); typeOfResource_s: | The format type of a volume.
Volume Identifier | volumeIdentifier_t: | A unique identifier for the current volume. This is the same identifier used in the HathiTrust and HathiTrust Research Center corpora.
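
As the HathiTrust Bib URL entry notes, appending ".json" to the catalog record URL yields the JSON form of the metadata. A minimal sketch of that transformation follows; the function name catalog_json_url is hypothetical, and the code only builds the URL without fetching it.

```python
def catalog_json_url(record_url: str) -> str:
    """Return the JSON variant of a HathiTrust catalog record URL."""
    url = record_url.rstrip("/")
    # Avoid doubling the extension if it is already present.
    return url if url.endswith(".json") else url + ".json"


print(catalog_json_url("https://catalog.hathitrust.org/Record/003922761"))
# https://catalog.hathitrust.org/Record/003922761.json
```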

Results

On the results page, you will find the title and unique HathiTrust ID for each volume that contains a result based on your search. You can hover over the title of each volume to trigger a pop-up with brief metadata information. If a text search is part of your query, the page sequence on which your search term appears is also listed. Lastly, a link to download the Extracted Features data for each volume in your results is generated. If you follow the link to the page sequence for your search term, the Extracted Features data for that page (tokens in a variety of views, parts of speech, and token frequencies) is shown, together with links to download the Extracted Features files for the page or volume and a thumbnail of the page image that links back to HathiTrust to view the page directly. Additionally, below the volume title, the full metadata record is available in human-readable form if you click the "Show metadata" link to expand the section.

Filtering

On your results page, you will see seven different fields that can be used to filter search results. These fields are derived from the same metadata fields listed above: genre, language, copyright status, author, place of publication, original bibliographic format and classification. To apply facets to filter results, check the boxes next to the desired facets under a given heading (e.g. "author"), and then click the "Apply Filter" button that will appear next to the section heading. To filter by values in more than one field/heading, you must first apply the filters in one field before doing so in another.

Exporting results

Once you have a desired set of results on the search page, you may work with or save them in a number of ways. For result sets of fewer than 40 million pages, you may export the entire set of search results as a list of volume or page IDs or as a metadata manifest with one row per volume, or you may choose to download the Extracted Features files for each volume in your result set. When downloading Extracted Features files, be mindful that a result set with many volumes will be a large download, which can take minutes to complete.

To create a workset from more than one search, you can add volumes you'd like to include to your shopping cart by checking boxes next to each volume and pressing the yellow "Add" button or by selecting volumes via check box and dragging and dropping them into the shopping cart icon at the top right on the result page. If you'd like to change the checkboxes for each item, you can use the "Select All On This Page" or "Deselect All" to either check or uncheck all results. Similarly, the "Invert Selection" button can be used to change all checked items to unchecked, or the inverse.

Once your shopping cart is complete with the volumes you're interested in, click on the cart icon to view your workset. From this page, you can directly import your shopping cart as a workset in HTRC Analytics by clicking the "Export as Workset" button at the top right of the shopping cart page. From here, you'll be taken to HTRC Analytics, prompted to sign in, if you aren't already, and asked to provide a name and description of your new workset. Once a workset is imported into Analytics, you can get metadata information, share the workset, and run algorithms over its contents.

Saving worksets

Since worksets created using the Workset Builder are tied to a web browser session, once you exit your browser, your workset will not be saved unless you export it in one of the above ways. For the same reason, worksets cannot be shared via URL unless they are imported into HTRC Analytics.

Metadata field values

The Bibliographic Format field is coded; for example using BK to represent a Book. The possible codes used for this field are:

BK: Books | CF: Computer Files | CR: Continuing Resources | MP: Maps
MU: Music | MX: Mixed Materials | SE: Serials | VM: Visual Materials


The Place of Publication field pubPlaceName_t uses MARC country codes, with the following possible values:

aa: Albania | abc: Alberta | aca: Australian Capital Territory | ae: Algeria
af: Afghanistan | ag: Argentina | ai: Armenia (Republic) | aj: Azerbaijan
aku: Alaska | alu: Alabama | am: Anguilla | an: Andorra
ao: Angola | aq: Antigua and Barbuda | aru: Arkansas | as: American Samoa
at: Australia | au: Austria | aw: Aruba | ay: Antarctica
azu: Arizona | ba: Bahrain | bb: Barbados | bcc: British Columbia
bd: Burundi | be: Belgium | bf: Bahamas | bg: Bangladesh
bh: Belize | bi: British Indian Ocean Territory | bl: Brazil | bm: Bermuda Islands
bn: Bosnia and Herzegovina | bo: Bolivia | bp: Solomon Islands | br: Burma
bs: Botswana | bt: Bhutan | bu: Bulgaria | bv: Bouvet Island
bw: Belarus | bx: Brunei | ca: Caribbean Netherlands | cau: California
cb: Cambodia | cc: China | cd: Chad | ce: Sri Lanka
cf: Congo (Brazzaville) | cg: Congo (Democratic Republic) | ch: China (Republic : 1949- ) | ci: Croatia
cj: Cayman Islands | ck: Colombia | cl: Chile | cm: Cameroon
co: Curaçao | cou: Colorado | cq: Comoros | cr: Costa Rica
ctu: Connecticut | cu: Cuba | cv: Cabo Verde | cw: Cook Islands
cx: Central African Republic | cy: Cyprus | dcu: District of Columbia | deu: Delaware
dk: Denmark | dm: Benin | dq: Dominica | dr: Dominican Republic
ea: Eritrea | ec: Ecuador | eg: Equatorial Guinea | em: Timor-Leste
enk: England | er: Estonia | es: El Salvador | et: Ethiopia
fa: Faroe Islands | fg: French Guiana | fi: Finland | fj: Fiji
fk: Falkland Islands | flu: Florida | fm: Micronesia (Federated States) | fp: French Polynesia
fr: France | fs: Terres australes et antarctiques françaises | ft: Djibouti | gau: Georgia
gb: Kiribati | gd: Grenada | gh: Ghana | gi: Gibraltar
gl: Greenland | gm: Gambia | go: Gabon | gp: Guadeloupe
gr: Greece | gs: Georgia (Republic) | gt: Guatemala | gu: Guam
gv: Guinea | gw: Germany | gy: Guyana | gz: Gaza Strip
hiu: Hawaii | hm: Heard and McDonald Islands | ho: Honduras | ht: Haiti
hu: Hungary | iau: Iowa | ic: Iceland | idu: Idaho
ie: Ireland | ii: India | ilu: Illinois | inu: Indiana
io: Indonesia | iq: Iraq | ir: Iran | is: Israel
it: Italy | iv: Côte d'Ivoire | iy: Iraq-Saudi Arabia Neutral Zone | ja: Japan
ji: Johnston Atoll | jm: Jamaica | jo: Jordan | ke: Kenya
kg: Kyrgyzstan | kn: Korea (North) | ko: Korea (South) | ksu: Kansas
ku: Kuwait | kv: Kosovo | kyu: Kentucky | kz: Kazakhstan
lau: Louisiana | lb: Liberia | le: Lebanon | lh: Liechtenstein
li: Lithuania | lo: Lesotho | ls: Laos | lu: Luxembourg
lv: Latvia | ly: Libya | mau: Massachusetts | mbc: Manitoba
mc: Monaco | mdu: Maryland | meu: Maine | mf: Mauritius
mg: Madagascar | miu: Michigan | mj: Montserrat | mk: Oman
ml: Mali | mm: Malta | mnu: Minnesota | mo: Montenegro
mou: Missouri | mp: Mongolia | mq: Martinique | mr: Morocco
msu: Mississippi | mtu: Montana | mu: Mauritania | mv: Moldova
mw: Malawi | mx: Mexico | my: Malaysia | mz: Mozambique
nbu: Nebraska | ncu: North Carolina | ndu: North Dakota | ne: Netherlands
nfc: Newfoundland and Labrador | ng: Niger | nhu: New Hampshire | nik: Northern Ireland
nju: New Jersey | nkc: New Brunswick | nl: New Caledonia | nmu: New Mexico
nn: Vanuatu | no: Norway | np: Nepal | nq: Nicaragua
nr: Nigeria | nsc: Nova Scotia | ntc: Northwest Territories | nu: Nauru
nuc: Nunavut | nvu: Nevada | nw: Northern Mariana Islands | nx: Norfolk Island
nyu: New York (State) | nz: New Zealand | ohu: Ohio | oku: Oklahoma
onc: Ontario | oru: Oregon | ot: Mayotte | pau: Pennsylvania
pc: Pitcairn Island | pe: Peru | pf: Paracel Islands | pg: Guinea-Bissau
ph: Philippines | pic: Prince Edward Island | pk: Pakistan | pl: Poland
pn: Panama | po: Portugal | pp: Papua New Guinea | pr: Puerto Rico
pw: Palau | py: Paraguay | qa: Qatar | qea: Queensland
quc: Québec (Province) | rb: Serbia | re: Réunion | rh: Zimbabwe
riu: Rhode Island | rm: Romania | ru: Russia (Federation) | rw: Rwanda
sa: South Africa | sc: Saint-Barthélemy | scu: South Carolina | sd: South Sudan
sdu: South Dakota | se: Seychelles | sf: Sao Tome and Principe | sg: Senegal
sh: Spanish North Africa | si: Singapore | sj: Sudan | sl: Sierra Leone
sm: San Marino | sn: Sint Maarten | snc: Saskatchewan | so: Somalia
sp: Spain | sq: Swaziland | sr: Surinam | ss: Western Sahara
st: Saint-Martin | stk: Scotland | su: Saudi Arabia | sw: Sweden
sx: Namibia | sy: Syria | sz: Switzerland | ta: Tajikistan
tc: Turks and Caicos Islands | tg: Togo | th: Thailand | ti: Tunisia
tk: Turkmenistan | tl: Tokelau | tma: Tasmania | tnu: Tennessee
to: Tonga | tr: Trinidad and Tobago | ts: United Arab Emirates | tu: Turkey
tv: Tuvalu | txu: Texas | tz: Tanzania | ua: Egypt
uc: United States Misc. Caribbean Islands | ug: Uganda | uik: United Kingdom Misc. Islands | un: Ukraine
up: United States Misc. Pacific Islands | utu: Utah | uv: Burkina Faso | uy: Uruguay
uz: Uzbekistan | vau: Virginia | vb: British Virgin Islands | vc: Vatican City
ve: Venezuela | vi: Virgin Islands of the United States | vm: Vietnam | vp: Various places
vra: Victoria | vtu: Vermont | wau: Washington (State) | wea: Western Australia
wf: Wallis and Futuna | wiu: Wisconsin | wj: West Bank of the Jordan River | wk: Wake Island
wlk: Wales | ws: Samoa | wvu: West Virginia | wyu: Wyoming
xa: Christmas Island (Indian Ocean) | xb: Cocos (Keeling) Islands | xc: Maldives | xd: Saint Kitts-Nevis
xe: Marshall Islands | xf: Midway Islands | xga: Coral Sea Islands Territory | xh: Niue
xj: Saint Helena | xk: Saint Lucia | xl: Saint Pierre and Miquelon | xm: Saint Vincent and the Grenadines
xn: Macedonia | xna: New South Wales | xo: Slovakia | xoa: Northern Territory
xp: Spratly Island | xr: Czech Republic | xra: South Australia | xs: South Georgia and the South Sandwich Islands
xv: Slovenia | xx: No place | xxc: Canada | xxk: United Kingdom
xxu: United States | ye: Yemen | ykc: Yukon Territory | za: Zambia


Deprecated MARC Place of Publication codes, which are no longer being actively used, but may still appear in data, are:

-ac: Ashmore and Cartier Islands | -ai: Anguilla | -air: Armenian S.S.R. | -ajr: Azerbaijan S.S.R.
-bwr: Byelorussian S.S.R. | -cn: Canada | -cp: Canton and Enderbury Islands | -cs: Czechoslovakia
-cz: Canal Zone | -err: Estonia | -ge: Germany (East) | -gn: Gilbert and Ellice Islands
-gsr: Georgian S.S.R. | -hk: Hong Kong | -iu: Israel-Syria Demilitarized Zones | -iw: Israel-Jordan Demilitarized Zones
-jn: Jan Mayen | -kgr: Kirghiz S.S.R. | -kzr: Kazakh S.S.R. | -lir: Lithuania
-ln: Central and Southern Line Islands | -lvr: Latvia | -mh: Macao | -mvr: Moldavian S.S.R.
-na: Netherlands Antilles | -nm: Northern Mariana Islands | -pt: Portuguese Timor | -rur: Russian S.F.S.R.
-ry: Ryukyu Islands, Southern | -sb: Svalbard | -sk: Sikkim | -sv: Swan Islands
-tar: Tajik S.S.R. | -tkr: Turkmen S.S.R. | -tt: Trust Territory of the Pacific Islands | -ui: United Kingdom Misc. Islands
-uk: United Kingdom | -unr: Ukraine | -ur: Soviet Union | -us: United States
-uzr: Uzbek S.S.R. | -vn: Vietnam, North | -vs: Vietnam, South | -wb: West Berlin
-xi: Saint Kitts-Nevis-Anguilla | -xxr: Soviet Union | -ys: Yemen (People's Democratic Republic) | -yu: Serbia and Montenegro


The set of codes used for the Language field language_t is also derived from MARC language codes, with the following possible values:

aar: Afar | abk: Abkhaz | ace: Achinese | ach: Acoli
ada: Adangme | ady: Adygei | afa: Afroasiatic (Other) | afh: Afrihili (Artificial language)
afr: Afrikaans | ain: Ainu | aka: Akan | akk: Akkadian
alb: Albanian | ale: Aleut | alg: Algonquian (Other) | alt: Altai
amh: Amharic | ang: English, Old (ca. 450-1100) | anp: Angika | apa: Apache languages
ara: Arabic | arc: Aramaic | arg: Aragonese | arm: Armenian
arn: Mapuche | arp: Arapaho | art: Artificial (Other) | arw: Arawak
asm: Assamese | ast: Bable | ath: Athapascan (Other) | aus: Australian languages
ava: Avaric | ave: Avestan | awa: Awadhi | aym: Aymara
aze: Azerbaijani | bad: Banda languages | bai: Bamileke languages | bak: Bashkir
bal: Baluchi | bam: Bambara | ban: Balinese | baq: Basque
bas: Basa | bat: Baltic (Other) | bej: Beja | bel: Belarusian
bem: Bemba | ben: Bengali | ber: Berber (Other) | bho: Bhojpuri
bih: Bihari (Other) | bik: Bikol | bin: Edo | bis: Bislama
bla: Siksika | bnt: Bantu (Other) | bos: Bosnian | bra: Braj
bre: Breton | btk: Batak | bua: Buriat | bug: Bugis
bul: Bulgarian | bur: Burmese | byn: Bilin | cad: Caddo
cai: Central American Indian (Other) | car: Carib | cat: Catalan | cau: Caucasian (Other)
ceb: Cebuano | cel: Celtic (Other) | cha: Chamorro | chb: Chibcha
che: Chechen | chg: Chagatai | chi: Chinese | chk: Chuukese
chm: Mari | chn: Chinook jargon | cho: Choctaw | chp: Chipewyan
chr: Cherokee | chu: Church Slavic | chv: Chuvash | chy: Cheyenne
cmc: Chamic languages | cop: Coptic | cor: Cornish | cos: Corsican
cpe: Creoles and Pidgins, English-based (Other) | cpf: Creoles and Pidgins, French-based (Other) | cpp: Creoles and Pidgins, Portuguese-based (Other) | cre: Cree
crh: Crimean Tatar | crp: Creoles and Pidgins (Other) | csb: Kashubian | cus: Cushitic (Other)
cze: Czech | dak: Dakota | dan: Danish | dar: Dargwa
day: Dayak | del: Delaware | den: Slavey | dgr: Dogrib
din: Dinka | div: Divehi | doi: Dogri | dra: Dravidian (Other)
dsb: Lower Sorbian | dua: Duala | dum: Dutch, Middle (ca. 1050-1350) | dut: Dutch
dyu: Dyula | dzo: Dzongkha | efi: Efik | egy: Egyptian
eka: Ekajuk | elx: Elamite | eng: English | enm: English, Middle (1100-1500)
epo: Esperanto | est: Estonian | ewe: Ewe | ewo: Ewondo
fan: Fang | fao: Faroese | fat: Fanti | fij: Fijian
fil: Filipino | fin: Finnish | fiu: Finno-Ugrian (Other) | fon: Fon
fre: French | frm: French, Middle (ca. 1300-1600) | fro: French, Old (ca. 842-1300) | frr: North Frisian
frs: East Frisian | fry: Frisian | ful: Fula | fur: Friulian
gaa: Gã | gay: Gayo | gba: Gbaya | gem: Germanic (Other)
geo: Georgian | ger: German | gez: Ethiopic | gil: Gilbertese
gla: Scottish Gaelic | gle: Irish | glg: Galician | glv: Manx
gmh: German, Middle High (ca. 1050-1500) | goh: German, Old High (ca. 750-1050) | gon: Gondi | gor: Gorontalo
got: Gothic | grb: Grebo | grc: Greek, Ancient (to 1453) | gre: Greek, Modern (1453-)
grn: Guarani | gsw: Swiss German | guj: Gujarati | gwi: Gwich'in
hai: Haida | hat: Haitian French Creole | hau: Hausa | haw: Hawaiian
heb: Hebrew | her: Herero | hil: Hiligaynon | him: Western Pahari languages
hin: Hindi | hit: Hittite | hmn: Hmong | hmo: Hiri Motu
hrv: Croatian | hsb: Upper Sorbian | hun: Hungarian | hup: Hupa
iba: Iban | ibo: Igbo | ice: Icelandic | ido: Ido
iii: Sichuan Yi | ijo: Ijo | iku: Inuktitut | ile: Interlingue
ilo: Iloko | ina: Interlingua (International Auxiliary Language Association) | inc: Indic (Other) | ind: Indonesian
ine: Indo-European (Other) | inh: Ingush | ipk: Inupiaq | ira: Iranian (Other)
iro: Iroquoian (Other) | ita: Italian | jav: Javanese | jbo: Lojban (Artificial language)
jpn: Japanese | jpr: Judeo-Persian | jrb: Judeo-Arabic | kaa: Kara-Kalpak
kab: Kabyle | kac: Kachin | kal: Kalâtdlisut | kam: Kamba
kan: Kannada | kar: Karen languages | kas: Kashmiri | kau: Kanuri
kaw: Kawi | kaz: Kazakh | kbd: Kabardian | kha: Khasi
khi: Khoisan (Other) | khm: Khmer | kho: Khotanese | kik: Kikuyu
kin: Kinyarwanda | kir: Kyrgyz | kmb: Kimbundu | kok: Konkani
kom: Komi | kon: Kongo | kor: Korean | kos: Kosraean
kpe: Kpelle | krc: Karachay-Balkar | krl: Karelian | kro: Kru (Other)
kru: Kurukh | kua: Kuanyama | kum: Kumyk | kur: Kurdish
kut: Kootenai | lad: Ladino | lah: Lahndā | lam: Lamba (Zambia and Congo)
lao: Lao | lat: Latin | lav: Latvian | lez: Lezgian
lim: Limburgish | lin: Lingala | lit: Lithuanian | lol: Mongo-Nkundu
loz: Lozi | ltz: Luxembourgish | lua: Luba-Lulua | lub: Luba-Katanga
lug: Ganda | lui: Luiseño | lun: Lunda | luo: Luo (Kenya and Tanzania)
lus: Lushai | mac: Macedonian | mad: Madurese | mag: Magahi
mah: Marshallese | mai: Maithili | mak: Makasar | mal: Malayalam
man: Mandingo | mao: Maori | map: Austronesian (Other) | mar: Marathi
mas: Maasai | may: Malay | mdf: Moksha | mdr: Mandar
men: Mende | mga: Irish, Middle (ca. 1100-1550) | mic: Micmac | min: Minangkabau
mis: Miscellaneous languages | mkh: Mon-Khmer (Other) | mlg: Malagasy | mlt: Maltese
mnc: Manchu | mni: Manipuri | mno: Manobo languages | moh: Mohawk
mon: Mongolian | mos: Mooré | mul: Multiple languages | mun: Munda (Other)
mus: Creek | mwl: Mirandese | mwr: Marwari | myn: Mayan languages
myv: Erzya | nah: Nahuatl | nai: North American Indian (Other) | nap: Neapolitan Italian
nau: Nauru | nav: Navajo | nbl: Ndebele (South Africa) | nde: Ndebele (Zimbabwe)
ndo: Ndonga | nds: Low German | nep: Nepali | new: Newari
nia: Nias | nic: Niger-Kordofanian (Other) | niu: Niuean | nno: Norwegian (Nynorsk)
nob: Norwegian (Bokmål) | nog: Nogai | non: Old Norse | nor: Norwegian
nqo: N'Ko | nso: Northern Sotho | nub: Nubian languages | nwc: Newari, Old
nya: Nyanja | nym: Nyamwezi | nyn: Nyankole | nyo: Nyoro
nzi: Nzima | oci: Occitan (post-1500) | oji: Ojibwa | ori: Oriya
orm: Oromo | osa: Osage | oss: Ossetic | ota: Turkish, Ottoman
oto: Otomian languages | paa: Papuan (Other) | pag: Pangasinan | pal: Pahlavi
pam: Pampanga | pan: Panjabi | pap: Papiamento | pau: Palauan
peo: Old Persian (ca. 600-400 B.C.) | per: Persian | phi: Philippine (Other) | phn: Phoenician
pli: Pali | pol: Polish | pon: Pohnpeian | por: Portuguese
pra: Prakrit languages | pro: Provençal (to 1500) | pus: Pushto | que: Quechua
raj: Rajasthani | rap: Rapanui | rar: Rarotongan | roa: Romance (Other)
roh: Raeto-Romance | rom: Romani | rum: Romanian | run: Rundi
rup: Aromanian | rus: Russian | sad: Sandawe | sag: Sango (Ubangi Creole)
sah: Yakut | sai: South American Indian (Other) | sal: Salishan languages | sam: Samaritan Aramaic
san: Sanskrit | sas: Sasak | sat: Santali | scn: Sicilian Italian
sco: Scots | sel: Selkup | sem: Semitic (Other) | sga: Irish, Old (to 1100)
sgn: Sign languages | shn: Shan | sid: Sidamo | sin: Sinhalese
sio: Siouan (Other) | sit: Sino-Tibetan (Other) | sla: Slavic (Other) | slo: Slovak
slv: Slovenian | sma: Southern Sami | sme: Northern Sami | smi: Sami
smj: Lule Sami | smn: Inari Sami | smo: Samoan | sms: Skolt Sami
sna: Shona | snd: Sindhi | snk: Soninke | sog: Sogdian
som: Somali | son: Songhai | sot: Sotho | spa: Spanish
srd: Sardinian | srn: Sranan | srp: Serbian | srr: Serer
ssa: Nilo-Saharan (Other) | ssw: Swazi | suk: Sukuma | sun: Sundanese
sus: Susu | sux: Sumerian | swa: Swahili | swe: Swedish
syc: Syriac | syr: Syriac, Modern | tah: Tahitian | tai: Tai (Other)
tam: Tamil | tat: Tatar | tel: Telugu | tem: Temne
ter: Terena | tet: Tetum | tgk: Tajik | tgl: Tagalog
tha: Thai | tib: Tibetan | tig: Tigré | tir: Tigrinya
tiv: Tiv | tkl: Tokelauan | tlh: Klingon (Artificial language) | tli: Tlingit
tmh: Tamashek | tog: Tonga (Nyasa) | ton: Tongan | tpi: Tok Pisin
tsi: Tsimshian | tsn: Tswana | tso: Tsonga | tuk: Turkmen
tum: Tumbuka | tup: Tupi languages | tur: Turkish | tut: Altaic (Other)
tvl: Tuvaluan | twi: Twi | tyv: Tuvinian | udm: Udmurt
uga: Ugaritic | uig: Uighur | ukr: Ukrainian | umb: Umbundu
und: Undetermined | urd: Urdu | uzb: Uzbek | vai: Vai
ven: Venda | vie: Vietnamese | vol: Volapük | vot: Votic
wak: Wakashan languages | wal: Wolayta | war: Waray | was: Washoe
wel: Welsh | wen: Sorbian (Other) | wln: Walloon | wol: Wolof
xal: Oirat | xho: Xhosa | yao: Yao (Africa) | yap: Yapese
yid: Yiddish | yor: Yoruba | ypk: Yupik languages | zap: Zapotec
zbl: Blissymbolics | zen: Zenaga | zha: Zhuang | znd: Zande languages
zul: Zulu | zun: Zuni | zxx: No linguistic content | zza: Zaza


Deprecated language codes, which may still appear in metadata records, are:

-ajm: Aljamía | -cam: Khmer | -esk: Eskimo languages | -esp: Esperanto
-eth: Ethiopic | -far: Faroese | -fri: Frisian | -gae: Scottish Gaelic
-gag: Galician | -gal: Oromo | -gua: Guarani | -int: Interlingua (International Auxiliary Language Association)
-iri: Irish | -kus: Kusaie | -lan: Occitan (post 1500) | -lap: Sami
-max: Manx | -mla: Malagasy | -mol: Moldavian | -sao: Samoan
-scc: Serbian | -scr: Croatian | -sho: Shona | -snh: Sinhalese
-sso: Sotho | -swz: Swazi | -tag: Tagalog | -taj: Tajik
-tar: Tatar | -tru: Truk | -tsw: Tswana


The codes used for the Copyright field accessRights_t are:

cc-by-3.0: CC BY 3.0 | cc-by-4.0: CC BY 4.0 | cc-by-nc-3.0: CC BY-NC 3.0
cc-by-nc-4.0: CC BY-NC 4.0 | cc-by-nc-nd-3.0: CC BY-NC-ND 3.0 | cc-by-nc-nd-4.0: CC BY-NC-ND 4.0
cc-by-nc-sa-3.0: CC BY-NC-SA 3.0 | cc-by-nc-sa-4.0: CC BY-NC-SA 4.0 | cc-by-nd-3.0: CC BY-ND 3.0
cc-by-nd-4.0: CC BY-ND 4.0 | cc-by-sa-3.0: CC BY-SA 3.0 | cc-by-sa-4.0: CC BY-SA 4.0
cc-zero: CC Zero | ic: In-copyright | ic-world: In-copyright (world viewable)
icus: US copyright | nobody: Blocked | op: Out-of-print
orph: Copyright-orphaned | orphcand: Orphan | pd: Public domain
pd-pvt: Access limited | pdus: Public domain in US only | supp: Suppressed from view
und: Undetermined | und-world: Undetermined


Example Solr JSON Records

An example record returned from Solr is a useful way to see what fields are indexed and, from that, to fashion the query terms you enter into the Workset Builder interface.  At the volume level, here is one of the records returned for the query title_t:"USITC publication"

 {
        "lastRightsUpdateDate_t":["20190729"],
        "lastRightsUpdateDate_s":"20190729",
        "schemaVersion_s":"https://schemas.hathitrust.org/EF_Schema_MetadataSubSchema_v_3.0",
        "genre_ss":["http://id.loc.gov/vocabulary/marcgt/doc",
          "http://id.loc.gov/vocabulary/marcgt/gov"],
        "typeOfResource_s":"http://id.loc.gov/ontologies/bibframe/Text",
        "language_ss":["eng"],
        "accessRights_t":["pd"],
        "oclc_t":["4263346"],
        "accessRights_s":"pd",
        "htid_s":"http://hdl.handle.net/2027/mdp.39015084961401",
        "pubPlaceId_ss":["http://id.loc.gov/vocabulary/countries/dcu"],
        "pubDate_t":["197X"],
        "publisherName_t":["U.S. Govt. Print. Off."],
        "htid_t":["http://hdl.handle.net/2027/mdp.39015084961401"],
        "lastRightsUpdateDate_i":20190729,
        "title_s":"USITC publication /",
        "title_t":["USITC publication /"],
        "id":"mdp.39015084961401",
        "mainEntityOfPageCatalogRecord_s":"https://catalog.hathitrust.org/Record/003922761",
        "pubPlaceType_ss":["http://id.loc.gov/ontologies/bibframe/Place"],
        "sourceInstitution_t":["MIU"],
        "contributorName_t":["United States International Trade Commission."],
        "sourceInstitution_s":"MIU",
        "contributorType_ss":["http://id.loc.gov/ontologies/bibframe/Organization"],
        "publisherId_ss":["http://catalogdata.library.illinois.edu/lod/entities/ProvisionActivityAgent/ht/U.S.%20Govt.%20Print.%20Off."],
        "pubPlaceName_ss":["District of Columbia"],
        "pubDate_s":"197X",
        "publisherName_ss":["U.S. Govt. Print. Off."],
        "accessProfile_s":"google",
        "pubPlaceName_t":["District of Columbia"],
        "dateCreated_i":20200209,
        "contributorName_ss":["United States International Trade Commission."],
        "accessProfile_t":["google"],
        "publisherType_ss":["http://id.loc.gov/ontologies/bibframe/Organization"],
        "contributorId_ss":["http://www.viaf.org/viaf/126322615"],
        "language_t":["eng"],
        "oclc_ss":["4263346"],
        "dateCreated_t":["20200209"],
        "dateCreated_s":"20200209",
        "bibliographicFormat_t":["PublicationVolume"],
        "bibliographicFormat_s":"PublicationVolume",
        "_version_":1676455128102600705
}
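
A record like this can be inspected programmatically to see which variants of each field are indexed. The sketch below uses an abbreviated copy of the record above and groups field names by their suffix. The meanings of "_i" (integer) and "_t" (text) come from the date-search section; reading "_s" and "_ss" as single- and multi-valued strings is an assumption based on common Solr naming conventions, and the helper name split_field is hypothetical.

```python
import json

# Abbreviated copy of the volume-level record shown above.
record_json = """
{
  "id": "mdp.39015084961401",
  "title_s": "USITC publication /",
  "title_t": ["USITC publication /"],
  "pubDate_t": ["197X"],
  "dateCreated_i": 20200209,
  "language_ss": ["eng"]
}
"""


def split_field(name: str):
    """Split 'pubDate_t' into ('pubDate', 't'); names without a suffix get ''."""
    base, sep, suffix = name.rpartition("_")
    return (base, suffix) if sep and base else (name, "")


record = json.loads(record_json)
by_base = {}
for name, value in record.items():
    base, suffix = split_field(name)
    by_base.setdefault(base, {})[suffix] = value

print(by_base["title"])    # {'s': 'USITC publication /', 't': ['USITC publication /']}
print(by_base["pubDate"])  # {'t': ['197X']}
```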

For the equivalent page-level search, which returns all page-based Solr indexed records where the title of the volume the page comes from is "USITC publication", the query would be: volumetitle_txt:"USITC publication".  Note the addition of the prefix 'volume' to the field name: this helps separate out volume-only metadata searching from page-level searching when it is combined with volume metadata.  Note also the change of suffix from "_t" (used when searching only volume-level metadata) to "_txt" when searching at the page level: the latter suffix (_txt) is very similar to the former (_t), only it does not get stored in the Solr index.  There is no need to store it with the page-level record, as it can be retrieved when needed from the volume metadata record.

An example record returned by this query is as follows:

{
        "volumeid_s":"ien.35556029988656",
        "id":"ien.35556029988656.page-000038",
        "volumedateCreated_i":20200209,
        "volumelastRightsUpdateDate_i":20170115,
        "volumebibliographicFormat_htrcstring":"PublicationVolume",
        "volumepubDate_htrcstring":"197X",
        "volumecontributorId_htrcstrings":["http://www.viaf.org/viaf/158040275"],
        "volumepublisherName_htrcstrings":["U.S. Govt."],
        "volumelanguage_htrcstrings":["eng"],
        "volumetypeOfResource_htrcstring":"http://id.loc.gov/ontologies/bibframe/Text",
        "volumepubPlaceType_htrcstrings":["http://id.loc.gov/ontologies/bibframe/Place"],
        "volumecontributorName_htrcstrings":["United States. National Transportation Safety Board."],
        "volumeoclc_htrcstrings":["6371936"],
        "volumegenre_htrcstrings":["http://id.loc.gov/vocabulary/marcgt/doc",
          "http://id.loc.gov/vocabulary/marcgt/gov"],
        "volumecontributorType_htrcstrings":["http://id.loc.gov/ontologies/bibframe/Jurisdiction"],
        "volumedateCreated_htrcstring":"20200209",
        "volumemainEntityOfPageCatalogRecord_htrcstring":"https://catalog.hathitrust.org/Record/002137135431181-4",
        "volumetitle_htrcstring":"Aircraft accident report /",
        "volumeschemaVersion_htrcstring":"https://schemas.hathitrust.org/EF_Schema_MetadataSubSchema_v_3.0",
        "volumepublisherId_htrcstrings":["http://catalogdata.library.illinois.edu/lod/entities/ProvisionActivityAgent/ht/U.S.%20Govt."],
        "volumelastRightsUpdateDate_htrcstring":"20170115",
        "volumepubPlaceName_htrcstrings":["District of Columbia"],
        "volumepubPlaceId_htrcstrings":["http://id.loc.gov/vocabulary/countries/dcu"],
        "volumesourceInstitution_htrcstring":"NWU",
        "volumepublisherType_htrcstrings":["http://id.loc.gov/ontologies/bibframe/Organization"],
        "volumeaccessProfile_htrcstring":"google",
        "_version_":1676440719503392768,
        "volumehtid_htrcstring":"http://hdl.handle.net/2027/ien.35556029988656",
        "volumeaccessRights_htrcstring":"pd"
}
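
Two conventions visible above lend themselves to small helpers: the mapping from a volume-level field name to its page-level search name, and the structure of the page-level "id". The sketch below covers only the suffixes evidenced on this page ("_t" becomes "_txt", "_i" is kept); other suffixes are rejected rather than guessed. The function names are hypothetical, and the assumption that the number after ".page-" is the page sequence is inferred from the example record.

```python
def page_level_field(volume_field: str) -> str:
    """Map a volume-level field (title_t) to its page-level name (volumetitle_txt)."""
    base, _, suffix = volume_field.rpartition("_")
    suffix_map = {"t": "txt", "i": "i"}  # only the cases shown on this page
    if not base or suffix not in suffix_map:
        raise ValueError(f"no known page-level mapping for {volume_field!r}")
    return f"volume{base}_{suffix_map[suffix]}"


def split_page_id(page_id: str):
    """Split 'ien.35556029988656.page-000038' into (volume id, page number)."""
    volume_id, _, seq = page_id.rpartition(".page-")
    if not volume_id or not seq.isdigit():
        raise ValueError(f"unexpected page id format: {page_id!r}")
    return volume_id, int(seq)


print(page_level_field("title_t"))        # volumetitle_txt
print(page_level_field("dateCreated_i"))  # volumedateCreated_i
print(split_page_id("ien.35556029988656.page-000038"))  # ('ien.35556029988656', 38)
```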




