A: You may sign up for an account by going to the HTRC Production Portal at http://htrc2.pti.indiana.edu and choosing "Sign up" from the menu.
Q: How can I generate a list of volumes (such as N randomly selected volumes of non-fiction published in the nineteenth century)?
A: This will consist of two main steps:
Step 1: Come up with a list of volumeIDs (HathiTrust's ID strings for individual volumes in the HathiTrust collection) corresponding to the volumes whose full text you want.
Step 2: Submit that list to HathiTrust to request a "custom dataset" containing just those volumes.
Operationalization of Step 1:
The "metadata core" of the HTRC Solr Proxy API will probably serve your needs best here.
As that API's documentation shows, queries against the metadata Solr core can search by various metadata fields, most importantly for your needs 'genre' and 'publishDate'. The latter can be used as a range field: as the documentation says, including the following in your query restricts results to volumes published from 1990 through 1999:
publishDate : [1990 TO 1999]
You can go by genre to decide whether a volume counts as non-fiction, but a problem here may be that 'genre' is often inaccurate or missing.
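As a minimal sketch of the kind of query described above, the Python below builds a Solr-style request URL that combines a 'publishDate' range with an optional 'genre' filter. The base URL is a placeholder, the "id" field name for the volumeID is an assumption, and the years 1800 to 1899 are just one common reading of "nineteenth century"; check the HTRC Solr Proxy API documentation for the real endpoint and field names. The parameters q, fl, rows and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_metadata_query_url(base_url, start_year, end_year, genre=None, rows=100):
    """Build a Solr select URL that filters on publishDate and, optionally, genre."""
    # Square brackets make both ends of the range inclusive in Solr query syntax.
    clauses = [f"publishDate:[{start_year} TO {end_year}]"]
    if genre is not None:
        # Quote the value so that a multi-word genre is matched as a phrase.
        clauses.append(f'genre:"{genre}"')
    params = {
        "q": " AND ".join(clauses),
        "fl": "id",    # ASSUMPTION: the volumeID is stored in a field named "id"
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # HYPOTHETICAL endpoint; replace with the real metadata core URL.
    print(build_metadata_query_url("https://example.org/solr/meta/select", 1800, 1899))
```

Because 'genre' is often inaccurate or missing, a query like this may be more useful with the genre filter left out, followed by a separate filtering step on the returned volumeIDs.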
An alternative way to go about this, in case you are lucky enough to have data that tells you what you don't want (i.e. you simply don't want volumes that happen to be in a 'fiction' dataset), would be to start from a broader list of candidate volumeIDs and remove every volumeID that appears in that dataset.
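As a minimal sketch of this exclusion idea, assuming you already hold two collections of volumeIDs (a broad candidate list, for example from a 'publishDate' query, and the IDs in a 'fiction' dataset), the Python below removes the unwanted IDs and then draws N volumes at random. The function name and the sample IDs are hypothetical and only illustrate the technique.

```python
import random


def sample_non_fiction(candidate_ids, fiction_ids, n, seed=None):
    """Return n randomly chosen IDs from candidate_ids that are not in fiction_ids."""
    # Set difference drops every candidate that appears in the fiction dataset;
    # sorting makes the result reproducible for a given seed.
    remaining = sorted(set(candidate_ids) - set(fiction_ids))
    if n > len(remaining):
        raise ValueError(f"asked for {n} volumes but only {len(remaining)} remain")
    rng = random.Random(seed)
    # random.sample picks n distinct items without replacement.
    return rng.sample(remaining, n)


if __name__ == "__main__":
    # HYPOTHETICAL volumeIDs, for illustration only.
    candidates = ["vol.0001", "vol.0002", "vol.0003", "vol.0004"]
    fiction = ["vol.0002"]
    print(sample_non_fiction(candidates, fiction, 2, seed=0))
```

Passing a fixed seed lets you regenerate the same random selection later, which helps if you need to document how the list was drawn.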
So, once you've reached this point, you will have your list of volumeIDs ready.
Operationalization of Step 2:
Then, you can submit those volumeIDs to HathiTrust, requesting from HathiTrust a "custom dataset" consisting of the content of just the volumes corresponding to those volumeIDs. (The section "Custom Datasets" at the page http://www.hathitrust.org/datasets spells out the procedure for making that request to HathiTrust.)
This step is slightly bureaucratic: your list of volumeIDs will almost invariably include volumes that were digitized by Google, in which case you must sign a couple of statements and submit them to HathiTrust before you can receive your "custom dataset".
At this point, you would be done.
Q: How do I access HTRC Production Stack?