A correct request returns a 200_OK status, and the body of the response is a binary ZIP stream with the MIME type of “application/zip”.

If the request was sent with concat=false, the returned ZIP stream has the following structure:

pages.zip
   <cleaned_volumeID_1>/
       0000000n.txt
       000000xx.txt
       …
   <cleaned_volumeID_2>/
       0000000p.txt
       0000wwww.txt
       …
   …
   <cleaned_volumeID_n>/
       0000qqqq.txt
       000zzzzz.txt
       …
   ERROR.err

Each volume has its own directory, with the requested pages as individual text files under the directory.  The name of the directory is the cleaned volumeID, and the name of each page text file is the eight-digit fixed-length zero-padded page sequence with .txt extension (see Section 1.2.4 for details on the cleaned volumeIDs).
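The naming rule above can be sketched in a few lines of Python. The helper name page_entry_name is hypothetical, and the cleaned volumeID is taken as given, because the cleaning rule itself is defined in Section 1.2.4 and is not reproduced here.

```python
def page_entry_name(cleaned_volume_id: str, page_sequence: int) -> str:
    """Build the expected ZIP entry path for one page in the concat=false layout.

    cleaned_volume_id is assumed to be already cleaned (see Section 1.2.4).
    """
    if not 0 <= page_sequence <= 99_999_999:
        raise ValueError("page sequence must fit in eight digits")
    # Eight-digit, fixed-length, zero-padded sequence plus the .txt extension.
    return f"{cleaned_volume_id}/{page_sequence:08d}.txt"
```

A client could use such a helper to check which of the requested pages are actually present in the returned archive.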

If the Data API encounters a problem during the retrieval, it terminates the ZIP stream to prevent corruption, but not before it injects an ERROR.err file into the stream which contains more information on the error.  If ERROR.err is present, the user can assume the ZIP stream does not contain all the resources requested.
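As an illustration, a client might unpack a concat=false response along these lines. This is a minimal sketch using Python's standard zipfile module; the function name read_pages_zip is hypothetical, and decoding page text as UTF-8 is an assumption, since the text encoding is not stated here.

```python
import io
import zipfile
from collections import defaultdict


def read_pages_zip(zip_bytes: bytes):
    """Return ({volume_dir: {page_file: text}}, error_text_or_None)."""
    volumes = defaultdict(dict)
    error_text = None
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries, if any
            # Assumption: entries are UTF-8 text.
            text = archive.read(name).decode("utf-8", errors="replace")
            if name == "ERROR.err":
                error_text = text  # the stream is incomplete
                continue
            # Split "<cleaned_volumeID>/<sequence>.txt" into its two parts.
            volume_dir, _, page_file = name.rpartition("/")
            volumes[volume_dir][page_file] = text
    return dict(volumes), error_text
```

If the second return value is not None, the caller should treat the pages in the first value as a partial result.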

If the request was sent with concat=true, the returned ZIP stream has the following structure:

pages.zip
   wordbag.txt
   ERROR.err

In this case, all requested pages are concatenated into a single text file regardless of which volume a page is from.  The result is essentially a “bag of words”, hence the filename wordbag.txt.

If there is an error, the Data API terminates the ZIP stream and injects ERROR.err with information on the problem encountered.  Should ERROR.err be present, the received ZIP stream is missing some requested resources.
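A concat=true response could be consumed as in the sketch below. The function name read_wordbag is hypothetical, UTF-8 decoding is an assumption, and the lowercase whitespace tokenization is only a naive illustration of the bag-of-words idea, not something the Data API prescribes.

```python
import io
import zipfile
from collections import Counter


def read_wordbag(zip_bytes: bytes):
    """Return (token_counts, error_text_or_None) for a concat=true response."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
        text = ""
        if "wordbag.txt" in names:
            # Assumption: the concatenated text is UTF-8.
            text = archive.read("wordbag.txt").decode("utf-8", errors="replace")
        error_text = None
        if "ERROR.err" in names:
            error_text = archive.read("ERROR.err").decode("utf-8", errors="replace")
    # Naive tokenization: lowercase, split on whitespace.
    return Counter(text.lower().split()), error_text
```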

1.2.6 Errors

The Data API detects error conditions as early as possible and, if an error is detected before it has committed to sending the ZIP stream, returns the corresponding error status and message to the client.  Internally, the Data API retrieves volume and page data asynchronously from the back-end data store for maximum performance.  If an error occurs before any internal retrieval requests are sent to the back-end data store (e.g. a malformed volumeID or pageID in the client request), the Data API returns an error HTTP response status with a short description of the error as the response body, and no volume or page data is returned.  However, if an error occurs after the internal retrieval requests are sent (e.g. a non-existent volumeID or pageID), the Data API returns a 200_OK HTTP response status and a ZIP stream containing volume and page data as the response body.  The ZIP stream also contains an ERROR.err entry with details on the error.  If multiple errors occur, only the first is reported: the Data API holds the error until all internal asynchronous requests have finished (whether they succeeded or failed on other errors) and then injects ERROR.err as the last entry, to provide maximum tolerance and flexibility to its clients.
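The two failure modes described above (an error status with no data, versus 200_OK with a ZIP stream that carries ERROR.err) can be told apart on the client side roughly as follows. This is a sketch only: classify_response and its three labels are hypothetical names, the HTTP call itself is left out, and UTF-8 decoding is an assumption.

```python
import io
import zipfile


def classify_response(status: int, body: bytes):
    """Return (label, detail) where label is 'rejected', 'partial' or 'complete'."""
    if status != 200:
        # Error detected before the ZIP stream was committed:
        # the body is a short description, not an archive.
        return "rejected", body.decode("utf-8", errors="replace")
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        if "ERROR.err" in archive.namelist():
            # Error detected after retrieval started: data may be missing.
            detail = archive.read("ERROR.err").decode("utf-8", errors="replace")
            return "partial", detail
    return "complete", None
```

Checking the status code first avoids trying to open a plain-text error body as a ZIP archive.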

The table below lists the error statuses and response bodies a client may receive, along with the meaning of each error.

| Response Status | Response Body | Reason |
| --- | --- | --- |
| 400 Bad Request | <p>Missing required parameter volumeIDs</p> | Request for volumes does not have the volumeIDs query parameter. |
| 400 Bad Request | <p>Missing required parameter pageIDs</p> | Request for pages does not have the pageIDs query parameter. |
| 400 Bad Request | <p>Malformed volume ID list. Offending token: xxx</p> | The volumeID list contains tokens that are not valid volumeIDs and cannot be parsed. |
| 400 Bad Request | <p>Malformed page ID list. Offending token: xxx</p> | The pageID list contains tokens that are not valid pageIDs and cannot be parsed. |
| 400 Bad Request | <p>Request too greedy. Request violates Max Volumes Allowed xxx. Offending ID: zzz</p> | Request would touch and retrieve more volumes than allowed by the policy. In the response body, xxx is the limit set by the policy, and zzz is the first volumeID that exceeds the limit. Applicable if the policy is set on the server. |
| 400 Bad Request | <p>Request too greedy. Request violates Max Total Pages Allowed xxx. Offending ID: zzz</p> | Request would touch and retrieve more pages than allowed by the policy. In the response body, xxx is the limit set by the policy, and zzz is the first pageID that exceeds the limit. Applicable if the policy is set on the server. |
| 400 Bad Request | <p>Request too greedy. Request violates Max Pages Per Volume Allowed xxx. Offending ID: zzz</p> | Request would touch and retrieve more pages per volume than allowed by the policy. In the response body, xxx is the limit set by the policy, and zzz is the first ID that exceeds the limit. Applicable if the policy is set on the server. |
| 404 Not Found | <p>Key not found. Offending key: xxx</p> | Request asks for a non-existent volumeID, or asks for a non-existent page sequence of a valid volume (e.g. asking for page 100 of a volume with only 90 pages). |
| 500 Internal Server Error | <p>Server too busy.</p> | The Data API gets a timeout exception while trying to retrieve data. |
| 500 Internal Server Error | <p>Internal server error.</p> | Other exceptions occurred when the Data API tried to retrieve data. |
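Several of the response bodies listed above name the offending token, ID, or key. A client that wants to report it could extract it with a small regular expression, as sketched below. The function name offending_value is hypothetical, and the pattern tolerates the body arriving with or without the surrounding <p> tags, since the exact wire format is an assumption here.

```python
import re

# Matches "Offending token: xxx", "Offending ID: zzz" and "Offending key: xxx",
# stopping at a closing </p> tag or at the end of the text.
_OFFENDER = re.compile(r"Offending (?:token|ID|key):\s*(.*?)\s*(?:</p>|$)", re.DOTALL)


def offending_value(error_body: str):
    """Return the offending token/ID/key named in an error body, or None."""
    match = _OFFENDER.search(error_body)
    return match.group(1) if match else None
```

Bodies that name no offender, such as the two 500 responses, yield None.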