...

The EF dataset for any HTRC workset can be retrieved as follows. The user first creates a workset (or chooses an existing one) in the HTRC Portal, then runs the EF rsync script generator algorithm (one of the algorithms provided at the HTRC Portal) on that workset. This produces a script that the user can download and execute on their own machine. When executed, the script uses rsync, a robust file synchronization and transfer utility, to copy the EF data files for that workset from the HTRC’s server to the user’s hard disk. The result, for each volume in the selected workset, is two zipped files containing the “basic” and “advanced” EF data. The EF data is in JSON (JavaScript Object Notation), a commonly used lightweight data interchange format.
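Once the files are on disk, each one can be read with a few lines of Python. The following is only a minimal sketch: the text above says the files are "zipped" without naming the compression format, so the hypothetical helper `load_ef_file` inspects the leading bytes of the file and handles bzip2, gzip, zip, or uncompressed JSON. The function name and the assumption that a zip archive holds a single JSON member are illustrative, not part of the HTRC tooling.

```python
import bz2
import gzip
import json
import zipfile


def load_ef_file(path):
    """Load one EF JSON document (hypothetical helper, not an HTRC API).

    The compression format is detected from the file's leading bytes,
    because the exact format of the downloaded files is an assumption here.
    """
    with open(path, "rb") as fh:
        magic = fh.read(3)

    if magic.startswith(b"BZh"):  # bzip2 stream
        with bz2.open(path, "rt", encoding="utf-8") as fh:
            return json.load(fh)

    if magic.startswith(b"\x1f\x8b"):  # gzip stream
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            return json.load(fh)

    if magic.startswith(b"PK"):  # zip archive; assumed to hold one JSON member
        with zipfile.ZipFile(path) as zf:
            first_member = zf.namelist()[0]
            return json.loads(zf.read(first_member).decode("utf-8"))

    # Fall back to plain, uncompressed JSON.
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `load_ef_file` on a downloaded file returns an ordinary Python dictionary, so the fields of the EF document can be reached with normal key lookups.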

...

{
   "id":"loc.ark:/13960/t1fj34w02",
   "metadata":{
      "schemaVersion":"1.2",

...

                "I.":{"NN":1},
                "THE":{"DT":1},
                "INTRODUCTION":{"NN":1},
                "DRAMA":{"NNPS":1},
                "SHAKESPEARE":{"NNP":1},
                "ENGLISH":{"NNP":1},
                "AND":{"CC":1}}},
          "body":{
             "tokenCount":205,
             "lineCount":35,
             "emptyLineCount":9,
             "sentenceCount":6,
             "tokenPosCount":{
                "striking":{"JJ":1},
                "his":{"PRP$":1},
                "plays":{"NNS":1},
                "London":{"NNP":1},
                "four":{"CD":1},
                ".":{".":7},
                "dramatic":{"JJ":2},
                "1576":{"CD":1},
                "stands":{"VBZ":1},
                ...
                "growth":{"NN":1}
             }
          },
          "footer":{
             "tokenCount":0,
             "lineCount":0,
             "emptyLineCount":0,
             "sentenceCount":0,
             "tokenPosCount":{}}}]}}
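
In the sample above, each entry of "tokenPosCount" maps a token to the part-of-speech tags it received on that page, with a count per tag. A minimal Python sketch of how such a section might be summarized follows. The helper names `pos_totals` and `token_totals` are hypothetical, and the input is an abridged copy of the "body" section containing only the tokens visible in the sample, so its counts do not add up to the section's "tokenCount" of 205.

```python
import json
from collections import Counter

# Abridged copy of the "body" section shown above: only the tokens that are
# visible in the sample are included (the elided entries are left out).
BODY_JSON = """
{
  "tokenCount": 205,
  "lineCount": 35,
  "emptyLineCount": 9,
  "sentenceCount": 6,
  "tokenPosCount": {
    "striking": {"JJ": 1},
    "his": {"PRP$": 1},
    "plays": {"NNS": 1},
    "London": {"NNP": 1},
    "four": {"CD": 1},
    ".": {".": 7},
    "dramatic": {"JJ": 2},
    "1576": {"CD": 1},
    "stands": {"VBZ": 1},
    "growth": {"NN": 1}
  }
}
"""


def pos_totals(section):
    """Sum the counts per part-of-speech tag across all tokens in a section."""
    totals = Counter()
    for tags in section["tokenPosCount"].values():
        for tag, count in tags.items():
            totals[tag] += count
    return totals


def token_totals(section):
    """Sum the counts per token across all of its part-of-speech tags."""
    return {
        token: sum(tags.values())
        for token, tags in section["tokenPosCount"].items()
    }


body = json.loads(BODY_JSON)
print(pos_totals(body).most_common(3))   # the three most frequent tags
print(token_totals(body)["dramatic"])    # occurrences of "dramatic"
```

The same two functions apply unchanged to the "header" and "footer" sections, since they share the "tokenPosCount" layout. An empty "tokenPosCount" such as the footer's simply yields empty totals.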

...