HTRC Release 2.0

News: Release 2.0 (December 2013) contains numerous enhancements and fixes, which are listed in the Changes Since Last Release column. User-facing feature enhancements are highlighted in red.

Software Service

Software Service | Functionality | Changes Since Last Release | Changes Directly Affecting Users?
HTRC-App

A client that retrieves data from the Data API. Used mainly by Meandre workflows.

  • Use HTTP POST instead of HTTP GET
x
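A common reason to prefer POST for retrieval requests is that a long list of volume IDs sent in the request body is not subject to URL length limits. The sketch below only builds such a request and does not send it; the endpoint URL, the `volumeIDs` field name, and the pipe-delimited ID format are assumptions for illustration, not details confirmed by these notes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter name, used only for illustration.
DATA_API_URL = "https://example.org/data-api/volumes"


def build_volume_request(volume_ids):
    """Build (but do not send) a POST request carrying the volume IDs in the body."""
    body = urlencode({"volumeIDs": "|".join(volume_ids)}).encode("utf-8")
    return Request(
        DATA_API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_volume_request(["vol.0001", "vol.0002"])
print(request.get_method(), len(request.data), "bytes in body")
```

Because the IDs travel in the body, the URL stays the same length no matter how many volumes are requested.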
HTRC-Compute-Agent

Job submission and monitoring.

New features

  • submission of jobs to PBS-based clusters such as Quarry and BigRed2
  • support for user-submitted CSV files

Technical updates and bug fixes

  • Akka configuration settings, especially those related to performance (e.g. thread pool size)
  • error handling for registry queries
  • specification of paths in configuration files rather than in the code

 

x

x

 

x

x

x

HTRC-Data-DataAPI
  • Retrieve volume and page contents
  • Return token count for volumes/pages on the fly

New features

  • Return token count for volumes/pages. The computation is done on the server side.
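As a rough illustration of what a server-side token count might compute, the sketch below counts tokens per page and sums them for the volume. Whitespace tokenization and the function name `count_tokens` are assumptions; the Data API's actual tokenizer is not described here.

```python
from collections import Counter


def count_tokens(pages):
    """Return per-page token counts and a volume-level total.

    `pages` is a list of page texts; tokens are whitespace-separated strings.
    """
    per_page = [Counter(page.split()) for page in pages]
    volume_total = Counter()
    for page_counts in per_page:
        volume_total.update(page_counts)
    return per_page, volume_total


pages = ["the whale the sea", "the ship"]
per_page, volume_total = count_tokens(pages)
print(volume_total.most_common(1))
```

Doing this on the server means a client receives only the counts rather than the full page text.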

 

 

HTRC-Data-DeleteNoticeTool | Internal tool for deleting volumes listed in the weekly deletion notice emails sent from HathiTrust | None | x
HTRC-Data-Ingester | Internal service that brings the HathiTrust corpus into the HathiTrust Research Center | None | x
HTRC-Data-LogIngester | Internal tool that collects log information for the agent, portal, Data API, and Solr proxy | added support for the agent log | x
HTRC-Data-RegistryExtension

Provides backend storage and retrieval for HTRC:

  • worksets
  • jobs
  • algorithms
  • files

  • updated workset metadata to indicate whether the workset is public
  • added .csv file type registration for the CXF servlet
  • updated schema to reflect changes to properties XML
  • reworked CSV workset export functionality
  • added "volumeCount" workset metadata field
  • added support for extended workset properties in the workset creation and workset volume retrieval operations

x

x

x

x

x

x

HTRC-Data-SolrProxy

A proxy service between users and the actual Solr cluster that protects the index from modification and audits user requests.

  • handles errors by returning XML error messages
  • uses two different cores, one for metadata and the other for OCR

x

x
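The protective role of such a proxy can be sketched as a check that forwards only read-only requests and answers everything else with an XML error. The allow-list policy, the XML element names, and the core names `meta` and `ocr` in the example paths are assumptions for illustration; the notes do not describe the proxy's actual rules.

```python
from xml.sax.saxutils import escape

# Assumed policy: only the read-only /select handler is exposed to users.
ALLOWED_HANDLERS = {"select"}


def xml_error(code, message):
    """Format an error as a small XML document (element names are illustrative)."""
    return f"<error><code>{code}</code><message>{escape(message)}</message></error>"


def check_request(path):
    """Return (status, body); body is None when the request may be forwarded to Solr."""
    handler = path.rstrip("/").rsplit("/", 1)[-1]
    if handler not in ALLOWED_HANDLERS:
        return 403, xml_error(403, f"handler '{handler}' is not allowed")
    return 200, None


print(check_request("/solr/meta/update"))
print(check_request("/solr/ocr/select"))
```

An allow-list is used here rather than a block-list so that any handler not explicitly permitted is rejected by default.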

HTRC-Meandre-Components

Meandre components that use the HTRC-App client to retrieve data from the Data API.

  • switched to using Maven
  • cleaned up dead code
  • removed plain-text password authentication in favor of token-based authentication
  • updated to use latest version of DataAPI
  • bug fix: ensured proper close of client connections

x

x

x

x

x

HTRC-Meandre-Flows

Meandre flows that provide the algorithmic functionality, such as token counts, topic modeling, entity extraction, and Dunning log-likelihood analysis.

  • added flows to version control
  • fixed Dunning log-likelihood to output both over- and under-represented data and to provide those outputs in a downloadable format
  • added a Naive Bayes classification algorithm that provides training and testing confusion matrices along with actual and predicted values

x
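To show how one statistic can report both over- and under-represented words, the sketch below uses a common two-term formulation of the log-likelihood score (G2) and attaches a sign based on relative frequency. The function name and the sign convention are assumptions; the flow's exact calculation is not described in these notes.

```python
import math


def dunning_g2(count_a, total_a, count_b, total_b):
    """Signed log-likelihood (G2) for one word in an analysis vs. a reference corpus.

    Positive: over-represented in the analysis corpus; negative: under-represented.
    """
    expected_a = total_a * (count_a + count_b) / (total_a + total_b)
    expected_b = total_b * (count_a + count_b) / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # the 0 * ln(0) term is taken as 0
            g2 += observed * math.log(observed / expected)
    g2 *= 2
    rate_a, rate_b = count_a / total_a, count_b / total_b
    return g2 if rate_a >= rate_b else -g2


print(round(dunning_g2(30, 1000, 10, 1000), 3))
print(round(dunning_g2(10, 1000, 30, 1000), 3))
```

Swapping the two corpora flips the sign but keeps the magnitude, so a single sorted list can surface both ends of the comparison.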

HTRC-Security-Auditor | A utility package meant to be used by other components for generating audit logs in a more consistent format | None | x
HTRC-Security-OAuth2

OAuth2 Authentication Related Components:

  • OAuth2 filter to use with web applications
  • WSO2 IS customizations
  • OAuth2 User Information Service for Reverse Lookup Using OAuth2 Access Token
  • OAuth2 Client API for Java Applications
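For readers unfamiliar with token-based access, the sketch below shows the standard way an OAuth2 access token is attached to an HTTP request as a bearer token. It is not the HTRC client API; the URL and token are placeholders, and the request is built but never sent.

```python
from urllib.request import Request


def with_bearer_token(url, access_token):
    """Build (but do not send) a request that carries an OAuth2 bearer token."""
    request = Request(url)
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


# Placeholder URL and token, for illustration only.
request = with_bearer_token("https://example.org/data-api/volumes", "example-token")
print(request.get_header("Authorization"))
```

A server-side filter can then validate the token and look up the associated user without the client ever sending a password.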

 

None

x
HTRC-Tools-BackupAndRestore | Command-line tool used to back up user accounts and registry contents of HTRC stacks. | This is a new component for this release. | x
HTRC-Tools-UserManager | Command-line tool used to perform user management actions for HTRC (user creation, password changes, etc.) | None | x
HTRC-UI-AuditAnalyzer | A GUI for visualizing statistics from logs collected by HTRC-Data-LogIngester | added agent log analysis | x
HTRC-UI-Blacklight

Workset Builder

  • query documents by catalog and full text contents
  • filter results by facet
  • manually select and deselect documents
  • save named worksets for future reference and use in Portal

bug fixes:

  • better workflow and messages when an unauthenticated user tries to create a workset
  • fixed display of the loading GIF

tech changes:

  • point to new Solr index
  • point to updated Registry
  • add asset_host so assets can be built automatically
  • improved code branching per environment (extended use of htrc.yml)

feature adds:

  • display custom HTRC metadata fields on volume detail page
  • add a sign up link in the header
  • add a link to the HT page turner on the volume detail page
  • remove non-functioning email and SMS links on volume detail page

 

x


x

x

x

x

HTRC-UI-Portal2
  • Browse worksets
  • Upload CSV worksets
  • Browse Algorithms
  • Execute Algorithms
  • Browse algorithm execution results
  • Create and manage user accounts

New features

  • UI improvements
  • Password reset function
  • Display the list of volumes in each workset
  • Display HTRC metadata and the HT page turner for each volume in a workset