Docket Wrench API

Overview

This documentation describes the API for Docket Wrench, the Sunlight Foundation's regulatory comment analysis tool. For more information about Docket Wrench, including its data collection methodology, see its About page.

The base URL for the API is docketwrench.sunlightfoundation.com/api/1.0; every path below is relative to this base URL. All API calls require an active Sunlight API key, passed as the apikey querystring argument. For instance, a call to describe docket EPA-HQ-OAR-2009-0234 would look like /docket/EPA-HQ-OAR-2009-0234?apikey=[API key]. The API methods are described below.
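As a minimal Python sketch of how a full request URL is assembled from the base URL, an endpoint path, and the apikey argument: the "http://" scheme is an assumption (this page gives only the host and path), and build_url is a hypothetical helper rather than part of any official Docket Wrench client.

```python
from urllib.parse import urlencode

# Assumption: the scheme is not stated in this documentation; "http://" is a guess.
BASE_URL = "http://docketwrench.sunlightfoundation.com/api/1.0"


def build_url(path: str, apikey: str, base: str = BASE_URL, **params: str) -> str:
    """Append an endpoint path to the base URL and add apikey plus any extra query args."""
    query = urlencode({"apikey": apikey, **params})
    return f"{base.rstrip('/')}/{path.lstrip('/')}?{query}"


print(build_url("/docket/EPA-HQ-OAR-2009-0234", "YOUR_KEY"))
# http://docketwrench.sunlightfoundation.com/api/1.0/docket/EPA-HQ-OAR-2009-0234?apikey=YOUR_KEY
```

Extra keyword arguments become additional query parameters, so the same helper covers endpoints such as the clustering hierarchy that accept GET parameters like require_summaries.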

API Methods

Agency

/agency/[agency]

Methods Supported: GET

This endpoint powers Docket Wrench's agency pages. Included are general metadata (name, URL, and ID), as well as stats about the dockets the agency manages, and the documents within those dockets. The stats are, for the most part, structured similarly to those provided by the docket endpoint; note, though, that date aggregation is by month rather than by week because of the longer time scales involved in agency data. Additional information is included about noteworthy dockets for that agency: recent_dockets contains dockets whose first submissions were most recent, and popular_dockets includes dockets that have had the most submissions.

Parameters

agency: a Regulations.gov agency ID, e.g., "FAA" (path parameter; required)

Docket

/docket/[docket_id]

Methods Supported: GET

This endpoint powers Docket Wrench's docket pages. Included are general metadata (title, URL, year, ID, and whether or not it's a rulemaking docket), information about the agency that manages the docket, and stats about the docket's contents. Noteworthy stats subkeys include the total document count (count); a breakdown of documents by type (type_breakdown, with the categories notice, proposed_rule, rule, public_submission, supporting_material, and other); and the top entities recognized as having submitted comments (top_submitter_entities) or as being mentioned in the docket's documents (top_text_entities). The doc_info key includes a subkey, fr_docs, that lists and summarizes all Federal Register documents (notices, proposed rules, and rules) within the docket, with metadata. The weeks key breaks down submissions by week over the period in which submissions were received; that period is given separately in the date_range key.

Parameters

docket_id: a Regulations.gov docket ID, e.g., "EPA-HQ-OAR-2009-0234" (path parameter; required)
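A minimal Python sketch of working with the docket stats, assuming (not confirmed by this documentation) that stats.type_breakdown is an object mapping each document type to an integer count and that stats.count is the docket's total document count. It computes each type's share of the docket, which is a common first question about a docket's makeup.

```python
DOC_TYPES = ("notice", "proposed_rule", "rule", "public_submission",
             "supporting_material", "other")


def type_shares(docket: dict) -> dict:
    """Return each document type's fraction of the docket's total count.

    Assumption: docket["stats"]["type_breakdown"] maps type names to integer
    counts, and docket["stats"]["count"] is the total document count.
    """
    stats = docket["stats"]
    total = stats["count"]
    breakdown = stats["type_breakdown"]
    if total == 0:
        return {t: 0.0 for t in DOC_TYPES}
    # Types missing from the breakdown are treated as having zero documents.
    return {t: breakdown.get(t, 0) / total for t in DOC_TYPES}
```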

Document

/document/[document_id]

Methods Supported: GET

This endpoint powers Docket Wrench's document pages. Included are general metadata (type, title, URL, year, and ID), information about the docket in which the document can be found, and information about the agency that manages that docket. Stats about the docket are also included; these power the docket summary graph on Docket Wrench document pages. The response also includes summary information and URLs for each piece of text included with the comment (either a view or an attachment, in Regulations.gov parlance), relevant information about submitting or mentioned entities, and details as supplied by Regulations.gov, such as date received, Federal Register number, etc. Docket Wrench provides these details in two forms: the details key contains them as provided by Regulations.gov, and the clean_details key contains much of the same information cleaned up for display on Docket Wrench (dates are formatted for readability, names are combined, some identifiers are standardized, and information is grouped and ordered).

Additionally, if the document is a Federal Register document (a rule, proposed rule, or notice), the response includes stats about any comments submitted on it. These stats are formatted similarly to those for dockets.

Parameters

document_id: a Regulations.gov document ID, e.g., "EPA-HQ-OAR-2009-0234-20377" (path parameter; required)

Entity

/[type]/[entity_id]

Methods Supported: GET

This endpoint powers Docket Wrench's organization pages. Included are general metadata (type, name, URL, and ID), as well as stats about documents the entity submitted or was mentioned in. This functionality is powered by the database of entities underlying our Influence Explorer project, and relies on a somewhat lossy text-matching process to identify both submissions and mentions; see the respective methodology pages for Influence Explorer and Docket Wrench for more information.

Stats are divided into two structurally similar objects: text_mentions, which covers documents that mention the entity, and submitter_mentions, which covers documents the entity likely submitted. Each contains general metadata about the top dockets and agencies for that set of documents. For agencies, monthly breakdowns of submissions are included to make it easy to graph submissions over time.

Parameters

entity_id: an Influence Explorer entity ID, e.g., "d958530f0e2a4979a35af270dfb309a3" (path parameter; required)

type: an Influence Explorer entity type; currently only "organization" is supported. (path parameter; required)

Entity-Docket Overlap

/[entity_type]/[entity_id]/[document_type]_in_docket/[docket_id]

Methods Supported: GET

This endpoint allows Influence Explorer to show the specific documents in a specific docket that mention, or were submitted by, a specific entity; this information is included on Influence Explorer organization pages. Metadata is structured similarly to the corresponding information in the docket, entity, and document endpoints.

Parameters

entity_id: an Influence Explorer entity ID (path parameter; required)

document_type: either 'mentions' or 'submissions' (path parameter; required)

docket_id: a Regulations.gov docket ID (path parameter; required)

entity_type: the type of entity; currently must be 'organization' (path parameter; required)
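The four path parameters above combine into a single path. A minimal Python sketch of building it, with validation drawn from the constraints listed above (entity_type must be "organization"; document_type is "mentions" or "submissions"); overlap_path is a hypothetical helper name.

```python
VALID_DOCUMENT_TYPES = {"mentions", "submissions"}


def overlap_path(entity_id: str, document_type: str, docket_id: str,
                 entity_type: str = "organization") -> str:
    """Build the Entity-Docket Overlap path, relative to the API base URL."""
    if entity_type != "organization":
        raise ValueError("entity_type must currently be 'organization'")
    if document_type not in VALID_DOCUMENT_TYPES:
        raise ValueError("document_type must be 'mentions' or 'submissions'")
    # document_type is glued onto "_in_docket", e.g. "submissions_in_docket".
    return f"/{entity_type}/{entity_id}/{document_type}_in_docket/{docket_id}"


print(overlap_path("d958530f0e2a4979a35af270dfb309a3", "submissions",
                   "EPA-HQ-OAR-2009-0234"))
# /organization/d958530f0e2a4979a35af270dfb309a3/submissions_in_docket/EPA-HQ-OAR-2009-0234
```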

Entity Summary

/entity_list

Methods Supported: GET

This endpoint provides a list of the IDs of all organizations recognized as having submitted, or been mentioned in, at least one document. In addition to the usual JSON output, this endpoint can be used with a Content-Accept header of "application/octet-stream", which returns the entity list in a highly compact binary format consisting of just the UUIDs of the entities in question, each expressed as a pair of big-endian unsigned longs (two 64-bit integers, so 16 bytes per UUID); see http://stackoverflow.com/questions/6877096/how-to-pack-a-uuid-into-a-struct-in-python for more information about decoding.
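A minimal Python sketch of decoding the binary response body, using only the standard struct and uuid modules. It assumes the high-order 64 bits come first in each pair; this page does not state the order explicitly. The function takes raw bytes, so it is independent of how the request itself is made.

```python
import struct
import uuid


def decode_entity_list(payload: bytes) -> list[uuid.UUID]:
    """Decode the compact binary entity list into UUIDs.

    Each UUID is 16 bytes: two big-endian unsigned 64-bit integers.
    Assumption: the high-order half comes first in each pair.
    """
    if len(payload) % 16:
        raise ValueError("payload length must be a multiple of 16 bytes")
    ids = []
    for high, low in struct.iter_unpack(">QQ", payload):
        # Recombine the two halves into one 128-bit integer.
        ids.append(uuid.UUID(int=(high << 64) | low))
    return ids
```

Under the high-first assumption, this is equivalent to calling uuid.UUID(bytes=chunk) on each 16-byte chunk, and each result's .hex attribute gives the 32-character form used for Influence Explorer entity IDs, e.g., "d958530f0e2a4979a35af270dfb309a3".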

Parameters

Document Search Results

/search/document/[query]

Docket Clustering Hierarchy

/docket/[docket_id]/hierarchy

Methods Supported: GET

Docket Wrench uses hierarchical agglomerative clustering (HAC) to cluster comments on a docket-by-docket basis. The result of this process is a so-called dendrogram: a tree in which a small number of loose clusters at the top divide into a larger number of tighter clusters toward the bottom. Docket Wrench includes cluster groups at the 50%, 60%, 70%, 80%, and 90% similarity levels.

This endpoint returns the cluster tree for a given docket. The stats object contains general information about the docket's clustering behavior (for example, its agency, and how many documents were or were not included in the clustering response). The cluster tree itself is in cluster_hierarchy, which is a list of the loosest clusters in the docket. Each cluster is uniquely identified by the combination of its similarity threshold (the cutoff key) and the numerical ID of a canonical document it contains (the name key). In full, each cluster has the keys cutoff, name, size (the number of documents it contains), phrases, and children. children is a list of the subclusters that result from splitting that cluster as the similarity threshold increases, so each child has a higher similarity threshold than its parent. A cluster has no children if it is already at the highest threshold (90%), or if no two of its documents are similar enough to still form a cluster at the next threshold.
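Because every cluster carries its own children list, the whole tree can be walked with a simple stack. A minimal Python sketch, assuming cluster_hierarchy has been decoded from JSON into lists of dicts with the keys described above; clusters_per_cutoff and find_cluster are hypothetical helpers. The first counts clusters at each similarity level, and the second looks up a cluster by its identifying (cutoff, name) pair.

```python
from collections import Counter


def clusters_per_cutoff(hierarchy: list) -> Counter:
    """Count clusters at each similarity cutoff in a cluster_hierarchy list."""
    counts = Counter()
    stack = list(hierarchy)  # iterative depth-first walk, no recursion needed
    while stack:
        cluster = stack.pop()
        counts[cluster["cutoff"]] += 1
        stack.extend(cluster.get("children") or [])
    return counts


def find_cluster(hierarchy: list, cutoff: float, name: int):
    """Return the cluster identified by (cutoff, name), or None if absent."""
    stack = list(hierarchy)
    while stack:
        cluster = stack.pop()
        if cluster["cutoff"] == cutoff and cluster["name"] == name:
            return cluster
        stack.extend(cluster.get("children") or [])
    return None
```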

This endpoint can also supply distinguishing phrases for each cluster. Calculating these phrases is computationally expensive, so by default phrases are only included if they have already been generated and cached; each cluster's phrases key is a list of strings if so, or null otherwise. Setting the require_summaries GET parameter to true forces computation of any phrases that haven't already been generated. Docket Wrench's usage pattern is to make an initial call to this endpoint without require_summaries, then make a second call with require_summaries if phrases weren't included in the initial response. This lets the application render other parts of the clustering visualization without waiting for the phrases, which take longer to compute than the initial clustering results. Other consuming applications may want to follow the same pattern.

Parameters

docket_id: a Regulations.gov docket ID, e.g., "EPA-HQ-OAR-2009-0234" (path parameter; required)

require_summaries: "true" or "false"; defaults to "false" (query parameter; optional)
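The two-call pattern described above can be sketched in Python as follows. fetch is a hypothetical, caller-supplied function that performs GET /docket/[docket_id]/hierarchy with the given query parameters and returns the decoded JSON; injecting it keeps the sketch independent of any HTTP library. A null phrases value arrives in Python as None.

```python
from typing import Callable


def phrases_missing(hierarchy: list) -> bool:
    """True if any cluster in the tree has phrases set to null (None)."""
    stack = list(hierarchy)
    while stack:
        cluster = stack.pop()
        if cluster.get("phrases") is None:
            return True
        stack.extend(cluster.get("children") or [])
    return False


def fetch_hierarchy(fetch: Callable[[dict], dict]) -> dict:
    """Make a cheap first call, then request summaries only if they're missing."""
    first = fetch({})  # no require_summaries: returns quickly
    if not phrases_missing(first["cluster_hierarchy"]):
        return first   # phrases were already cached
    # An interactive client would render `first` here before the slower call.
    return fetch({"require_summaries": "true"})
```

When phrases are already cached, only one request is made; otherwise the second request forces computation.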

Single-cluster Document List

/docket/[docket_id]/cluster/[cluster_id]

Methods Supported: GET

This endpoint supplies a list of the documents within a given cluster; it's used to fill the bottom-left pane of the Docket Wrench clustering visualization. The cluster is identified by its representative document ID (the name key in the full clustering response) and a clustering threshold, supplied via the cutoff GET parameter as a number between 0.5 and 0.9, inclusive.

The response contains a list of documents, ordered from most to least central within the cluster, with each document's clustering ID, its title, and any submitter text that was included with the original document.

Parameters

docket_id: a Regulations.gov docket ID, e.g., "EPA-HQ-OAR-2009-0234" (path parameter; required)

cluster_id: a numerical representative document ID, e.g., "5123" (path parameter; required)

cutoff: the similarity threshold at which the cluster is identified, specified as a number between 0.5 and 0.9, inclusive (query parameter; optional)
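A minimal Python sketch of preparing this request. It assumes, based on the five similarity levels listed for the hierarchy endpoint, that only 0.5, 0.6, 0.7, 0.8, and 0.9 correspond to actual clusters, so it rejects other values in the range; cluster_documents_request is a hypothetical helper that returns a path plus query parameters to combine with the apikey argument.

```python
# Assumption: clusters exist only at these five levels, so other values in the
# 0.5-0.9 range are rejected here.
CUTOFF_LEVELS = (0.5, 0.6, 0.7, 0.8, 0.9)


def cluster_documents_request(docket_id: str, cluster_id: int,
                              cutoff: float) -> tuple[str, dict]:
    """Return the path and query parameters for the Single-cluster Document List.

    cluster_id is the cluster's representative document ID (its name key).
    """
    for level in CUTOFF_LEVELS:
        # Compare with a tolerance so values like 0.7000000001 still match.
        if abs(cutoff - level) < 1e-9:
            return f"/docket/{docket_id}/cluster/{cluster_id}", {"cutoff": str(level)}
    raise ValueError("cutoff must be one of 0.5, 0.6, 0.7, 0.8, 0.9")
```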

Document Annotated for Cluster

/docket/[docket_id]/cluster/[cluster_id]/document/[document_id]

Methods Supported: GET

This endpoint returns HTML and metadata for a particular comment within a particular cluster, at a particular cutoff, within a docket. The HTML is annotated with span tags that assign a background color to phrases within the text: phrases that are more frequent within the document's cluster at that cutoff level are darker than less frequent ones. Because Docket Wrench's clustering analysis only examines the first 10,000 characters of a document, documents may be truncated; if they are, the truncated key is set to true.
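Because the highlighted phrases are marked with span tags, a client can pull them out of the returned HTML with Python's standard html.parser module. A minimal sketch, assuming every span in the HTML is a highlight annotation; the response key that holds the HTML isn't named here, so the function simply takes the HTML string.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text inside <span> elements of an annotated document."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0      # how many <span> tags we are currently inside
        self.current = []   # text pieces of the span being read
        self.phrases = []   # completed outermost-span texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.phrases.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth:  # ignore text outside any span
            self.current.append(data)


def highlighted_phrases(html: str) -> list:
    parser = SpanTextCollector()
    parser.feed(html)
    parser.close()
    return parser.phrases
```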

Parameters

docket_id: a Regulations.gov docket ID, e.g., "EPA-HQ-OAR-2009-0234" (path parameter; required)

cluster_id: a numerical representative document ID, e.g., "5123" (path parameter; required)

document_id: a numerical document ID, e.g., "5123" (path parameter; required)

cutoff: the similarity threshold at which the cluster is identified, specified as a number between 0.5 and 0.9, inclusive (query parameter; optional)

Document Cluster Chain

/docket/[docket_id]/clusters_for_document/[document_id]

Methods Supported: GET

This endpoint allows clients to determine which clusters, at which cutoff levels, contain a particular document. Documents, dockets, and clusters are identified as in the other clustering endpoints.

Parameters

docket_id: a Regulations.gov docket ID, e.g., "EPA-HQ-OAR-2009-0234" (path parameter; required)

document_id: a numerical document ID, e.g., "5123" (path parameter; required)

Clustering Hierarchy Teaser

/docket/[item_id]/hierarchy_teaser

Methods Supported: GET

This endpoint is similar to the standard docket clustering view, but returns less information; it's used to show a teaser of the number of clusters on regular Docket Wrench docket or document pages, and to decide whether to include a link to the full clustering display. It includes only cluster counts, and only at the 50% and 80% levels.

The response describes either a document or a docket, depending on which is requested in the URL: if the URL begins with "/document", it covers the clusters containing that document; otherwise it covers all documents within the docket. item_id is a document ID or a docket ID accordingly.

Parameters

item_id: a Regulations.gov docket ID, e.g., "EPA-HQ-OAR-2009-0234" (path parameter; required)
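The choice between the two teaser URLs can be sketched in Python as follows. teaser_path is a hypothetical helper, and the caller is assumed to know whether item_id is a document or a docket ID; the sketch does not try to infer it from the ID's format.

```python
def teaser_path(item_id: str, is_document: bool) -> str:
    """Choose the document or docket variant of the hierarchy teaser endpoint."""
    prefix = "document" if is_document else "docket"
    return f"/{prefix}/{item_id}/hierarchy_teaser"


print(teaser_path("EPA-HQ-OAR-2009-0234", is_document=False))
# /docket/EPA-HQ-OAR-2009-0234/hierarchy_teaser
```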

Document Clustering Hierarchy Teaser

/document/[item_id]/hierarchy_teaser

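Methods Supported: GET

This variant of the hierarchy teaser covers only the clusters containing the given document; see the /docket/[item_id]/hierarchy_teaser description above for details.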

Parameters

item_id: a Regulations.gov document ID, e.g., "EPA-HQ-OAR-2009-0234-20377" (path parameter; required)