Open Government Data & Statistics Research Guide: Getting Started

Formerly: Statistics Legal Quick Guide

Getting Started with OGD

What is Open Government Data?

What is OGD?

Open Government Data (OGD) is government data and content that can be freely accessed, used, modified, and shared by anyone for any purpose (subject, at most, to requirements that preserve provenance and openness).

Why would you use OGD?

You can use OGD to help answer a research question, or you can explore data sets to find patterns that may help you develop one. The top reasons to use OGD include:

  1. Support new legal analysis
  2. Give context on a subject or topic
  3. Evaluate conclusions
  4. Develop information services or platforms
  5. Create a combined or composite dataset

How did OGD come about?

In 2009, President Obama directed federal agencies toward open government (Memorandum on Transparency and Open Government), and in 2013 Executive Order No. 13642 made open, machine-readable data the default for government information. Open government seeks to make the workings of the government transparent, accountable, and responsive to citizens. The government collects huge amounts of data, much of which is not confidential. For data that is confidential, data protection legislation works to keep citizens' personal lives private.

Definitions & Descriptions

Here are a few terms to get you started. If you need more information, check out the link to the left for the Open Data Handbook Glossary.

Openly licensed. Broadly speaking, an open license is one which grants permission to access, re-use and redistribute a work with few or no restrictions. Knowing how data is licensed is vital when combining data sets and/or publishing.

Interoperability. This is the ability to combine various data sets. Combining data sets is often what produces value, so consider how one set might connect to others as you research. But beware: standards are applied inconsistently, so data sets may not combine easily.
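
To make the idea concrete, here is a minimal Python sketch of combining two data sets on a shared column. The county names, the numbers, and the helper functions normalize and combine are all made up for illustration. Note how one record only matches after its label is tidied up, which is the kind of inconsistency described above.

```python
# Two hypothetical data sets that share a "county" column.
population = [
    {"county": "Adams", "population": 1000},
    {"county": "Baker", "population": 2500},
]
courts = [
    {"county": "adams ", "courthouses": 1},  # inconsistent case and spacing
    {"county": "Baker", "courthouses": 2},
]


def normalize(name):
    """Trim whitespace and lowercase so labels from both sets can match."""
    return name.strip().lower()


def combine(left, right, key):
    """Join two lists of records on a shared key; unmatched left rows are kept."""
    lookup = {normalize(row[key]): row for row in right}
    combined = []
    for row in left:
        match = lookup.get(normalize(row[key]), {})
        # Values from `left` win when both records have the same field.
        combined.append({**match, **row})
    return combined


combined = combine(population, courts, "county")
for row in combined:
    print(row)
```

Without the normalize step, "Adams" and "adams " would be treated as two different counties and the first record would come back without a courthouse count.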

Metadata. Data is information about something, and metadata describes that data. Metadata can include title, description, method of collection, author, publisher, dates collected, dates of coverage, previous owners of data, license, subject terms, and more. Metadata is essential to discovering and understanding data and making data usable. Learn more about metadata standards using the links to the left.
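
As a rough illustration, a metadata record can be pictured as a set of labeled fields. The Python sketch below uses an invented dataset, and its list of "required" fields is an assumption chosen for this example, not an official metadata standard. It simply reports which fields are missing or blank.

```python
# Fields this sketch treats as required; the list is an assumption, not a standard.
REQUIRED_FIELDS = ["title", "description", "publisher", "license", "dates_of_coverage"]

# A hypothetical metadata record for a made-up dataset.
metadata = {
    "title": "Example county court filings",
    "description": "Annual counts of civil filings by county.",
    "publisher": "Example State Judiciary",
    "license": "",  # left blank on purpose
}


def missing_fields(record, required):
    """Return the required field names that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


missing = missing_fields(metadata, REQUIRED_FIELDS)
print(missing)
```

A check like this is one way to notice, before combining or publishing data, that the license or coverage dates were never recorded.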

Machine-readable data. The data must be provided in a form that can be processed by a computer and where the individual elements of the work can be easily accessed and modified. This includes formats like CSV, JSON, and XML. To learn more about how these formats work with standardized metadata, check out the links on the left.

CSV (comma-separated values) is a format for spreadsheet-style data that separates values with commas instead of the grid structure found in Excel. CSV is more popular than Excel files for sharing open data because the format itself is open and can be read by almost any program. A simple example looks like:

Name,Email,Phone Number,Address
Veronica Mars,vmars@example.com,123-456-7890,123 Fake Street
Keith Mars,kmars@example.com,098-765-4321,321 Fake Avenue
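
If you want to see what "machine-readable" means in practice, here is a minimal sketch that reads the example above with Python's built-in csv module. The variable names are illustrative, and the text is read from a string rather than a file so the example is self-contained.

```python
import csv
import io

# The same example data, held in a string instead of a file.
csv_text = """Name,Email,Phone Number,Address
Veronica Mars,vmars@example.com,123-456-7890,123 Fake Street
Keith Mars,kmars@example.com,098-765-4321,321 Fake Avenue
"""

# DictReader uses the first line as column names for every following row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    print(row["Name"], row["Email"])
```

Each row becomes a set of named values, so a program can pull out a single column (such as Email) without a person reading the file.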

JSON (JavaScript Object Notation) is a popular format for data interchange and can describe complex information structures using key/value pairs. When working with or reading about JSON, you may come across related tools like Python or R.

[
  {"name": "Veronica", "email": "vmars@example.com"},
  {"name": "Keith", "email": "kmars@example.com"}
]
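
Here is a minimal sketch of reading records like these with Python's built-in json module. It assumes the two records are written as a valid JSON list, with commas between the key/value pairs and between the records. The variable names are illustrative.

```python
import json

# Two records written as a valid JSON list.
json_text = """
[
  {"name": "Veronica", "email": "vmars@example.com"},
  {"name": "Keith", "email": "kmars@example.com"}
]
"""

# json.loads turns the text into ordinary Python lists and dictionaries.
people = json.loads(json_text)
emails = [person["email"] for person in people]

print(emails)
```

JSON is strict about punctuation: a missing comma or closing brace makes json.loads raise an error instead of guessing what was meant.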

XML (Extensible Markup Language) can look similar to the code used for websites (HTML). XML provides a standard for representing structured (or grouped) data. Below is a simple XML example, where the data is given structure through tags such as <to> and <body>:

<note>
<to>Veronica</to>
<from>Keith</from>
<heading>Reminder</heading>
<body>Don't forget our meeting Tuesday night!</body>
</note>
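
A short sketch of reading a note like this with Python's built-in xml.etree.ElementTree module follows. The variable names are illustrative. It collects each tag and its text into a dictionary.

```python
import xml.etree.ElementTree as ET

# The example note, held in a string instead of a file.
xml_text = """<note>
<to>Veronica</to>
<from>Keith</from>
<heading>Reminder</heading>
<body>Don't forget our meeting Tuesday night!</body>
</note>"""

# fromstring parses the text and returns the outermost element (<note>).
note = ET.fromstring(xml_text)

# Each child element has a tag name and the text between its tags.
fields = {child.tag: child.text for child in note}

print(fields["to"], "-", fields["body"])
```

Because every value sits inside a named tag, a program can ask for the "to" or "body" of a note directly.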

Subjectivity in open data. You may already understand that what a person values and how they name things creates bias in research; data is subjective too. Data is often “cleaned” so that it is easier to work with, for example by removing accented letters (ã, ë) or grouping content into standardized labels. Cleaning the data helps us search and retrieve results quickly and reliably (making data interoperable), but it can “scrub out” nuances and meanings.
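
The trade-off can be seen in a small Python sketch. The names below are made up, and strip_accents is a helper written for this example using the built-in unicodedata module. It is one common way of removing accents, not the only one.

```python
import unicodedata


def strip_accents(text):
    """Remove accent marks by splitting letters from their combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Four distinct spellings as they might appear in raw data.
names = ["São Paulo", "Sao Paulo", "Zoë", "Zoe"]
cleaned = [strip_accents(name) for name in names]

print(len(set(names)), "distinct values before cleaning")
print(len(set(cleaned)), "distinct values after cleaning")
```

After cleaning, searches match more reliably, but the data can no longer tell you whether the original record said "Zoë" or "Zoe".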

Tools for Analyzing Data

The benefit of interoperable data (check out what this means on the previous tab) is the ability to analyze that data with tools. Below are free tools created by reliable organizations or individuals (meaning the tools will work a year from now, and ideally ten years from now).

Information Visualization

Learn about and get ideas:

Storing Data

Citing Data Sets

Yes, you must cite datasets just as you would cite information you found in an article, book, or web page. Below are some resources to help you cite your data.

Examples of data citations include:

  • Bachman, Jerald G., Lloyd D. Johnston, and Patrick M. O'Malley. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 1998 [Computer file]. Conducted by University of Michigan, Survey Research Center. ICPSR02751-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producer and distributor], 2006-05-15. http://dx.doi.org/10.3886/ICPSR02751.
  • ASTER Global Digital Elevation Model, version 1, ASTGTM_N11E122_num.tif, ASTGTM_N11E123_num.tif, Ministry of Economy, Trade, and Industry (METI) of Japan and NASA, downloaded from https://wist.echo.nasa.gov/api/, October 27, 2009

From the Citing Data guide from MIT


Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.