Open Government Data (OGD) and content can be freely accessed, used, modified, and shared by anyone for any purpose (subject, at most, to requirements that preserve provenance and openness).
You can use OGD to help answer a research question, or you can explore data sets to find a pattern that may help you develop one. The top reasons to use OGD include:
In 2009, President Obama issued a memorandum on transparency and open government, and in 2013 Executive Order No. 13642 made open, machine-readable data the default for government information. Open government seeks to make the workings of the government transparent, accountable, and responsive to citizens. The government collects huge amounts of data, much of which is not confidential. For data that is confidential, data protection legislation works to keep citizens' personal lives private.
A few terms to get you started. If you need more info, check out the Open Data Handbook Glossary using the link to the left.
Openly licensed. Broadly speaking, an open license is one which grants permission to access, re-use and redistribute a work with few or no restrictions. Knowing how data is licensed is vital when combining data sets and/or publishing.
Interoperability. This is the ability to combine various data sets. The combination of data sets is often what produces value, so consider how one set might connect to other sets of data as you research. But beware: data sets often lack consistent standards, so they may not combine easily.
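To make the idea of combining data sets concrete, here is a minimal sketch in Python. The two small tables, the county names, and the shared "county" column are all made up for illustration; real data sets rarely line up this neatly.

```python
import csv
import io

# Hypothetical data: two small tables that share a "county" column.
population_csv = """county,population
Adams,1200
Baker,3400
"""
libraries_csv = """county,libraries
Adams,2
Clark,5
"""


def rows_by_key(text, key):
    """Read CSV text and index each row by the value in the key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


population = rows_by_key(population_csv, "county")
libraries = rows_by_key(libraries_csv, "county")

# Keep only counties that appear in both sets and merge their columns.
combined = {
    county: {**population[county], **libraries[county]}
    for county in population.keys() & libraries.keys()
}

# Counties found in only one set could not be combined.
unmatched = sorted(population.keys() ^ libraries.keys())

print(combined)
print(unmatched)
```

Only the county that appears in both tables ends up in the combined result. The unmatched list is a quick way to see how much of each data set failed to connect.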
Metadata. Data is information about something, and metadata describes that data. Metadata can include title, description, method of collection, author, publisher, dates collected, dates of coverage, previous owners of data, license, subject terms, and more. Metadata is essential to discovering, understanding, and using data. Learn more about metadata standards using the links to the left.
Machine-readable data. The data must be provided in a form that can be processed by a computer and where the individual elements of the work can be easily accessed and modified. This includes formats like CSV, JSON and XML. All of these formats use standardized metadata; to learn more, check out the links on the left.
CSV (comma-separated values) is a format for spreadsheet data that uses commas to separate values instead of the grid structure found in Excel. CSV is more popular than Excel for sharing open data because the format itself is open. A simple example looks like:
Veronica Mars,email@example.com,123-456-7890,123 Fake Street
Keith Mars,firstname.lastname@example.org,098-765-4321,321 Fake Avenue
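As a rough illustration of what "machine-readable" means in practice, the sketch below reads the two rows above with Python's built-in csv module. The example has no header row, so the column names used here (name, email, phone, address) are assumptions added for readability.

```python
import csv
import io

# The two rows from the example above. There is no header row,
# so the column names below are assumed for illustration.
raw = (
    "Veronica Mars,email@example.com,123-456-7890,123 Fake Street\n"
    "Keith Mars,firstname.lastname@example.org,098-765-4321,321 Fake Avenue\n"
)
fieldnames = ["name", "email", "phone", "address"]

# Each row becomes a dictionary keyed by the assumed column names.
records = list(csv.DictReader(io.StringIO(raw), fieldnames=fieldnames))

for record in records:
    print(record["name"], "->", record["email"])
```

Once the rows are loaded this way, each value can be looked up by column name rather than by counting commas.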
XML (Extensible Markup Language) can look similar to the code used for websites (HTML). XML provides a standard for representing structured (or grouped) data. Below is a simple XML example, where the data is given structure through tags (the <terms> in angle brackets):
<body>Don't forget our meeting Tuesday night!</body>
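For a sense of how a program reads that structure, here is a small Python sketch using the standard library's xml.etree.ElementTree module. It parses only the single <body> element shown above; a full XML file would normally wrap several such elements inside one outer element.

```python
import xml.etree.ElementTree as ET

# Only the <body> element appears in the example above.
element = ET.fromstring("<body>Don't forget our meeting Tuesday night!</body>")

# The tag names the piece of data; the text is the data itself.
print(element.tag)
print(element.text)
```

The tag tells the program what kind of information it is looking at, which is what lets software pull out one element without reading everything around it.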
Subjectivity in open data. You may already understand that what a person values and how they name things creates bias in research; data is subjective too. Data is often “cleaned” so that it is easier to work with, for example by removing accented letters (ã, ë) or grouping content into standardized labels. Cleaning the data helps us search and retrieve results quickly and reliably (making data interoperable), but it can “scrub out” nuances and meanings.
The benefit of interoperable data (data sets that can be combined, as defined above) is the ability to analyze that data with tools. Below are free tools created by reliable organizations or individuals (meaning the tools should still work a year from now, and ideally ten years from now).
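The sketch below shows one common way accented letters get removed during cleaning, and what can be lost along the way. It uses Python's built-in unicodedata module; the helper name strip_accents and the two surnames are hypothetical, chosen only to illustrate the point.

```python
import unicodedata


def strip_accents(text):
    """Decompose accented characters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical values: two different surnames.
names = ["Peña", "Pena"]
cleaned = [strip_accents(name) for name in names]

print(cleaned)
print(len(set(names)), "distinct before;", len(set(cleaned)), "distinct after")
```

After cleaning, the two surnames can no longer be told apart. That makes searching simpler, but the distinction between them is gone from the data.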
Learn about and get ideas:
YES. You must cite datasets just as you would cite information you found in an article, book, or web page. Below are some resources to help you cite your data.
Examples of data citations include:
From the Citing Data Guide from MIT