Metadata is information about the context, quality, provenance, and/or accessibility of a set of data. For your data to be accessible to you, your colleagues, and other researchers, it must be properly documented. Put simply, metadata is:
- Frequently required for depositing a data set in disciplinary repositories, or for publishing in a research journal;
- Necessary for the longevity and reproducibility of research data;
- Useful for analyzing the data in data files.
Common examples you see every day include:
- Nutritional information on a package of food
- Information you enter about yourself on a social networking site such as LinkedIn
- Information in the IMDb database about the movie Apollo 13 (director, writers, stars, etc.), or about any other movie or TV show
Examples of Metadata
Various disciplines use different metadata schemas. Putting your data into a standardized metadata format improves interoperability. Dublin Core is a general schema that is widely used and not discipline specific. Digital Commons is based on Dublin Core.
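To make the idea of a standardized schema concrete, here is a minimal sketch of a Dublin Core record built in Python. The element names (title, creator, date, type, format, language, rights) and the namespace URI come from the Dublin Core element set; the `metadata` wrapper element, the `dublin_core_record` helper, and all of the sample values are assumptions made up for illustration. This is not the format Digital Commons itself stores or exports.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML record from a dict of Dublin Core element names to values.

    A value may be a single string or a list of strings (repeated elements).
    The <metadata> wrapper is a hypothetical container, not part of Dublin Core.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = v
    return root


# Hypothetical dataset description; every value here is invented.
record = dublin_core_record({
    "title": "Stream temperature observations (hypothetical example)",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2020-01-15",
    "type": "Dataset",
    "format": "text/csv",
    "language": "en",
    "rights": "CC BY 4.0",
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every field uses a shared, well-known element name, another system that understands Dublin Core can read the record without knowing anything about the project that produced it.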
Other frequently used schemas include:
- Ecological Metadata Language (EML)
- CF (Climate and Forecast) Metadata Conventions
- DDI - Data Documentation Initiative (social sciences)
- ISO 19115 - Geographic Information
Many more exist! Check out the list at the Digital Curation Centre and the other resources below to learn more:
- DataONE Best Practice - Identify and use relevant metadata standards
- Getting Started With DDI 3.1
- Digital Curation Centre - Metadata Standards by Discipline
Beyond Structured Metadata: Other Elements of Documentation
README files can be very useful for understanding your data, and they can help you manage it in many ways. Kristin Briney's Data Ab Initio blog post on README.txt files provides a brief introduction to creating and using them effectively. For example files, check out this entry in DataQ, which provides links to several README files.
If you plan to deposit your data into USU's Digital Commons, a README file is required. A README file template is available to assist you, or you can use one of your own design or modify one listed below.
Other good resources include:
- Cornell University Research Data Management Group - Guide to writing "readme" style metadata
When a dataset has many variables that require explanation, a data dictionary should be provided. Spreadsheet data should have short variable names at the top of each column, but it is often difficult to decipher what these names mean. Create a data dictionary that lists the variable names, their meanings and units, coding values, the value used for nulls, any known issues with the data (missing values, known errors), and any other information that would help someone in the future make sense of your data set.
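As a rough sketch of how a data dictionary might be started, the Python below reads the column names from a small CSV and creates one blank entry per variable, with the null value recorded and missing values counted. The sample data, the `-9999` null value, the `draft_data_dictionary` helper, and the entry field names are all assumptions for illustration; the meanings, units, and coding values still have to be written by someone who knows the data.

```python
import csv
import io

# Hypothetical spreadsheet contents; -9999 is the assumed null value.
SAMPLE_CSV = """site_id,temp_c,do_mgl
A1,12.5,8.1
A2,-9999,7.9
A3,14.0,-9999
"""

NULL_VALUE = "-9999"


def draft_data_dictionary(csv_text, null_value):
    """Return one draft dictionary entry per column in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        # Count the rows where this variable holds the null value.
        missing = sum(1 for row in rows if row[name] == null_value)
        entries.append({
            "variable": name,
            "meaning": "",        # to be filled in by the data creator
            "units": "",          # e.g. degrees Celsius, mg/L
            "coding_values": "",  # e.g. 1 = yes, 2 = no
            "null_value": null_value,
            "missing_count": missing,
            "known_issues": "",
        })
    return entries


dictionary = draft_data_dictionary(SAMPLE_CSV, NULL_VALUE)
for entry in dictionary:
    print(entry["variable"], entry["missing_count"])
```

The draft entries could then be written out with `csv.DictWriter` and completed by hand, so that the dictionary travels with the data file.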
Resources for Data Dictionaries:
- Kristin Briney has a very informative post about data dictionaries with an example.
- Other resources include DataONE: Create a data dictionary
Codebooks are often used with survey data, and can serve the purpose of a data dictionary. In addition to identifying the variables in a data set and their meanings, a codebook describes the sampling method, includes the text of the questions, and reports the number of responses to each question.
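The codebook elements described above can be sketched in Python as follows. The question text, the coding values, the responses, and the `codebook_entry` helper are all hypothetical; the point is only to show how question text, codes, and response counts fit together in a single entry.

```python
from collections import Counter

# Hypothetical survey question with its coding values.
QUESTIONS = {
    "q1": {
        "text": "How often do you back up your research data?",
        "codes": {"1": "Never", "2": "Sometimes", "3": "Always"},
    },
}

# Hypothetical responses; an empty string means the question was not answered.
RESPONSES = [{"q1": "3"}, {"q1": "2"}, {"q1": "3"}, {"q1": ""}]


def codebook_entry(var, spec, responses):
    """Summarize one survey variable: question text, codes, and response counts."""
    answered = [r[var] for r in responses if r.get(var)]
    counts = Counter(answered)
    return {
        "variable": var,
        "question_text": spec["text"],
        "n_responses": len(answered),
        "counts": {label: counts.get(code, 0)
                   for code, label in spec["codes"].items()},
    }


entry = codebook_entry("q1", QUESTIONS["q1"], RESPONSES)
print(entry)
```

A full codebook would also describe the sampling method in prose, which code cannot generate from the data alone.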
Resources for Code Books:
Contact us if you have questions or need help.