
Metadata

Metadata is information about the context, quality, provenance, and/or accessibility of a set of data. For your data to be accessible to you, your colleagues, and other researchers, it must be properly documented. Put simply, metadata is:

  • Frequently required for depositing a data set in disciplinary repositories, or for publishing in a research journal;
  • Necessary for the longevity and reproducibility of research data;
  • Useful for analyzing the data in data files.

Common examples you see every day include:

  • Nutritional information on a package of food
  • Information you enter about yourself on a social networking site such as LinkedIn
  • Information in the IMDb database about the movie Apollo 13 (director, writers, stars, etc.), or any other movie or TV show

Examples of Metadata

Various disciplines use different metadata schemas. Putting your data into a standardized metadata format improves interoperability. Dublin Core is a general schema that is widely used and not discipline specific. Digital Commons is based on Dublin Core.
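As a rough illustration of what a standardized record looks like, the sketch below builds a small Dublin Core description of a data set using Python's standard library. The element names (title, creator, date, format, rights) come from the Dublin Core element set; the wrapper element name `metadata`, the function name, and all field values are assumptions made up for this example, and this is not the exact format Digital Commons expects.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core(fields):
    """Return an XML string with one dc: element per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# All values below are hypothetical.
record = build_dublin_core({
    "title": "Stream temperature measurements (hypothetical example)",
    "creator": "Doe, Jane",
    "date": "2020-06-01",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
print(record)
```

Because every repository that understands Dublin Core reads the same element names, a record like this can be exchanged without the recipient needing to know how your files are organized.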

Other frequently used schemas include:

Ecological Metadata Language (EML)

EML is used by repositories such as the Long Term Ecological Research Network and the Knowledge Network for Biocomplexity (KNB). Morpho is a tool used to create and edit metadata in EML.

Darwin Core

CF (Climate and Forecast)

CF (Climate and Forecast) is used by repositories such as the National Center for Atmospheric Research (NCAR). The CF conventions are intended to promote the processing and sharing of files created with NetCDF, Unidata's Network Common Data Form.

DDI - Data Documentation Initiative (social sciences)

ISO 19115 - Geographic Information

ISO 19115 is a metadata standard used to describe geographic information and services. It was developed to harmonize the various existing geographic metadata standards, such as FGDC.


Many more exist! Check out the list at the Digital Curation Centre to learn more:


Beyond Structured Metadata: Other Elements of Documentation

README Files

A README file can be very useful for understanding your data, and it can help you manage your data in many ways. Kristin Briney's Data Ab Initio blog post on README.txt files provides a brief introduction to creating and using them effectively. For examples, see the DataQ entry that links to several README files.

If you plan to deposit your data into USU's Digital Commons, a README file is required. A README file template is available to assist you, or you can use one of your own design or modify one listed below.
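If you prefer to generate a README alongside your data, a minimal sketch might look like the following. The section headings and all sample values are assumptions chosen for illustration; they are not the USU template and not a required structure.

```python
# Section names and contents are illustrative assumptions, not a required template.
README_SECTIONS = [
    ("TITLE", "Stream temperature measurements (hypothetical example)"),
    ("CREATOR", "Jane Doe, jane.doe@example.org"),
    ("DATE COLLECTED", "2020-06-01 to 2020-08-31"),
    ("FILES", "temps.csv: one row per site visit"),
    ("VARIABLES", "See data_dictionary.csv"),
    ("METHODS", "Handheld probe, three readings averaged per visit"),
    ("LICENSE", "CC BY 4.0"),
]


def render_readme(sections):
    """Join (heading, body) pairs into a plain-text README."""
    blocks = [f"{heading}\n{body}" for heading, body in sections]
    return "\n\n".join(blocks) + "\n"


readme_text = render_readme(README_SECTIONS)
print(readme_text)
```

The resulting text can be saved next to the data files, for example with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`. Plain text is a deliberate choice here, since it stays readable without any special software.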

Other good resources include:

Data Dictionaries

When a dataset has many variables that require explanation, provide a data dictionary. Spreadsheet data should have short variable names at the top of each column, but it is often difficult to decipher what these mean. A data dictionary lists the variable names, their meanings and units, coding values, the null value, any known issues with the data (missing values, known errors), and any other information that would help someone make sense of your data set in the future.
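One way to get started is to draft the dictionary from the spreadsheet itself and then fill in the descriptions by hand. The sketch below reads a CSV export, creates one entry per column, and counts how often the null value appears. The function name, the entry fields, the sample data, and the choice of "NA" as the null value are all assumptions for illustration.

```python
import csv
import io


def draft_data_dictionary(csv_text, null_value="NA"):
    """Return one skeleton dictionary entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        entries.append({
            "variable": name,
            "meaning": "",        # to be filled in by the data creator
            "units": "",          # to be filled in by the data creator
            "coded_values": "",   # to be filled in by the data creator
            "null_value": null_value,
            "missing_count": sum(1 for v in values if v == null_value),
        })
    return entries


# Hypothetical sample data with short column names.
sample = "site_id,temp_c,do_mgl\nA1,12.4,8.1\nA2,NA,7.9\nA3,13.0,NA\n"
for entry in draft_data_dictionary(sample):
    print(entry)
```

The empty fields are the important part: a script can list the variables and count missing values, but only the person who collected the data can say what `do_mgl` means or which units it uses.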

Resources for Data Dictionaries:

Codebooks

Codebooks are often used with survey data, and can serve the purpose of a data dictionary. In addition to identifying the variables in a data set and their meanings, a codebook describes information such as the sampling method, and includes the text of the questions and the number of responses to each question.
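To make this concrete, the sketch below assembles a single codebook entry for one survey question: the variable name, the question text, the meaning of each code, and the number of responses per code. The function name, the question, the codes, and the responses are hypothetical.

```python
from collections import Counter


def codebook_entry(variable, question_text, codes, responses):
    """Describe one survey variable, including response counts per code."""
    counts = Counter(responses)
    return {
        "variable": variable,
        "question": question_text,
        "codes": codes,
        # Only responses that match a documented code are counted.
        "n_responses": sum(counts[code] for code in codes),
        "frequencies": {label: counts[code] for code, label in codes.items()},
    }


# Hypothetical question, coding scheme, and responses.
entry = codebook_entry(
    variable="q1_recycle",
    question_text="Do you recycle at home?",
    codes={1: "Yes", 2: "No", 9: "No answer"},
    responses=[1, 2, 1, 9, 1],
)
print(entry)
```

A full codebook would repeat this for every question and add study-level information, such as the sampling method, that does not belong to any single variable.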

Resources for Codebooks:

Need Help?

Contact us if you have questions or need help.

Email Research Data