
Data Management Planning: Metadata

Metadata

Metadata is information about the context, content, quality, provenance, and/or accessibility of data. It is critical to ensuring the longevity and reproducibility of research data.

Metadata serves a variety of functions. Examples are listed below:

  • Resource Discovery is locating, accessing, and retrieving varied and distributed data.
  • Resource Description is the differentiating of information and describing the characteristics of a resource.
  • Rights Management is the authorized access, display, and use permissions for objects to protect intellectual property rights, confidentiality, privacy, and security of information.
  • Preservation of data is the storing of technical details on the format, structure, and use of the digital content; the history of all actions performed on the resource, including changes and decisions; authenticity information, such as technical features or custody history; and the responsibilities and rights information applicable to preservation actions.
  • The Structural aspects of data facilitate direct access to key points in complex objects, aiding navigation between and access to different parts of the same data set or data object.
  • Administration of the metadata captures the data's location, integrity, ownership, and authorship.
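
As a rough illustration of the functions listed above, the sketch below groups a single metadata record by function. The data set, field names, and values are all hypothetical and do not follow any formal standard; the point is only to show what kind of information each function might hold.

```python
import hashlib
import json

# Stand-in for the bytes of a real data file (hypothetical content).
data_bytes = b"site_id,timestamp,temp_c\nA1,2021-06-14T10:00,14.2\n"

# Hypothetical record: one section per metadata function from the list above.
record = {
    "discovery": {
        "title": "Stream temperature readings",
        "keywords": ["hydrology", "water temperature"],
    },
    "description": {
        "abstract": "Hourly water temperature from three sensors.",
        "variables": ["site_id", "timestamp", "temp_c"],
    },
    "rights": {"license": "CC-BY-4.0", "access": "public"},
    "preservation": {
        "file_format": "text/csv",
        "change_log": ["2021-07-01: removed duplicate rows"],
    },
    "structure": {"files": ["readings.csv", "sites.csv"], "join_key": "site_id"},
    "administration": {
        "location": "data/readings.csv",
        "owner": "Example Lab",
        "authors": ["A. Researcher"],
        # A checksum is one common way to record integrity.
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
    },
}

FUNCTIONS = [
    "discovery",
    "description",
    "rights",
    "preservation",
    "structure",
    "administration",
]


def missing_functions(rec):
    """Return the functions that have no information in the record."""
    return [name for name in FUNCTIONS if not rec.get(name)]


print(json.dumps(record, indent=2))
print("Missing:", missing_functions(record))
```

A check like `missing_functions` is one simple way to notice that a record says nothing about, for example, rights or preservation before the data are shared.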

Metadata can exist in a variety of formats. Some of the most common ones are summarized below:

Discipline | Definition
Biodiversity | The Darwin Core (DwC) is a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections.
Geospatial | Geospatial metadata commonly document geographic digital data such as Geographic Information System (GIS) files, geospatial databases, and earth imagery, but can also be used to document geospatial resources including data catalogs, mapping applications, data models, and related websites.
Social Science | The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle.
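
To make the idea of a discipline standard more concrete, here is a minimal sketch of a species occurrence record that uses Darwin Core term names as column headers and writes them out as CSV. The term names (occurrenceID, basisOfRecord, scientificName, eventDate, decimalLatitude, decimalLongitude, country) are Darwin Core terms; the values, the identifier scheme, and the `to_csv` helper are invented for illustration, and a real repository may require additional terms.

```python
import csv
import io

# Column names are Darwin Core terms; all values are made up.
occurrences = [
    {
        "occurrenceID": "urn:example:occ:0001",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Danaus plexippus",
        "eventDate": "2021-06-14",
        "decimalLatitude": "43.0731",
        "decimalLongitude": "-89.4012",
        "country": "United States",
    },
]


def to_csv(rows):
    """Serialize a list of records to CSV text, one column per term."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(occurrences)
print(csv_text)
```

Because the column names come from a shared standard rather than from local habit, another researcher or repository can interpret the file without asking what each column means.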

 

If you are uncertain which metadata standards are in use in your discipline, the Digital Curation Centre maintains a list of commonly used metadata standards organized by discipline. If you intend to deposit your data in a data repository, the repository may have guidelines on which metadata standard(s) should be used to describe deposited data.

 

Controlled Vocabularies

A controlled vocabulary is a collection of preferred terms used to describe and retrieve content consistently. Only predefined, authorized terms may be used, in contrast to free-form tags or keywords, which are uncontrolled and therefore often ambiguous and inconsistent. Taxonomies, thesauri, and ontologies are types of controlled vocabulary.

  • They facilitate searching and meta-analysis within a data set
  • They enhance the interoperability of data sets in repositories with data from multiple sources
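
The contrast between free-form keywords and controlled terms can be sketched in a few lines. In the example below, a tiny hand-made vocabulary maps variant spellings and synonyms to one preferred term each. The vocabulary contents, the synonym lists, and the `normalize` helper are assumptions made for illustration and are not copied from any published vocabulary.

```python
# Hypothetical mini-vocabulary: each preferred term maps to the variant
# spellings and synonyms that should resolve to it.
VOCABULARY = {
    "Neoplasms": {"cancer", "tumor", "tumour", "neoplasm"},
    "Myocardial Infarction": {"heart attack", "mi"},
}

# Build a case-insensitive lookup from every variant to its preferred term.
LOOKUP = {}
for preferred, variants in VOCABULARY.items():
    LOOKUP[preferred.lower()] = preferred
    for variant in variants:
        LOOKUP[variant.lower()] = preferred


def normalize(tags):
    """Split free-text tags into preferred terms and unrecognized tags."""
    matched, unmatched = [], []
    for tag in tags:
        key = tag.strip().lower()
        if key in LOOKUP:
            term = LOOKUP[key]
            if term not in matched:  # avoid listing a term twice
                matched.append(term)
        else:
            unmatched.append(tag)
    return matched, unmatched


terms, unknown = normalize(["Cancer", "tumour", "heart attack", "fatigue"])
print(terms)    # preferred terms found
print(unknown)  # tags that need review
```

Two records tagged "Cancer" and "tumour" would not match each other in a keyword search, but both resolve to the same preferred term here, which is what makes searching and combining data sets more reliable.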

How are controlled vocabularies used?

  • They are used in metadata records to express how content is organized so users know how to search for content
  • A more complex scenario would be using a published controlled vocabulary as a schema for your database. This could make it easier to deposit your data into a disciplinary repository that is based on the same vocabulary
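
The database scenario above might look something like the following sketch, which uses Python's built-in sqlite3 module. The vocabulary is stored in its own table, and a foreign key ensures that observations can only use terms from it. The table names, column names, habitat terms, and the `add_observation` helper are hypothetical; a real project would load the terms from the published vocabulary it has chosen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# The controlled vocabulary lives in its own table (hypothetical terms).
conn.execute("CREATE TABLE habitat_terms (term TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        species TEXT NOT NULL,
        habitat TEXT NOT NULL REFERENCES habitat_terms(term)
    )
    """
)
conn.executemany(
    "INSERT INTO habitat_terms (term) VALUES (?)",
    [("forest",), ("grassland",), ("wetland",)],
)
conn.commit()


def add_observation(species, habitat):
    """Insert a row; return False if the habitat is not in the vocabulary."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observations (species, habitat) VALUES (?, ?)",
                (species, habitat),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_observation("Rana pipiens", "wetland"))    # term is in the vocabulary
print(add_observation("Rana pipiens", "swamp-ish"))  # term is not in the vocabulary
```

Enforcing the vocabulary at the point of data entry means that no clean-up is needed later to make the data match a repository built on the same terms.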

Which vocabularies should I use?

In some fields, vocabularies are well established; in other disciplines, they are still emerging. You may want to check professional societies and journals for vocabularies that have been developed in your disciplinary area. The list below is a starting point.

Disciplinary area | Example
Life Science | BioPortal: biomedical vocabularies from the NIH National Centers for Biomedical Computing.
Geospatial | Geographic Names Information System (GNIS): developed by the USGS in cooperation with the U.S. Board on Geographic Names, it contains information about physical and cultural geographic features in the United States and associated areas, both current and historical (not including roads and highways). The database holds the federally recognized name of each feature and defines the location of the feature by state, county, USGS topographic map, and geographic coordinates.
Medical | Medical Subject Headings (MeSH): a controlled vocabulary for indexing journal articles and books in the life sciences, created and updated by the US National Library of Medicine.
Agriculture | The Agricultural Thesaurus: online vocabulary tools of agricultural terms in English and Spanish, cooperatively produced by the National Agricultural Library, USDA, and the Inter-American Institute for Cooperation on Agriculture.
Biodiversity | Biocomplexity Thesaurus: displays terminologies and term relationships in the fields of biology, ecology, environmental sciences, and sustainability.
Humanities | Music Ontology: provides main concepts and properties for describing music (artists, albums, tracks, arrangements).

 

Attribution

Parts of this guide were borrowed and/or adapted from resources from the University of Wisconsin-Madison and the University of Nebraska-Lincoln. Thanks for sharing!