Definitions


For an organization, research group or project team, data management means developing and implementing the architecture, processes and practices that cover the entire data life cycle. This document gives a general overview of basic scientific data management concepts to help readers understand notions that are technical and often complex, starting with the following definitions:

Data

Data are any information that can be stored in digital form, including text, numbers, images, video, audio, software, algorithms, equations, animations, models and simulations. Data may be generated by observation, computation or experiment.1

Dataset

A dataset is a collection of structured data (often in table format) in which the fields (columns) correspond to the different variables and the rows hold the values of those variables. Several file formats are used, including structured text formats (e.g. CSV, Comma-Separated Values), geospatial formats (e.g. GeoTIFF) and XML (eXtensible Markup Language), which is commonly used for metadata.
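As an illustration of the row/column structure described above, the following Python sketch reads a small CSV dataset with the standard `csv` module. The station codes, column names and values are made up for the example.

```python
import csv
import io

# Made-up CSV dataset: each column (field) is a variable,
# each row holds one set of values for those variables.
CSV_TEXT = """station,date,water_temp_c
S1,2014-06-01,8.4
S1,2014-06-02,8.9
S2,2014-06-01,6.1
"""


def read_dataset(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # CSV stores every value as text, so numeric variables are converted explicitly.
        row["water_temp_c"] = float(row["water_temp_c"])
        rows.append(row)
    return rows


rows = read_dataset(CSV_TEXT)
variables = list(rows[0].keys())
print(variables)  # ['station', 'date', 'water_temp_c']
print(len(rows))  # 3
```

Because CSV is plain text, the same file can be opened by a spreadsheet, a statistics package or a few lines of code like these.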

Data Service

A data service makes data (including text, images, video and audio) available over the Internet, e.g. an RSS feed.
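An RSS feed is an XML document, so a client can read it with any XML parser. The sketch below parses a small, hypothetical RSS 2.0 feed (the example.org URLs and item titles are invented placeholders) with Python's standard `xml.etree.ElementTree` module and lists the items it announces.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed published by a data service (example.org is a placeholder).
FEED = """<rss version="2.0">
  <channel>
    <title>Example data service</title>
    <link>https://example.org/data</link>
    <description>Recently published datasets</description>
    <item>
      <title>Water temperature, June 2014</title>
      <link>https://example.org/data/water-temp-2014-06.csv</link>
    </item>
    <item>
      <title>Salinity profiles, June 2014</title>
      <link>https://example.org/data/salinity-2014-06.csv</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every <item> element in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    # findtext() returns the text of the first direct child with that tag.
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


for title, link in list_items(FEED):
    print(title, "->", link)
```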

Georeferenced Data/Geospatial Data/Geodata

Geospatial (georeferenced) data include geographical locations such as X-Y coordinates (longitude-latitude) or, at least, a reference to a site from which positions can be calculated. Geodata often also include a vertical component Z (depth or altitude).
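A minimal sketch of a georeferenced point, assuming decimal-degree coordinates and a vertical component in metres. The `GeoPoint` class is hypothetical; its `to_geojson()` method follows the GeoJSON format (RFC 7946), which lists coordinates as X (longitude), then Y (latitude), then an optional elevation, so a depth is written as a negative number. The coordinates in the example are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoPoint:
    """Hypothetical georeferenced position: X = longitude, Y = latitude, optional Z."""
    lon: float                 # X, decimal degrees (negative = west)
    lat: float                 # Y, decimal degrees (negative = south)
    z: Optional[float] = None  # vertical component in metres (negative = below the surface)

    def __post_init__(self):
        # Reject coordinates outside the valid geographic ranges.
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")

    def to_geojson(self):
        """Return a GeoJSON Point; RFC 7946 orders coordinates [lon, lat, elevation]."""
        coords = [self.lon, self.lat]
        if self.z is not None:
            coords.append(self.z)
        return {"type": "Point", "coordinates": coords}


# Illustrative point in the St. Lawrence estuary, 20 m below the surface.
print(GeoPoint(-68.5, 48.5, -20.0).to_geojson())
```

Swapping latitude and longitude is a common source of error, which is why the range check and the explicit field names are worth having.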

GEOSS SOCIETAL BENEFITS:

disasters, health, energy, climate, agriculture, ecosystems, biodiversity, water and weather

GEOSS

The Global Earth Observation System of Systems (GEOSS) brings together an international group of organizations that combine their expertise across nine topics, or "societal benefits". GEOSS contributes to the monitoring and analysis of data in these areas and to making those data accessible.2

Harvesting

The process of collecting metadata records from different sources (catalogues or registries) into a single location to facilitate data discovery.
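The harvesting idea can be sketched in a few lines of Python. In practice metadata is harvested over the network through protocols such as OGC CSW or OAI-PMH; here, as an assumption to keep the example self-contained, two in-memory lists stand in for the remote catalogues, and the record identifiers, titles and keywords are invented.

```python
# Two hypothetical source catalogues, each exposing a list of metadata records.
SOURCE_A = [
    {"id": "ds-001", "title": "Sea surface temperature", "keywords": ["temperature", "ocean"]},
    {"id": "ds-002", "title": "Salinity profiles", "keywords": ["salinity"]},
]
SOURCE_B = [
    # The same record can be published by more than one catalogue.
    {"id": "ds-002", "title": "Salinity profiles", "keywords": ["salinity"]},
    {"id": "ds-103", "title": "Ice cover extent", "keywords": ["ice", "climate"]},
]


def harvest(*sources):
    """Collect records from every source into one index keyed by record identifier."""
    index = {}
    for source in sources:
        for record in source:
            # Keep the first copy of a record seen in several sources.
            index.setdefault(record["id"], record)
    return index


def search(index, keyword):
    """Return the sorted identifiers of harvested records tagged with a keyword."""
    return sorted(rid for rid, rec in index.items() if keyword in rec["keywords"])


index = harvest(SOURCE_A, SOURCE_B)
print(len(index))                  # 3
print(search(index, "salinity"))   # ['ds-002']
```

Keying the index on a stable identifier is what lets the same dataset, advertised by several catalogues, appear only once in search results.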


Information

In general, data are the raw observations acquired through research or monitoring activities, while information is obtained by processing and/or interpreting those data.3
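A small, hypothetical example of turning data into information: raw temperature readings (made-up values) are processed into daily means, a summary that can then be interpreted. Averaging is only one possible processing step, chosen here for simplicity.

```python
from statistics import mean

# Made-up raw observations: (date, water temperature in degrees C) pairs.
raw = [
    ("2014-06-01", 8.0), ("2014-06-01", 9.0),
    ("2014-06-02", 7.0), ("2014-06-02", 8.0), ("2014-06-02", 9.0),
]


def daily_means(observations):
    """Group raw readings by date and return the mean value for each day."""
    by_day = {}
    for day, value in observations:
        by_day.setdefault(day, []).append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


print(daily_means(raw))  # {'2014-06-01': 8.5, '2014-06-02': 8.0}
```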

Interoperability

Interoperability is the ability of a product or computer system to work with other existing products or systems without restriction, regardless of their physical architecture or operating system. Interoperability can be achieved through the use of open Internet standards. The mission of the World Wide Web Consortium (W3C)4 is to guide the evolution of the Web by developing protocols, standards and guidelines that support interoperability.

ISO

The International Organization for Standardization (ISO) is the world's largest developer of voluntary international standards, covering areas that range from currency codes to water meter requirements to date and time representation. ISO standards give world-class specifications for products, services and systems to ensure quality, safety and efficiency.5
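Date and time representation is covered by ISO 8601, which gives one unambiguous text form (YYYY-MM-DDThh:mm:ss with a UTC offset) instead of locale-dependent forms such as 06/07/2014. The Python sketch below converts an ISO 8601 timestamp to UTC with the standard `datetime` module; the function name `to_utc_iso` and the sample timestamp are hypothetical. Note that before Python 3.11, `datetime.fromisoformat()` accepts only a subset of ISO 8601 (the form shown here is within that subset).

```python
from datetime import datetime, timezone


def to_utc_iso(stamp):
    """Parse an ISO 8601 timestamp that carries a UTC offset and return it in UTC."""
    moment = datetime.fromisoformat(stamp)
    if moment.tzinfo is None:
        # Without an offset the instant is ambiguous, so refuse rather than guess.
        raise ValueError(f"timestamp has no UTC offset: {stamp}")
    return moment.astimezone(timezone.utc).isoformat()


# Local time four hours behind UTC (e.g. Eastern Daylight Time).
print(to_utc_iso("2014-06-21T14:30:00-04:00"))  # 2014-06-21T18:30:00+00:00
```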

Open Data

Governments are increasingly adopting an "open" approach to improve access to publicly funded data and information.6 The same concept also guides non-governmental initiatives that aim to foster transparency, accountability and data reuse. Gartner, an information technology research firm, defines open data as "information or content made freely available to use and redistribute, subject only to the requirement to attribute it to the source".7 Non-proprietary (open) file formats let producers save data in a way that users can access without buying any specific software (or a particular version of it), e.g. text documents in the OpenDocument Format (ODF, such as .odt files).8

A world without metadata…

…would look like a video rental store without any thematic sections, where hundreds of DVD boxes are displayed without titles or covers and even the discs are not labelled.

How could anyone find any particular movie?
How would it be possible to know what is available without having to open each box and play each DVD?

How would it be possible to find environmental data about the St. Lawrence if data producers did not document their datasets and did not publish the existence of their data in order to make them discoverable?

Metadata

Data that describe other data. For geographic information, the reference standard for metadata is ISO 19115.
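To make "data that describe other data" concrete, here is a hypothetical metadata record expressed as a Python dictionary, with a check for missing elements. The element names (title, abstract, dateStamp, contact and the four bounding-box limits) are borrowed from ISO 19115, but the record is a simplified illustration, not a schema-valid ISO 19115/19139 document, and which elements are mandatory depends on the version of the standard and the profile in use.

```python
# Hypothetical checklist of elements, named after ISO 19115 elements.
REQUIRED = (
    "title", "abstract", "dateStamp", "contact",
    "westBoundLongitude", "eastBoundLongitude",
    "southBoundLatitude", "northBoundLatitude",
)

# Made-up record describing a dataset (not the dataset itself).
record = {
    "title": "Water temperature, station S1, June 2014",
    "abstract": "Daily water temperature measured at a hypothetical station.",
    "dateStamp": "2014-07-01",
    "contact": "data-manager@example.org",
    "westBoundLongitude": -69.0, "eastBoundLongitude": -68.0,
    "southBoundLatitude": 48.0, "northBoundLatitude": 49.0,
}


def missing_elements(rec):
    """List checklist elements that are absent or empty in a metadata record."""
    return [name for name in REQUIRED if rec.get(name) in (None, "")]


print(missing_elements(record))  # []
```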

Standard

Document that defines the specifications, characteristics, guidelines or requirements to ensure the consistent use of products, processes and services.

Registry/Catalogue

Catalogue services enable the publication of, and search for, descriptive information (metadata) about data and data services. They can also harvest metadata from other catalogues.9 The Open Geospatial Consortium (OGC) distinguishes a "catalogue" from a "registry": a registry is a specialized catalogue maintained by an official body according to defined access and content management procedures and policies (see the ISO 19135 and ISO/IEC 11179-6 standards).
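A search against an OGC Catalogue Service for the Web (CSW) can be expressed as an HTTP GET request. The sketch below only builds a CSW 2.0.2 GetRecords URL with a CQL full-text constraint; it does not send it. The endpoint is a hypothetical placeholder, the function name is illustrative, and servers differ in which optional parameters (for example a `namespace` declaration) they require, so check the target catalogue's documentation.

```python
from urllib.parse import urlencode

# Hypothetical catalogue endpoint (placeholder, not a real service).
CSW_ENDPOINT = "https://catalogue.example.org/csw"


def getrecords_url(text, max_records=10):
    """Build a CSW 2.0.2 GetRecords request (key-value encoding) for a free-text search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Match the search text anywhere in the metadata record.
        "constraint": f"AnyText like '%{text}%'",
    }
    return CSW_ENDPOINT + "?" + urlencode(params)


print(getrecords_url("salinity"))
```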

  1. National Science Board. 2005. Long Lived Digital Data Collections: Enabling Research and Education in the 21st Century. 92 p. http://www.nsf.gov/geo/geo-data-policies/nsb-0540-1.pdf
  2. Fontaine, K.S. 2007. Architecture and Data Management Challenges in GEOSS and IEOS. 10 p. http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20070017998.pdf
  3. International Oceanographic Data and Information Exchange (IODE) - Marine Data Management. http://www.iode.org/index.php?option=com_content&view=article&id=3&Itemid=33
  4. World Wide Web Consortium (W3C). http://www.w3.org/
  5. International Organization for Standardization (ISO). http://www.iso.org/iso/home/about.htm
  6. Government of Canada – Open Data. http://open.canada.ca/en/open-data
  7. Gartner – Open Data. http://www.gartner.com/it-glossary/open-data
  8. ISO. 2006. OpenDocument OASIS standard for data interoperability of office applications. http://www.iso.org/iso/home/news_index/news_archive/news.htm?refid=Ref1004
  9. Open Geospatial Consortium (OGC). 2014. OGC I15 (ISO19115 Metadata) Extension Package of CS-W ebRIM Profile 1.0. 136 p. http://www.opengis.net/doc/ISx/csw-ebrim-i15/1.0