For organizations, research groups and project teams, data management means developing and implementing the architecture, processes and practices that cover the entire data life cycle.
This document provides a general overview of basic scientific data management concepts to help readers understand technical and often complex notions, starting with the following definitions:
A dataset is a collection of structured data (often in table format) in which the fields (columns) correspond to the different variables and the rows contain the values of those variables. Several file formats are used, including structured formats (e.g., CSV, Comma-Separated Values), geospatial formats (e.g., GeoTIFF) and XML (eXtensible Markup Language), which is used for metadata. 1
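As a rough illustration of the table structure described above, the Python sketch below parses a small CSV dataset with the standard `csv` module. The station names, dates and the `temperature_c` column are made up for the example; a real dataset would normally be read from a file rather than an in-memory string.

```python
import csv
import io

# Made-up sample: each column is a variable, each row holds one set of values.
SAMPLE_CSV = """station,date,temperature_c
S1,2020-06-01,12.4
S1,2020-06-02,13.1
S2,2020-06-01,9.8
"""


def load_dataset(text):
    """Parse CSV text into (rows, variable names); each row is a dict keyed by column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # CSV stores every value as text, so numeric variables must be converted.
        row["temperature_c"] = float(row["temperature_c"])
        rows.append(row)
    return rows, reader.fieldnames


rows, variables = load_dataset(SAMPLE_CSV)
print(variables)  # ['station', 'date', 'temperature_c']
print(len(rows))  # 3
```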
A data service makes data (including text, images, video and audio) available via the Internet, e.g., an RSS feed.
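To make the idea concrete, the sketch below reads the items of an RSS 2.0 feed using Python's standard `xml.etree.ElementTree`. The feed content and the `example.org` URLs are placeholders; in practice the XML would be retrieved over HTTP (for instance with `urllib.request.urlopen`) instead of being embedded as a string.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for what a data service might return.
RSS_TEXT = """<rss version="2.0">
  <channel>
    <title>Example observations feed</title>
    <item>
      <title>Buoy B1: new readings</title>
      <link>https://example.org/b1</link>
    </item>
    <item>
      <title>Buoy B2: new readings</title>
      <link>https://example.org/b2</link>
    </item>
  </channel>
</rss>"""


def list_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 channel."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.findall("./channel/item")
    ]


for title, link in list_items(RSS_TEXT):
    print(title, link)
```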
Geospatial (georeferenced) data include geographic locations such as X-Y (latitude-longitude) coordinates or, at least, a reference to a site from which positions can be calculated. Geodata often include a vertical component Z (depth or altitude).
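A georeferenced observation can be modelled as a small record holding the horizontal coordinates plus an optional Z value. The Python sketch below is one possible representation, not a prescribed format: the `GeoPoint` name, the use of a geographic reference system such as WGS 84 and the sign convention for Z are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoPoint:
    """A georeferenced position in decimal degrees, with an optional vertical component.

    Assumptions for this sketch: coordinates refer to a geographic reference system
    such as WGS 84, and z is in metres, negative values meaning depth below sea level.
    """

    lat: float
    lon: float
    z: Optional[float] = None

    def __post_init__(self):
        # Reject values that cannot be valid geographic coordinates.
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")


buoy = GeoPoint(lat=48.5, lon=-68.5, z=-10.0)  # 10 m below the surface
print(buoy)
```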
The Global Earth Observation System of Systems (GEOSS) is an international initiative through which organizations combine their expertise across nine topics, or "societal benefit areas". GEOSS contributes to the monitoring, analysis and accessibility of data in these areas of interest. 2
Harvesting is the process of collecting metadata descriptions from different sources or registries to facilitate data discovery.
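The sketch below illustrates the principle of harvesting in Python, with in-memory lists standing in for two registries. Real catalogues are usually harvested over a standard protocol (for example OGC CSW or OAI-PMH), which this sketch does not implement; the record fields, the `harvest` function and the rule that the first source wins for duplicate identifiers are all hypothetical choices.

```python
from typing import Dict, Iterable, List

# Hypothetical records as two source registries might expose them.
SOURCE_A = [
    {"id": "ds-001", "title": "Surface temperature 2020", "source": "A"},
    {"id": "ds-002", "title": "Salinity profiles", "source": "A"},
]
SOURCE_B = [
    {"id": "ds-002", "title": "Salinity profiles", "source": "B"},
    {"id": "ds-003", "title": "Sea ice extent", "source": "B"},
]


def harvest(sources: Iterable[List[Dict[str, str]]]) -> Dict[str, Dict[str, str]]:
    """Collect metadata records from several sources into one index keyed by identifier.

    The first source to provide an identifier wins; later duplicates are skipped.
    """
    index: Dict[str, Dict[str, str]] = {}
    for records in sources:
        for record in records:
            index.setdefault(record["id"], record)
    return index


index = harvest([SOURCE_A, SOURCE_B])
print(sorted(index))  # ['ds-001', 'ds-002', 'ds-003']
```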
In general, "data" refers to raw observations acquired from research or monitoring activities, while "information" is obtained by processing and/or interpreting data. 3
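A minimal Python illustration of this distinction, using made-up temperature readings: the list of raw values plays the role of data, and the summary computed from it is a simple form of information.

```python
from statistics import mean

# Data: made-up raw water temperature readings (degrees Celsius).
readings = [12.4, 13.1, 12.8, 13.5]

# Information: the result of processing the data into something interpretable.
summary = {
    "count": len(readings),
    "mean_c": round(mean(readings), 2),
    "max_c": max(readings),
}
print(summary)
```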
Interoperability is the ability of a product or computer system to work with other existing products or systems without restriction, independently of their physical architecture and operating systems. Interoperability can be achieved through the use of open Internet standards. The mission of the World Wide Web Consortium (W3C) 4 is to guide and contribute to the evolution of the Web by developing protocols, standards and guidelines that support interoperability.
The International Organization for Standardization (ISO) is the world's largest developer of voluntary international standards, covering areas that range from currency codes to water meter requirements to date and time representation. ISO standards provide world-class specifications for products, services and systems to ensure quality, safety and efficiency. 5
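Date and time representation is covered by ISO 8601. As a small example, Python's `datetime.isoformat()` produces ISO 8601-style timestamps that `datetime.fromisoformat()` can parse back; the observation time below is arbitrary.

```python
from datetime import datetime, timezone

# An arbitrary observation time, stored with an explicit UTC offset.
observed = datetime(2020, 6, 1, 14, 30, tzinfo=timezone.utc)

stamp = observed.isoformat()
print(stamp)  # 2020-06-01T14:30:00+00:00

# Parsing the ISO 8601 string recovers the same instant.
assert datetime.fromisoformat(stamp) == observed
```

Writing timestamps with an explicit offset avoids ambiguity when data collected in different time zones are combined.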
Governments are increasingly adopting an "open" approach in order to improve the accessibility of publicly funded data and information. 6 The same concept also guides non-governmental initiatives that aim to foster transparency, accountability and the reuse of data.
Gartner, a world leader in information technology research, defines open data as "information or content made freely available to use and redistribute, subject only to the requirement to attribute it to the source" 7. Non-proprietary open formats allow producers to save data in a way that lets users access it without having to buy any specific software (or a particular version of software). For example, text documents can be saved in the OpenDocument Format (ODF), which uses the .odt extension for text files. 8
Metadata are data that describe other data. For geographic information, the reference metadata standard is the international standard ISO 19115.
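To show what "data about data" can look like, the Python sketch below builds a small XML metadata record with a title, an abstract and a geographic bounding box, which are among the kinds of information ISO 19115 covers. The element names are simplified placeholders chosen for readability: the output is not a schema-valid ISO 19115 record.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, abstract, west, east, south, north):
    """Build a simplified metadata record; element names are illustrative only."""
    record = ET.Element("metadata")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "abstract").text = abstract
    # Bounding box in decimal degrees describing where the data apply.
    bbox = ET.SubElement(record, "boundingBox")
    for name, value in (("west", west), ("east", east), ("south", south), ("north", north)):
        ET.SubElement(bbox, name).text = str(value)
    return ET.tostring(record, encoding="unicode")


xml_text = build_metadata(
    "Surface temperature 2020", "Made-up daily buoy readings.", -70.0, -60.0, 45.0, 50.0
)
print(xml_text)
```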
A standard is a document that defines the specifications, characteristics, guidelines or requirements needed to ensure the consistent use of products, processes and services.
Catalogue services enable the publication and search of descriptive information (metadata) about data and data services. They can also harvest metadata from other catalogues 9. The Open Geospatial Consortium (OGC) distinguishes between a "catalogue" and a "registry": a registry is a specialized catalogue maintained by an official entity in accordance with policies and procedures governing access and content management (ISO 19135 and ISO/IEC 11179-6 standards).
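The two basic operations of a catalogue service, publishing and searching metadata, can be sketched in Python with an in-memory store. This is only a toy model: standardized catalogue interfaces such as OGC CSW define their own requests and query language, and the `Catalogue` class, its methods and the record fields used here are hypothetical.

```python
from typing import Dict, List


class Catalogue:
    """Toy catalogue: publish metadata records, then search them by keyword."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}

    def publish(self, record: Dict[str, str]) -> None:
        # Records are keyed by identifier; republishing replaces the old version.
        self._records[record["id"]] = record

    def search(self, keyword: str) -> List[str]:
        """Return identifiers of records whose title or abstract contains the keyword."""
        kw = keyword.lower()
        return sorted(
            rid
            for rid, rec in self._records.items()
            if kw in rec.get("title", "").lower() or kw in rec.get("abstract", "").lower()
        )


catalogue = Catalogue()
catalogue.publish({"id": "ds-001", "title": "Surface temperature 2020", "abstract": "Buoy data."})
catalogue.publish({"id": "ds-002", "title": "Salinity profiles", "abstract": "CTD casts."})
print(catalogue.search("temperature"))  # ['ds-001']
```

A registry, in the OGC sense above, would add rules on who may publish and how records are maintained; the sketch deliberately leaves those out.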