3. Data Life Cycle
The volume of environmental data collected worldwide by sensors and satellites during experimentation, observation, simulation and sampling activities is so large that it is often described as a "data deluge" 1. It is therefore essential that data be thoroughly documented so that they can be discovered and used, including by search engines and by data harvesting and data mining tools.
3.3.1 Controlled Vocabulary
A key element to consider is the use of controlled vocabulary, that is, recognized standard terminology. Following internationally recognized conventions as closely as possible ensures that data can be identified, understood, discovered and accessed by users and by information systems. In particular, using the variable names and data dictionaries already common in the international community prevents confusion. To support the needs of specific communities, organizations often set up servers that provide users with the appropriate terminology.
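As an illustration of how local variable names might be aligned with a controlled vocabulary, here is a minimal sketch. The vocabulary is a small in-memory dictionary invented for this example; the terms, the synonyms and the function names `build_lookup` and `standardize` are assumptions, not content from any of the vocabulary servers listed in this section.

```python
# Hypothetical controlled vocabulary: preferred term -> known local synonyms.
# These entries are illustrative only, not copied from a real vocabulary server.
CONTROLLED_VOCABULARY = {
    "sea_water_temperature": {"temp", "water_temp"},
    "sea_water_salinity": {"sal", "salinity", "psal"},
    "depth": {"z", "depth_m"},
}


def build_lookup(vocabulary):
    """Map every lower-cased synonym (and the term itself) to its preferred term."""
    lookup = {}
    for term, synonyms in vocabulary.items():
        lookup[term.lower()] = term
        for synonym in synonyms:
            lookup[synonym.lower()] = term
    return lookup


def standardize(names, vocabulary=CONTROLLED_VOCABULARY):
    """Split local variable names into (mapped, unrecognized).

    mapped is a dict {local name: preferred term}; unrecognized is a list of
    names that have no entry in the vocabulary and need manual review.
    """
    lookup = build_lookup(vocabulary)
    mapped, unrecognized = {}, []
    for name in names:
        key = name.strip().lower()
        if key in lookup:
            mapped[name] = lookup[key]
        else:
            unrecognized.append(name)
    return mapped, unrecognized


if __name__ == "__main__":
    mapped, unrecognized = standardize(["Temp", "SAL", "chl_a"])
    print(mapped)
    print(unrecognized)
```

In practice the dictionary would be replaced by terms obtained from the vocabulary resource adopted by the community concerned. Names that end up in the unrecognized list are the ones most likely to confuse other users or harvesting tools.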
Below are a few examples of standard terminology resources:
- GCMD - NASA's Global Change Master Directory (GCMD) 2 is an environmental data reference. GCMD not only provides controlled vocabulary but also offers high quality resources for scientific data discovery, access and sharing.
- ICES - International Council for the Exploration of the Sea (ICES) provides reference codes for oceanographic data, trawling survey and commercial data as well as codes for sampling platforms 3.
- BODC - British Oceanographic Data Centre (BODC) of the Natural Environment Research Council (NERC) 4 is a designated data centre and offers standard terminology on a wide range of topics. It is part of the SeaDataNet network, a pan-European marine data management infrastructure 5.
- ITIS - Integrated Taxonomic Information System (ITIS) 6 is an integrated information system providing official taxonomic information about plants, animals, fungi and microbes (species names, Taxonomic Serial Number (TSN) codes and hierarchical classification).
- WoRMS - World Register of Marine Species (WoRMS) 7 is a registry of official lists of marine organisms and species name synonyms. The Canadian version is the Canadian Register of Marine Species (CaRMS) 8.
- CFMetadata - CF (Climate and Forecast) Metadata 9 is a NASA and Earth Science Data Systems Working Group convention that fosters interoperability between data producers, users and services through clear and unambiguous standards for representing geolocation, time and quantities.
- MMI - Marine Metadata Interoperability (MMI) 10 promotes the exchange, integration and use of marine data and fosters efficient publication, discovery, documentation and accessibility. It offers a semantic framework, vocabulary standards and metadata documentation tools.
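To make the idea of standardized variable metadata more concrete, the sketch below checks that each variable in a dataset carries a `standard_name` and a `units` attribute, two attribute names used by the CF convention. The dataset is a plain Python dictionary invented for this example. Treating both attributes as expected for every variable is an assumed project-level rule rather than something the convention itself imposes, and the function name `missing_attributes` is hypothetical.

```python
# Attributes this hypothetical project expects on every variable.
# (Assumption: a local policy, not a requirement stated by the convention.)
EXPECTED_ATTRIBUTES = ("standard_name", "units")


def missing_attributes(variables, expected=EXPECTED_ATTRIBUTES):
    """Return {variable name: [missing attribute, ...]} for incomplete variables."""
    problems = {}
    for name, attrs in variables.items():
        # An attribute counts as missing if it is absent or empty.
        missing = [key for key in expected if not attrs.get(key)]
        if missing:
            problems[name] = missing
    return problems


# Illustrative dataset description: variable name -> metadata attributes.
dataset = {
    "TEMP": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
    "PSAL": {"standard_name": "sea_water_practical_salinity"},  # units omitted
}

if __name__ == "__main__":
    print(missing_attributes(dataset))
```

A check of this kind can be run before data are submitted to a repository, so that gaps in the documentation are caught while the data producer is still available to fill them.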
1. Hey, A.J.G. and A.E. Trefethen. 2003. The Data Deluge: An e-Science Perspective. In: Berman, F., G.C. Fox and A.J.G. Hey (eds.). Grid Computing - Making the Global Infrastructure a Reality. Wiley and Sons, pp. 809-824.
2. Olsen, L.M., G. Major, K. Shein, J. Scialdone, S. Ritz, T. Stevens, M. Morahan, A. Aleman, R. Vogel, S. Leicester, H. Weir, M. Meaux, S. Grebas, C. Solomon, M. Holland, T. Northcutt, R.A. Restrepo and R. Bilodeau. 2013. NASA/Global Change Master Directory (GCMD) Earth Science Keywords. Version 126.96.36.199.0.
3. International Council for the Exploration of the Sea (ICES). Vocabulary Server.
4. Natural Environment Research Council (NERC). Vocabulary Server.
5. SeaDataNet - Pan-European Infrastructure for Ocean & Marine Data Management. Common Vocabularies.
6. Integrated Taxonomic Information System (ITIS).
7. World Register of Marine Species (WoRMS).
8. Canadian Register of Marine Species (CaRMS).
9. Climate and Forecast (CF) Metadata.
10. Marine Metadata Interoperability (MMI).