DOKIPY data management / Data submission, June 4, 2010

Data submission

  1. File format
    1. File generation
    2. File structure
  2. Mandatory metadata

Introduction

The DOKIPY service operated at METNO offers automated services that make it easier for scientists to submit data. Files formatted according to the specification below are received and checked, any errors are reported, and the files are then parsed and handled automatically by the system. To fulfil the IPY Data Policy, files in this format may be made available online according to the distribution statement specified within the file. The address of the file upload service is given at the bottom of this document.

Preferred file format

Wherever possible, netCDF and the CF (Climate and Forecast) metadata conventions should be used. The main reason for this is that metadata are extracted automatically from the uploaded data files. For further information on netCDF and CF, see:

For convenience, data can also be uploaded in CDL, an ASCII (plain text) representation of a netCDF file that can be converted to binary netCDF. Some CDL templates are available in the DAMOCLES User Forum (which is open to everyone), and more will be added in the future.

How to generate files

Binary netCDF files may be created using software available from UNIDATA, or with the netCDF support built into applications such as MATLAB or R. Another option is to write CDL, the ASCII version of a netCDF file, which can easily be converted to netCDF (for example with the ncgen utility distributed by UNIDATA). CDL files can be uploaded directly and will be converted by the data management system. More information on how to use or format a CDL file can be found in the DAMOCLES User Forum. It is important to follow the Climate and Forecast conventions (CF 1.x) and to add the extra metadata elements, listed below, that ensure proper management of and access to the data.
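To illustrate the CDL route, here is a minimal Python sketch that writes the CDL text for a single-station air temperature series, with DOKIPY metadata added as global attributes. The function names, the variable `air_temperature`, the position and the attribute values are hypothetical, and only three of the mandatory global attributes are shown. The printed text could be saved as, say, `example_station.cdl` and either uploaded directly or converted locally with `ncgen -o example_station.nc example_station.cdl`.

```python
"""Build a minimal CDL text for one in situ station (illustrative sketch)."""


def cdl_value(value):
    """Render a Python value as a CDL literal: strings quoted and escaped, numbers as-is."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return repr(value)


def make_cdl(name, times, lat, lon, values, global_attrs):
    """Return CDL text for an air temperature time series at a single station.

    times  -- seconds since 1970-01-01 00:00:00 UTC
    values -- one measurement per time step, in kelvin
    """
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"\ttime = {len(times)} ;",
        "\tstation = 1 ;",
        "variables:",
        "\tdouble time(time) ;",
        '\t\ttime:units = "seconds since 1970-01-01 00:00:00" ;',
        '\t\ttime:standard_name = "time" ;',
        "\tfloat latitude(station) ;",
        '\t\tlatitude:units = "degrees_north" ;',
        '\t\tlatitude:standard_name = "latitude" ;',
        "\tfloat longitude(station) ;",
        '\t\tlongitude:units = "degrees_east" ;',
        '\t\tlongitude:standard_name = "longitude" ;',
        "\tfloat air_temperature(station, time) ;",
        '\t\tair_temperature:units = "K" ;',
        '\t\tair_temperature:standard_name = "air_temperature" ;',
        "",
        "// global attributes:",
    ]
    # DOKIPY metadata elements are written as global attributes (":name = value ;").
    for key, value in global_attrs.items():
        lines.append(f"\t\t:{key} = {cdl_value(value)} ;")
    lines += [
        "data:",
        f"\ttime = {', '.join(str(t) for t in times)} ;",
        f"\tlatitude = {lat} ;",
        f"\tlongitude = {lon} ;",
        f"\tair_temperature = {', '.join(str(v) for v in values)} ;",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_cdl(
        "example_station",
        times=[1181651400, 1181655000],
        lat=78.2,
        lon=15.6,
        values=[271.4, 271.9],
        global_attrs={
            "title": "Hypothetical air temperature example",
            "Conventions": "CF-1.3",
            "history": "2007-06-12 creation",
        },
    ))
```

Generating the text directly keeps the sketch free of third-party dependencies; the same structure could equally be written with a netCDF library.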

File structure

Files uploaded to the database should have the following general characteristics (for in situ measurements):

  • should contain a time dimension
  • could contain a station or case dimension
  • should contain a time variable (with time dimension) using a reference time (e.g. 1 January 1970)
  • should contain a latitude variable (with station or case dimension)
  • should contain a longitude variable (with station or case dimension)
  • should contain one or more variables holding the measurements (with station/case and time dimensions), using units defined by UDUNITS wherever possible. If UDUNITS cannot be used for a variable, do not give it a units attribute; use the variable's long name instead
  • should contain the metadata elements specified below as global attributes
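The requirement for a time variable based on a reference time can be illustrated with a short Python sketch that converts UTC timestamps to offsets from that reference. It assumes 1 January 1970 00:00:00 UTC as the reference and seconds as the unit, matching the example in the list but not a DOKIPY requirement; the names `encode_times`, `decode_times` and `TIME_UNITS` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference time and unit; the units string follows the UDUNITS "<unit> since <time>" form.
REFERENCE = datetime(1970, 1, 1, tzinfo=timezone.utc)
TIME_UNITS = "seconds since 1970-01-01 00:00:00"


def encode_times(timestamps):
    """Convert timezone-aware datetimes to seconds since REFERENCE."""
    values = []
    for ts in timestamps:
        if ts.tzinfo is None:
            # Naive datetimes are ambiguous; require an explicit time zone (UTC).
            raise ValueError(f"timestamp {ts} has no time zone")
        values.append((ts - REFERENCE).total_seconds())
    return values


def decode_times(values):
    """Inverse of encode_times: seconds since REFERENCE back to UTC datetimes."""
    return [REFERENCE + timedelta(seconds=v) for v in values]


if __name__ == "__main__":
    obs = [datetime(2007, 6, 12, 12, 30, tzinfo=timezone.utc),
           datetime(2007, 6, 12, 13, 30, tzinfo=timezone.utc)]
    print(TIME_UNITS, encode_times(obs))
```

The resulting numbers go into the time variable, and `TIME_UNITS` into its units attribute.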

Metadata

Required elements

In addition to the requirements implied by CF, these metadata elements are intended to satisfy the ISO 19115 standard for geographic metadata and the WMO profile based on this standard.

In netCDF files, the metadata elements are implemented as global attributes.

The metadata elements to be used differ according to the type of data (gridded or in situ). However, some elements are mandatory in all cases. See the FGDC Metadata Quick Guide for best practices.

Mandatory metadata elements for DOKIPY datasets (global attribute names are processed case sensitively)
Name | Purpose
title | A short description of the dataset.
abstract | A short summary of the data collection activity and the dataset. This element may alternatively be provided as the global attribute "comment" in a netCDF file.
topiccategory | A blank-separated list of ISO 19115 topic categories describing the dataset. See below for the applicable categories.
keywords | A comma-separated list of keywords describing the dataset. See below for the applicable keywords.
gcmd_keywords | A newline-separated list of GCMD scientific keywords describing the variables. These are used to categorise the dataset under the "Topics and variables" menu of the metadata search facility. If proper CF standard names have been used for the variables, the data will be mapped under "Topics and variables" even without the gcmd_keywords attribute. For more information see below.
activity_type | A comma-separated list of activity types. See below for the applicable descriptions.
operational_status | Status of the dataset: one of Experimental, Pre operational, Operational or Scientific. See the explanation below.
Conventions | The metadata convention used; should be "CF-1.x", where x = 1, 2 or 3.
product_name | A product name for the dataset.
history | Modification history of the dataset, with one entry per line (entries separated by newlines), in the form:
    2007-05-12 creation
    2007-06-10 revision
area | Name of the geographical area studied. If several area names are used, separate them with commas. See the list below.
southernmost_latitude | Southern limit of the geographical bounding box of the data, as a floating point value (decimal degrees).
northernmost_latitude | Northern limit of the geographical bounding box of the data, as a floating point value (decimal degrees).
westernmost_longitude | Western limit of the geographical bounding box of the data, as a floating point value (decimal degrees).
easternmost_longitude | Eastern limit of the geographical bounding box of the data, as a floating point value (decimal degrees).
start_date | Start date and time of the dataset, in the form "2007-06-12 12:30:00 UTC".
stop_date | Stop date and time of the dataset, in the form "2007-06-12 12:30:00 UTC".
institution | Name of the institution responsible for the dataset. Please use one of the standardised names below (short or long name).
PI_name | Name of the person responsible for the dataset.
contact | Email address of the responsible user support or principal investigator. If the principal investigator's email address is used, the attribute "PI_name" should be set accordingly.
distribution_statement | A distribution statement; see below for the applicable list.
project_name | Name of the project within which the data were collected. Several projects can be indicated using comma-separated project names.
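Before uploading, the global attributes can be checked against the table above. The following Python sketch is a hypothetical pre-upload check, not the checker DOKIPY itself runs: it tests for each mandatory element (accepting "comment" in place of "abstract"), the Conventions value, the date format, the bounding box and the history layout. The longitude range of -180 to 180 and the requirement that start_date not be after stop_date are assumptions not stated in the table.

```python
import re
from datetime import datetime

# Mandatory global attributes from the table (names are case sensitive).
MANDATORY = [
    "title", "abstract", "topiccategory", "keywords", "gcmd_keywords",
    "activity_type", "operational_status", "Conventions", "product_name",
    "history", "area", "southernmost_latitude", "northernmost_latitude",
    "westernmost_longitude", "easternmost_longitude", "start_date",
    "stop_date", "institution", "PI_name", "contact",
    "distribution_statement", "project_name",
]
DATE_FORMAT = "%Y-%m-%d %H:%M:%S UTC"  # e.g. "2007-06-12 12:30:00 UTC"
BBOX_LIMITS = {  # the longitude range -180..180 is an assumption
    "southernmost_latitude": (-90.0, 90.0),
    "northernmost_latitude": (-90.0, 90.0),
    "westernmost_longitude": (-180.0, 180.0),
    "easternmost_longitude": (-180.0, 180.0),
}


def check_global_attributes(attrs):
    """Return a list of human-readable problems found in a dict of global attributes."""
    problems = []
    for name in MANDATORY:
        if name == "abstract" and "comment" in attrs:
            continue  # "comment" may be given instead of "abstract"
        if name not in attrs:
            problems.append(f"missing attribute: {name}")

    conventions = attrs.get("Conventions")
    if conventions is not None and not re.fullmatch(r"CF-1\.[123]", str(conventions)):
        problems.append("Conventions should be CF-1.1, CF-1.2 or CF-1.3")

    dates = {}
    for name in ("start_date", "stop_date"):
        if name in attrs:
            try:
                dates[name] = datetime.strptime(str(attrs[name]), DATE_FORMAT)
            except ValueError:
                problems.append(f"{name} is not of the form 2007-06-12 12:30:00 UTC")
    if len(dates) == 2 and dates["start_date"] > dates["stop_date"]:
        problems.append("start_date is after stop_date")

    bbox = {}
    for name, (low, high) in BBOX_LIMITS.items():
        if name not in attrs:
            continue
        try:
            bbox[name] = float(attrs[name])
        except (TypeError, ValueError):
            problems.append(f"{name} is not a decimal number")
            continue
        if not low <= bbox[name] <= high:
            problems.append(f"{name} is outside [{low}, {high}]")
    south = bbox.get("southernmost_latitude")
    north = bbox.get("northernmost_latitude")
    if south is not None and north is not None and south > north:
        problems.append("southernmost_latitude is north of northernmost_latitude")

    # Each history line should start with a date, e.g. "2007-05-12 creation".
    for line in str(attrs.get("history", "")).splitlines():
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2} \S.*", line):
            problems.append(f"history line not of the form 'YYYY-MM-DD text': {line!r}")
    return problems
```

An empty list means no problems were detected. With the netCDF4 Python package, for example, `{name: ds.getncattr(name) for name in ds.ncattrs()}` on an open `Dataset` yields a suitable dict.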

ISO 19115 topic categories

See the GCMD Users Guide for details.

Topic categories to be used within DOKIPY
Category | Description
biota | flora and fauna in natural environments
climatologyMeteorologyAtmosphere | processes and phenomena of the atmosphere
environment | environmental resources, protection, and conservation
geoscientificinformation | information pertaining to earth sciences
imageryBaseMapsEarthCover | base maps
inlandWaters | inland water features, drainage systems and characteristics
oceans | features and characteristics of salt water bodies
society | characteristics of society and culture
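Since topiccategory is blank separated and matched case sensitively, a quick local check can catch misspelled categories. This Python sketch copies the category names from the table (including the spelling "geoscientificinformation"; ISO 19115 itself writes geoscientificInformation, so it may be worth confirming which form the system accepts). The function name `split_topiccategory` is hypothetical.

```python
# Topic categories copied from the table above; values are compared case sensitively.
TOPIC_CATEGORIES = {
    "biota", "climatologyMeteorologyAtmosphere", "environment",
    "geoscientificinformation", "imageryBaseMapsEarthCover",
    "inlandWaters", "oceans", "society",
}


def split_topiccategory(value):
    """Split a blank-separated topiccategory value into (valid, unknown) lists."""
    valid, unknown = [], []
    for category in value.split():  # any run of blanks separates categories
        if category in TOPIC_CATEGORIES:
            valid.append(category)
        else:
            unknown.append(category)
    return valid, unknown


if __name__ == "__main__":
    print(split_topiccategory("oceans  climatologyMeteorologyAtmosphere"))
```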

Keywords

See the WMO list of keywords. In addition, you can provide keywords of your own choice if the WMO keywords are insufficient. Keywords should be comma separated; words separated only by blanks are treated as a single keyword.
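Because only commas separate keywords, a blank inside an entry such as "sea ice" does not split it. A minimal Python sketch of that rule, using a hypothetical function name `parse_keywords` and assuming that surrounding blanks and empty entries are ignored:

```python
def parse_keywords(value):
    """Split a comma-separated keywords attribute into individual keywords.

    Only commas separate keywords; blanks inside an item are kept, so
    "sea ice" stays one keyword. Empty items are dropped.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print(parse_keywords("sea ice, air temperature, snow"))
```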

GCMD-Keywords

See the GCMD Scientific Keywords list for the applicable keywords. Keywords should be separated by newlines.
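GCMD keywords are hierarchical, with levels separated by ">", and the gcmd_keywords attribute holds one keyword per line. The Python sketch below splits such a value into its levels; the function name is hypothetical and the example strings are illustrative only, so check real values against the GCMD list.

```python
def parse_gcmd_keywords(value):
    """Split newline-separated GCMD keywords into lists of hierarchy levels.

    Each line is assumed to look like "EARTH SCIENCE > OCEANS > SEA ICE".
    """
    keywords = []
    for line in value.splitlines():
        line = line.strip()
        if line:  # ignore blank lines
            keywords.append([level.strip() for level in line.split(">")])
    return keywords


if __name__ == "__main__":
    print(parse_gcmd_keywords("EARTH SCIENCE > OCEANS > SEA ICE\nEARTH SCIENCE > ATMOSPHERE"))
```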

Activity type

Name | Description
Moored instrument | Moored rig or instruments located at the sea floor
Cruise | Measurements performed from a ship
Aircraft | Measurements performed from an airplane or helicopter
Model run | Numerical model simulation
Land station | Measurements performed from a station with a fixed position, e.g. an airport or a permanent research station (not drifting on sea ice)
Ice station | Measurements performed from a station located on the sea ice
Submersible | Measurements performed from a submersible vehicle, e.g. an unmanned submarine
Float | Glider measurements or drifting buoys
Space borne instrument | Satellite measurements
Other | Anything that does not fit into the above categories
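The activity_type attribute is a comma-separated list drawn from the names above. The following Python sketch reports entries that are not in the list and, as an extra convenience not described by DOKIPY, suggests the closest known name using the standard library's `difflib.get_close_matches`. The function name `check_activity_type` is hypothetical.

```python
import difflib

# Activity types copied from the table above.
ACTIVITY_TYPES = [
    "Moored instrument", "Cruise", "Aircraft", "Model run", "Land station",
    "Ice station", "Submersible", "Float", "Space borne instrument", "Other",
]


def check_activity_type(value):
    """Return {unknown entry: closest known activity type, or None}.

    The value is a comma-separated list; an empty dict means all entries are known.
    """
    problems = {}
    for entry in (e.strip() for e in value.split(",")):
        if entry and entry not in ACTIVITY_TYPES:
            close = difflib.get_close_matches(entry, ACTIVITY_TYPES, n=1)
            problems[entry] = close[0] if close else None
    return problems


if __name__ == "__main__":
    print(check_activity_type("Cruise, Ice statoin"))
```

Suggestions are only hints; the submitted value should still use the exact names from the table.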

Distribution statement

Distribution statements to be used within DOKIPY
Statement | Description
Free | Unrestricted access
Restricted to IPY | Accessible only to IPY projects and scientists
Restricted to "project_name" | Accessible only to members of the project community. This ensures that the data are available only within the limited community of your IPY project for a limited time period (6 months). The data owner will be contacted when it is time to release the data more generally to the IPY community in accordance with the IPY Data Policy. The metadata will be published. A list of standardised project names is available.
No access | The data are submitted to the database system but will not be published for 12 months. However, in accordance with the IPY Data Policy, the metadata will be published. After that, the data are handled as in the previous category.
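A tool preparing uploads might classify the distribution_statement value and record the restriction periods given in the table (6 months for a project restriction, 12 months for "No access"). This Python sketch does that; whether the project name is written with or without quotes is not specified, so the sketch accepts both, and "DAMOCLES" in the example is only assumed to be a registered project name. The function and constant names are hypothetical.

```python
import re

# Restriction periods from the table, in months; None where the table gives no period.
EMBARGO_MONTHS = {"Free": None, "Restricted to IPY": None,
                  "Restricted to project": 6, "No access": 12}


def parse_distribution_statement(value):
    """Classify a distribution_statement value.

    Returns (category, project_name, embargo_months); project_name is only
    set for "Restricted to <project>" statements.
    """
    value = value.strip()
    if value in ("Free", "Restricted to IPY", "No access"):
        return value, None, EMBARGO_MONTHS[value]
    match = re.fullmatch(r"Restricted to (.+)", value)
    if match:
        project = match.group(1).strip().strip('"')  # drop surrounding quotes, if any
        if project:
            return "Restricted to project", project, EMBARGO_MONTHS["Restricted to project"]
    raise ValueError(f"unknown distribution statement: {value!r}")


if __name__ == "__main__":
    print(parse_distribution_statement("Restricted to DAMOCLES"))
```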

Project names

Please contact us at dokipy_theusualsymbol_met.no if your project is missing from the list and you want to use this system to manage your IPY data.

Area of observation

Predefined list of area names to be used within DOKIPY
Area name | Description
Arctic Ocean | See http://en.wikipedia.org/wiki/Arctic_Sea
Barents Sea | See http://en.wikipedia.org/wiki/Barents_sea
Beufort Sea | TBW
Chukchi Sea | TBW
Denmark Strait Sea | See http://en.wikipedia.org/wiki/Denmark_strait
East Siberian Sea | See http://en.wikipedia.org/wiki/East_siberian_sea
Fram Strait | TBW
Greenland Sea | See http://en.wikipedia.org/wiki/Greenland_Sea
Iceland Sea | TBW
Kara Sea | See http://en.wikipedia.org/wiki/Kara_Sea
Laptev Sea | See http://en.wikipedia.org/wiki/Laptev_sea
Nordic Seas | TBW
Northern Hemisphere | See http://en.wikipedia.org/wiki/Northern_hemisphere
White Sea | See http://en.wikipedia.org/wiki/White_sea

Institutions

Predefined list of institution names to be used within DOKIPY.
Long name | Short name
Center for International Climate and Environmental Research | CICERO
Nansen Environmental and Remote Sensing Center | NERSC
Norwegian Meteorological Institute | METNO
Norwegian Polar Institute | NPI
Norwegian Institute for Air Research | NILU
Institute of Marine Research | IMR
University of Bergen | UiB
The University Centre in Svalbard | UNIS
University of Tromsø | UiT
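Since either the long or the short standardised name is accepted, a submitting tool may want to store one consistent form. The Python sketch below maps either form to the short name; preferring short names is an assumption of this sketch, not a DOKIPY rule, and `institution_short_name` is a hypothetical name.

```python
# Standardised institution names from the table above: long name -> short name.
INSTITUTIONS = {
    "Center for International Climate and Environmental Research": "CICERO",
    "Nansen Environmental and Remote Sensing Center": "NERSC",
    "Norwegian Meteorological Institute": "METNO",
    "Norwegian Polar Institute": "NPI",
    "Norwegian Institute for Air Research": "NILU",
    "Institute of Marine Research": "IMR",
    "University of Bergen": "UiB",
    "The University Centre in Svalbard": "UNIS",
    "University of Tromsø": "UiT",
}


def institution_short_name(value):
    """Map an institution attribute (long or short standardised name) to its short name."""
    value = value.strip()
    if value in INSTITUTIONS:
        return INSTITUTIONS[value]
    if value in INSTITUTIONS.values():
        return value
    raise ValueError(f"not a standardised institution name: {value!r}")


if __name__ == "__main__":
    print(institution_short_name("Norwegian Polar Institute"))
```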

Operational status

Experimental
A product that is intended for operational use at some point in the future, but that is currently still undergoing major changes in observation methodology or algorithms.
Pre operational
A product that is undergoing a review process to be accepted for operational status.
Operational
An operational product that is observed or generated on a regular basis by an institution with regular funding for this purpose. It is a stable product that requires review before updates.
Scientific
A purely scientific dataset, unlikely to be repeated on a regular basis with the same equipment because of a lack of permanent funding or for other reasons.

Interactive data submission

Take me to the interactive file upload!