By Dr. Conor Delaney, EMODnet Technical Coordinator and member of the ERDDAP™ Strategic Insight Group
Introduction
In my last blog (The EMODnet MapViewer, your tool to data discovery) I described the EMODnet MapViewer, the core data discovery, access, and download tool for the EMODnet portal. In this blog, I look at ERDDAP™, one of the data server technologies used by EMODnet, which played a pivotal role in the centralisation of EMODnet (Figure 1).
Figure 1: The launch of a centralised EMODnet in 2023 marked a key event in the transition of EMODnet into an operational service of the European Commission (DG MARE).
The MapViewer is a Graphical User Interface (GUI) to the data services that sit behind it. It is these data services that do the work of fetching and downloading data behind the scenes. When a user visualises or downloads data via the MapViewer, they do so by means of machine-to-machine requests from the MapViewer to the background data servers. These requests are invisible to the user because the MapViewer business logic constructs the data request messages that are sent to the data servers. It is one of these background servers, ERDDAP™, that we will briefly explore here.
ERDDAP™ – what is in a name?
The Environmental Research Division's Data Access Program (ERDDAP™) data server is an open-source, web-hosted technology developed by the fisheries division of the National Oceanic and Atmospheric Administration (NOAA) of the United States. It was developed to address several challenges that arise when scientific data is publicly shared across multiple remotely located data servers. For example:
- Can scientific data be made more accessible to non-scientists by providing the data in more familiar file formats?
- Can users request subsets of the data sets to get to data that they want, instead of having to download large complete data sets?
- Can we ensure the quality of the data being published by also publishing the corresponding metadata for the data in a way that is easy to access and understand?
- Can we make numerous remotely located data silos appear as one so that users don’t have to navigate around the Internet looking for data?
- Can we provide a consistent Application Programming Interface (API) to the data so that users can integrate the output of the data server into their programs?
- Can this be done on the World Wide Web?
Unsurprisingly, these challenges are not unique to NOAA; we at EMODnet face them as well. Many marine and atmospheric data publishers around the world use ERDDAP™ to solve similar challenges, and to date it is used in at least 17 countries. Since its initiation, EMODnet has adhered to the principles of open data and, since their adoption by the community in 2016, to the Findable, Accessible, Interoperable and Reusable (FAIR) data principles. In addition, Europe has the Open Data Directive and the INSPIRE Directive, to which EMODnet adheres and whose relevant standards it uses. However, with the ongoing digitalisation of data services, and to overcome a number of technical challenges, it made sense for the central technical team of EMODnet to adopt ERDDAP™ as a tried and trusted technology to help solve the challenges that arose when we moved to centralise EMODnet (see blog 1, March 2024), with the ultimate goal of simplifying and improving the EMODnet user experience.
Using ERDDAP to make scientific data more accessible
Scientists publish data in many different formats for numerous reasons that may not be obvious to the non-scientific community. In the world of oceanography, a file format called NetCDF is the standard format for publishing gridded (think 3-D) data sets. NetCDF files combine well-structured gridded data with metadata that conforms to the Climate and Forecast (CF) metadata standard. Several of the larger EMODnet products are in the NetCDF format, and when hosted on an ERDDAP server, they can be made available to users in several convenient ways. Take for example the Bathymetry Digital Terrain Model (DTM) of the seabed. Before centralisation this was available as 64 separate tiles (64 NetCDF files), so if you wanted to work with a DTM for the Mediterranean, you needed to download the relevant tiles and join them together yourself. For the centralisation, we created one unified DTM for Europe by combining the 64 NetCDF tiles into one large (over 130 GB) NetCDF file, which we then published on an EMODnet-hosted ERDDAP server. By doing this we provide users with the capability to:
- Subset areas of the single NetCDF to get a DTM for their areas of interest.
- Download the data in a file format that they are interested in. This is possible because ERDDAP knows how to convert NetCDF files into other formats.
- View, search and download related metadata for the file of interest.
- Harness the machine-to-machine interoperability capabilities of ERDDAP via simple URL strings (i.e., web links). For example, a user can construct a URL that asks the ERDDAP server to subset a data set it hosts and return it to the user in a format of their choosing.
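To make the last point more concrete, here is a minimal Python sketch of how such a URL might be assembled for a gridded dataset. It follows ERDDAP's general griddap pattern (server, dataset ID, file extension, then a variable with one bracketed start:stride:stop range per axis). The base URL, the dataset ID `hypothetical_dtm`, the variable name `elevation`, and the axis order (latitude, then longitude) are all assumptions for illustration; the real values are shown on each dataset's page in the ERDDAP catalogue.

```python
from urllib.parse import quote


def griddap_subset_url(base_url, dataset_id, variable,
                       lat_range, lon_range, file_type="nc", stride=1):
    """Build a griddap-style URL requesting a latitude/longitude box.

    Assumes the dataset has exactly two axes, ordered latitude then longitude.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    # Parentheses mark the numbers as coordinate values rather than array indices.
    query = (
        f"{variable}"
        f"[({lat_min}):{stride}:({lat_max})]"
        f"[({lon_min}):{stride}:({lon_max})]"
    )
    # Percent-encode the square brackets so the URL survives strict HTTP clients.
    encoded = quote(query, safe="():,")
    return f"{base_url.rstrip('/')}/griddap/{dataset_id}.{file_type}?{encoded}"


# Placeholder server and dataset names, not real EMODnet identifiers.
example_url = griddap_subset_url(
    "https://erddap.example.org/erddap",
    "hypothetical_dtm",
    "elevation",
    lat_range=(35.0, 36.0),
    lon_range=(14.0, 15.0),
)
print(example_url)
```

Changing `file_type` (for example to `csv` or `json`) asks the server for the same subset in a different format, which is the format conversion described in the list above.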
Figure 2, below, shows an interaction with the MapViewer. In this instance the user is requesting a subset of the Bathymetry DTM in the NetCDF format. The MapViewer takes this user input, constructs a URL (a web link), and sends it to the EMODnet ERDDAP server, which returns the requested dataset to the user for download (via the MapViewer).
Figure 2: In this image, the EMODnet MapViewer ‘Download’ tool on the left was clicked, which in turn opened the region selection and download service on the right-hand side. A region has been selected by drawing a box on the map.
To get a better understanding of how the MapViewer (or any user) is interacting with ERDDAP, see Figure 3 below. In Figure 3 the user is zooming in on the Bathymetry DTM to the same region of interest as is illustrated in Figure 2. As the user zooms in, the corresponding parameters are set (latitude and longitude for example) and a URL is constructed (see bottom of image in Figure 3).
Figure 3: The tool that ERDDAP provides to familiarise users with the ERDDAP API. Zooming into the Bathymetry DTM, as seen in the graphic window, sets the various parameters (latitude and longitude for example) and constructs the URL.
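Reading such a URL back into its parts can help in understanding what the form is doing. The sketch below, in Python, splits a griddap-style URL into dataset ID, file type, variable, and axis ranges. The example URL uses placeholder names (`erddap.example.org`, `hypothetical_dtm`, `elevation`) rather than real EMODnet identifiers, and the parser only handles the simple `[(start):stride:(stop)]` form with numeric values; ERDDAP accepts other forms that this sketch does not cover.

```python
import re
from urllib.parse import unquote, urlsplit

# Matches one axis constraint of the form [(start):stride:(stop)].
AXIS_RANGE = re.compile(r"\[\(([^)]*)\):(\d+):\(([^)]*)\)\]")


def describe_griddap_url(url):
    """Split a griddap-style URL into its main request parameters."""
    parts = urlsplit(url)
    # The last path segment looks like "<datasetID>.<fileType>".
    dataset_file = parts.path.rsplit("/", 1)[-1]
    dataset_id, _, file_type = dataset_file.rpartition(".")
    query = unquote(parts.query)
    variable = query.split("[", 1)[0]
    ranges = [
        (float(start), int(stride), float(stop))
        for start, stride, stop in AXIS_RANGE.findall(query)
    ]
    return {
        "dataset_id": dataset_id,
        "file_type": file_type,
        "variable": variable,
        "ranges": ranges,
    }


# Placeholder URL for illustration only.
request = describe_griddap_url(
    "https://erddap.example.org/erddap/griddap/hypothetical_dtm.nc"
    "?elevation%5B(35.0):1:(36.0)%5D%5B(14.0):1:(15.0)%5D"
)
print(request)
```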
EMODnet ERDDAP catalogue and data silos
ERDDAP also publishes a catalogue of all data sets served by the server; see Figure 4 for the EMODnet ERDDAP catalogue. This catalogue is constructed from the metadata supplied with datasets when they are added to the server. Additional metadata contained within the NetCDF file is displayed when the ‘M’ link is clicked.
Figure 4: The ERDDAP catalogue. Clicking on the ‘graph’ link will bring up a form like the one displayed in the previous Figure 3. Clicking on ‘M’ will reveal metadata.
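The same metadata can also be requested in machine-readable form. ERDDAP's JSON responses wrap results in a `table` object with `columnNames` and `rows`, and a dataset's metadata listing uses columns such as `Row Type`, `Variable Name`, `Attribute Name`, `Data Type`, and `Value`. The Python sketch below turns such a response into dictionaries and picks out the global attributes. The sample payload is invented purely for illustration and is not taken from the EMODnet server; a real response would come from the server's info endpoint.

```python
def table_to_records(payload):
    """Convert an ERDDAP-style JSON table into a list of dictionaries."""
    table = payload["table"]
    names = table["columnNames"]
    return [dict(zip(names, row)) for row in table["rows"]]


def global_attributes(records):
    """Collect dataset-level attributes, stored under the NC_GLOBAL pseudo-variable."""
    return {
        record["Attribute Name"]: record["Value"]
        for record in records
        if record["Row Type"] == "attribute"
        and record["Variable Name"] == "NC_GLOBAL"
    }


# Made-up sample in the shape of a metadata response; not real EMODnet metadata.
SAMPLE_INFO = {
    "table": {
        "columnNames": ["Row Type", "Variable Name", "Attribute Name",
                        "Data Type", "Value"],
        "rows": [
            ["attribute", "NC_GLOBAL", "title", "String",
             "Illustrative bathymetry grid"],
            ["dimension", "latitude", "", "double", ""],
            ["variable", "elevation", "", "float", ""],
            ["attribute", "elevation", "units", "String", "m"],
        ],
    }
}

records = table_to_records(SAMPLE_INFO)
print(global_attributes(records))
```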
The catalogue shown in Figure 4 gives the user the impression that the datasets listed are stored locally; however, this is not always the case. An ERDDAP server can display datasets in its catalogue that are hosted on other ERDDAP servers located anywhere in the world. The user of a particular ERDDAP server doesn’t need to know where the datasets in the catalogue are located. They simply interact with the catalogue as normal and ERDDAP handles the rest, even if the requested dataset is hosted on the other side of the world. In this way ERDDAP can be used to unify remote silos of data.
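The idea can be pictured with a small conceptual sketch in Python. This is not ERDDAP's own code or configuration, only an illustration of a single catalogue whose entries may point either at local data or at another server, so that the caller uses the same dataset ID in both cases. All class names, server addresses, and dataset IDs are hypothetical, and the sketch assumes for simplicity that a remote dataset keeps the same ID on its home server.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CatalogueEntry:
    dataset_id: str
    title: str
    # None means the data is held on the local server (assumption of this sketch).
    remote_base_url: Optional[str] = None


def resolve_request_url(catalogue: Dict[str, CatalogueEntry], local_base_url: str,
                        dataset_id: str, file_type: str, query: str) -> str:
    """Return the URL that would actually serve the request for this dataset."""
    entry = catalogue[dataset_id]
    base = entry.remote_base_url or local_base_url
    return f"{base.rstrip('/')}/griddap/{dataset_id}.{file_type}?{query}"


# Hypothetical catalogue: one local dataset and one hosted elsewhere.
catalogue = {
    "local_dtm": CatalogueEntry("local_dtm", "Locally hosted grid"),
    "remote_sst": CatalogueEntry(
        "remote_sst", "Grid hosted on another server",
        remote_base_url="https://other.example.net/erddap",
    ),
}

LOCAL = "https://erddap.example.org/erddap"
local_url = resolve_request_url(catalogue, LOCAL, "local_dtm", "nc", "elevation")
remote_url = resolve_request_url(catalogue, LOCAL, "remote_sst", "nc", "sst")
print(local_url)
print(remote_url)
```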
The future
For EMODnet, ERDDAP has become a key technology that has made it easier for us to publish marine data products, and we see a bright future with it for data sharing in Europe. Hopefully you have been inspired to explore the EMODnet ERDDAP server. If so, click here:
NOAA, in recognition of the key role ERDDAP is playing in publishing marine environmental data globally, has taken steps to make it easier for software developers to contribute to the open-source project. Indeed, some EMODnet community members have already contributed to the code base and the maintenance of the EMODnet project. If you are a software developer and you wish to find out more, please check out: https://github.com/ERDDAP/erddap