The purpose of this document is to describe the data integration architecture for the New Mexico Water Data Initiative (NMWDI). It takes a “bottom-up” approach and proceeds as follows:
Outline of the overall multi-agency architecture
Description of the data model and API standard through which all agency data will ideally be integrated and served to users
Description of options for how agencies can provide their data in the standard data model and API standard.
Key terms are linked directly to their entries in the project Glossary of Terms.
The New Mexico Water Data Initiative Architecture
The goal of the New Mexico Water Data Initiative is to make available to the public data collected by multiple agencies about water resources in New Mexico in a common format. The aspirational user story is as follows:
“As a nonexpert community member interested in New Mexico’s water resources, I want to discover and access water information relevant to the location or geographical area I care about from all of the organizations that hold data about it, so that I don’t have to have special knowledge to access some information and so I don’t miss some potentially relevant information. I want the data I discover to be delivered to me in a common format regardless of organization, so that I can visualize and analyze data from all of the organizations without substantial preprocessing on my part.”
An aspirational demonstration application can be found here, where monitoring wells can be visualized from multiple agencies, with parameters and measurements searched for in a common interface and returned in a csv file with common column names. This application is based on a workflow where multiple agencies' data have been transformed and provided with independent instances of a standard API. The single application can then allow users to interact seamlessly with data from multiple agencies being provided by multiple APIs. This NMWDI architecture is being developed to enable this use case.
Currently, many (but not all) agency data are already published online through services such as ESRI web maps, Excel files, or in some cases public APIs. However, important aspects of a given data type (such as water table level measurements from wells), including date/time formats, geospatial projections, column names, and units, vary from agency to agency and even from dataset to dataset within agencies. To allow users to access data from multiple agencies in one format, the NMWDI architecture will route all agency data through one Web API standard with one corresponding underlying data model that references one common statewide water data controlled vocabulary. As long as each agency serves its data through the common Web API, data storage can be federated (i.e. not centralized), although some degree of centralization can be accommodated if that is most convenient for a given dataset. Each agency’s standardized API will be published through a central portal with an NMWDI-administered API Management Platform. Users can send API requests to the management platform, which will route these requests to the agency APIs and in turn forward the responses to users. Whether data storage is federated across agencies or centralized, all contributing agency data must be mapped to the common data model and transformed into the common format before being delivered to users. This basic data flow is illustrated in Figure 1.
Figure 1. Basic data flow.
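The format differences described above are what each agency's transformation step has to absorb. As a small illustration, the sketch below normalizes a few agency-style date strings into ISO 8601 UTC timestamps. The function name `to_iso8601`, the list of accepted layouts, the month-first reading of slash and dash dates, and the choice to treat values without a time zone as UTC are all assumptions made for this example, not NMWDI specifications.

```python
from datetime import datetime, timezone

# Hypothetical list of date layouts an agency export might use (an assumption).
# Slash- and dash-separated dates are read month-first.
KNOWN_FORMATS = ["%Y-%m-%d %H:%M:%S", "%m-%d-%Y", "%m/%d/%Y"]


def to_iso8601(raw: str) -> str:
    """Return an ISO 8601 UTC timestamp using the first layout that parses."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # try the next layout
        # Assumption: source values carry no time zone and are treated as UTC.
        stamped = parsed.replace(tzinfo=timezone.utc)
        return stamped.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    raise ValueError(f"Unrecognized date format: {raw!r}")


if __name__ == "__main__":
    for value in ["2019-01-31 00:00:00", "01-06-2020", "04/05/2017"]:
        print(value, "->", to_iso8601(value))
```

A real workflow would need agency-specific rules, in particular for time zones and for period values that represent an interval.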
The Data Model & API Standard: OGC SensorThings API (STA)
The above basic data flow requires a state-wide data model and API standard. The NMWDI has chosen the OGC SensorThings API as the model and standard. The OGC is the Open Geospatial Consortium, an international standards organization that creates and publishes open standards for geospatial data management, processing, and sharing.
The STA data model
The STA data model is based on the Observations and Measurements data model of the OGC, which itself underlies many environmental science data systems that integrate data from many independent organizations. Examples include the CUAHSI HydroClient, which provides centralized access to global streamgage, monitoring well, and meteorological networks, and the National Groundwater Monitoring Network, which provides centralized access to standardized high-frequency groundwater level and quality data from federal, state, and local agencies. The STA data model provides a unifying metadata standard and data structure standard that can model any data generated about point or polygonal locations on earth. It is important to be able to map agency data to this data model in order to structure each agency’s data in a compatible format and to provide a seamless data request experience to users. The STA data model is shown in Figure 2 below and fully specified in this OGC Specification.
Figure 2: The STA data model entity-relationship diagram
Table 2 below provides definitions for the entities and key properties, as well as example mappings to some agency data. The headers for the three examples link to actual data that can be used as a reference for the diversity of data being represented and how it can be modeled in the STA data model. Mapping agency data to this data model is an essential prerequisite for the more functional data integration steps that follow.
Table 2: The STA data model definitions and example mappings to NM agency datasets
Category | SensorThings Entity | Description | Example: NMBGMR Aquifer Monitoring Well | Example 2 | Example 3 |
---|---|---|---|---|---|
Metadata | Location | A unique coordinate or area on the surface of the earth | Location in latitude and longitude or UTM easting and northing (UTM Zone 13, NAD83) | Street Address (possibly with associated latitude and longitude) (e.g. 3960 PRINCE ST) | Location in easting and northing (UTM NAD83 in meters) |
Metadata | Thing | Some real-world thing with which one or more Sensors are associated | Well Point ID WL-0150 | Sample Pt RT236I | Point of Diversion POD Number A 00008 AS |
Metadata | Datastream | A collection of Observations about an ObservedProperty produced by a Sensor associated with a Thing | Time series, Hydrograph | Sample Results | Meter Readings (Quarterly) |
Metadata | Datastream/observationType | The type of observation, codified in the Observations and Measurements data standard. Types include Categorical (defined text), Count (integer), Measurement (continuous number), Observation (free text), and TruthObservation (True/False) | Measurement | Categorical or TruthObservation | Measurement |
Metadata | Datastream/unitOfMeasurement | A three-item definition of the unit of measurement, including its name, symbol, and link to the definition (preferably to one provided in an established ontology such as http://unitsofmeasure.org/ucum.html or http://qudt.org/) | feet (e.g. http://qudt.org/vocab/unit/FT) | TCR Result | Acre-Feet (e.g. http://qudt.org/vocab/unit/AC-FT) |
Metadata | Sensor | The procedure used to provide a Datastream. Can be a particular data recording device model, or a defined procedure followed by a human observer. If applicable, a specific instance (e.g. a sensor model and serial number) | Steel-tape measurement; Continuous acoustic sounder | 9223B-PA (https://www.standardmethods.org/doi/10.2105/SMWW.2882.194) | MCCROMETER Diversion Meter-Meter Number 17147 |
Metadata | ObservedProperty | The raw or processed phenomenon (quantitative or qualitative) being measured for the Datastream. Preferably including a link to a definition provided by an established ontology or controlled vocabulary such as the ODM2 Controlled Vocabularies or http://qudt.org/ | Depth to Water Below Ground Surface (BGS) | Analyte (e.g. Coliform (TCR) (3100)) | Mtr Amount |
Metadata | OPTIONAL: FeatureOfInterest | The real-world feature that the Observations are about. This may or may not be different from the Location of the Thing on which the Sensor is mounted. Can include a JSON-formatted point location or a polygon or collections thereof. | Formation (e.g. https://maps.nmt.edu/maps/data/hydrograph/formation_lu) | Public Water System (head office location or service area boundary) (e.g. Albuquerque Water System PWSID NM3510701) | Water Right (set of relevant points of diversion) |
Data | Observation | A single measurement value including the result, time values, and other metadata. Information on the ObservedProperty that was measured by what Sensor is provided by the Datastream these observations are in. Features of Interest are linked for each observation as well. Observations are linked to (collected in) Datastreams | Depth Measurement | Sample (e.g. 763391) | Meter Reading |
Data | Observation/result | The actual measured value, with valid values defined in observationType and units defined in unitOfMeasurement, both provided by Datastream | Depth (e.g. 337.08) | Sample Result (P (Positive/ Coliform found) A (Negative/ Coliform not found)) | Mtr Amount (e.g. 107.948) |
Data | Observation/phenomenonTime | The date+time (or interval) in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ) when the observation occurred | 2019-01-31 00:00:00 | MP (Monitoring Period) (e.g. 01-01-2020 to 01-31-2020) | 1/20/2017-04/05/2017 (Quarterly period for which volume was measured) |
Data | OPTIONAL: Observation/resultTime | The date+time that the result was generated. May be the same as phenomenonTime | | Date (e.g. 01-06-2020) | 04/05/2017 (date of meter reading) |
Data | OPTIONAL: Observation/validTime | The date+time interval during which the Observation can be used (often used for provisional values that are replaced by QA/QC’d observations) | | | |
Data | OPTIONAL: Observation/resultQuality | A description of the result quality. Will vary according to agency practice. Can use ODM2 controlled vocabulary for data quality types as a guide. | Precision (e.g. “within two hundredths of a foot”) | | |
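To make the mapping in Table 2 more concrete, the sketch below expresses the NMBGMR monitoring well column as a nested SensorThings JSON payload, with a Thing containing its Location, Datastream, Sensor, ObservedProperty, and one Observation. The property names follow the SensorThings entity definitions. The coordinates, the two `example.org` links, and the free-text descriptions are placeholders invented for illustration and are not taken from the NMBGMR dataset.

```python
import json

# One Thing with its related entities nested inside it.
# Everything marked "placeholder" is invented for this example.
thing = {
    "name": "WL-0150",
    "description": "NMBGMR aquifer monitoring well",
    "Locations": [{
        "name": "WL-0150 location",
        "description": "Well location",
        "encodingType": "application/vnd.geo+json",
        # Placeholder coordinates (longitude, latitude), not the real well.
        "location": {"type": "Point", "coordinates": [-106.0, 34.0]},
    }],
    "Datastreams": [{
        "name": "WL-0150 depth to water",
        "description": "Time series (hydrograph) of depth to water",
        "observationType": "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement",
        "unitOfMeasurement": {
            "name": "feet",
            "symbol": "ft",
            "definition": "http://qudt.org/vocab/unit/FT",
        },
        "Sensor": {
            "name": "Steel-tape measurement",
            "description": "Manual steel-tape depth measurement",
            "encodingType": "application/pdf",
            "metadata": "https://example.org/steel-tape-procedure.pdf",  # placeholder
        },
        "ObservedProperty": {
            "name": "Depth to Water Below Ground Surface (BGS)",
            "description": "Depth from ground surface to the water table",
            "definition": "https://example.org/vocab/depth-to-water-bgs",  # placeholder
        },
        "Observations": [{
            "result": 337.08,
            "phenomenonTime": "2019-01-31T00:00:00.000Z",
        }],
    }],
}

print(json.dumps(thing, indent=2))
```

Writing the mapping out this way is a quick check that every required field has a source in the agency data before any ETL code is written.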
The STA API Standard
The API standard is the part of the OGC SensorThings API that allows any dataset formatted into the SensorThings data model detailed above, and stored in a database with an instance of the API connected to it, to be added to or queried in a fast, replicable, and automated way by humans or, more importantly, by computers (typically web applications or automated regulatory and scientific data processing workflows). The API standard is essentially a set of rules that says “if you send me a request that looks like this, I will give you a response that looks like that”. In the case of the SensorThings API, requests and responses are exchanged over HTTP.
For example, the NMWDI is currently operating an instance of STA at https://st.newmexicowaterdata.org/FROST-Server/v1.1 . The HTTP GET request https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations (which can be copied and pasted into a web browser) returns a JSON-formatted list of all Locations for which there is information.
{
  "@iot.nextLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations?$skip=100",
  "value": [
    {
      "name": "NMWDI-0000001",
      "description": "WELL",
      "encodingType": "application/vnd.geo+json",
      "location": { "type": "Point", "coordinates": [ -108.068892, 36.796529 ] },
      "HistoricalLocations@iot.navigationLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations(1)/HistoricalLocations",
      "Things@iot.navigationLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations(1)/Things",
      "@iot.id": 1,
      "@iot.selfLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations(1)"
    },
    {
      "name": "NMWDI-0000031",
      "description": "WELL",
      "encodingType": "application/vnd.geo+json",
      "location": { "type": "Point", "coordinates": [ -105.11427778, 32.02255556 ] },
      "HistoricalLocations@iot.navigationLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations(31)/HistoricalLocations",
      "Things@iot.navigationLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations(31)/Things",
      "@iot.id": 31,
      "@iot.selfLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Locations(31)"
    },
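The `@iot.nextLink` field in the response above indicates that results are returned in pages, so a client that wants every Location has to follow that link until it is no longer present. The sketch below shows one way to do so. The names `fetch_json` and `iter_entities` are hypothetical, and the fetch function is passed in as a parameter so the paging logic can be exercised without a network connection.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (performs a real HTTP GET)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_entities(url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every entity, following @iot.nextLink until no further page exists."""
    while url:
        page = fetch(url)
        yield from page.get("value", [])
        url = page.get("@iot.nextLink")  # None on the last page ends the loop


if __name__ == "__main__":
    # Offline demonstration with two fake pages instead of live requests.
    fake_pages = {
        "page-1": {"value": [{"@iot.id": 1}], "@iot.nextLink": "page-2"},
        "page-2": {"value": [{"@iot.id": 31}]},
    }
    print([e["@iot.id"] for e in iter_entities("page-1", fetch=fake_pages.__getitem__)])
```

Calling `iter_entities` with the Locations URL above and the default `fetch` would issue live requests against the NMWDI server.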
Adding the parameter ?$resultFormat=CSV returns the same information as tabular data, and adding ?$resultFormat=GeoJSON returns the same information as a GeoJSON file that can be used in GIS applications such as ESRI ArcGIS or QGIS or to create custom web maps. STA is a very full-featured API that enables complex, multi-parameter queries. Full detail can be found at the OGC API specification document or the interactive documentation.
For example, this query requests all Observations of Total Dissolved Solids (TDS) for a particular Feature of Interest (Well) between August 14 and August 16, 2014: https://st.newmexicowaterdata.org/FROST-Server/v1.1/Observations?$filter=FeatureOfInterest/id eq '32' and Datastream/ObservedProperty/name eq 'TDS' and phenomenonTime gt 2014-08-14T00:00:00.000Z and phenomenonTime lt 2014-08-16T00:00:00.000Z&$expand=Datastream
The response shows that there is one such observation, that it occurred on August 15, 2014 with a result of 274, that the unit of measurement is parts per million, that the result type is a measurement (continuous number), and that the Feature of Interest’s location is longitude -105.59 and latitude 36.73:
{
  "value": [
    {
      "phenomenonTime": "2014-08-15T00:00:00.000Z",
      "resultTime": "2014-08-15T00:00:00.000Z",
      "result": 274.0,
      "Datastream@iot.navigationLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Observations(1148627)/Datastream",
      "Datastream": {
        "name": "TDS Water Quality Datastream",
        "description": "No Description Available",
        "observationType": "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement",
        "observedArea": { "type": "Point", "coordinates": [ -105.587183, 36.727965 ] },
        "phenomenonTime": "2014-08-15T00:00:00.000Z/2014-08-15T00:00:00.000Z",
        "resultTime": "2014-08-15T00:00:00.000Z/2014-08-15T00:00:00.000Z",
        "unitOfMeasurement": {
          "name": "Parts Per Million",
          "symbol": "PPM",
          "definition": "http://www.qudt.org/qudt/owl/1.0.0"
        },
        "@iot.id": 5007,
        "@iot.selfLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Datastreams(5007)"
      },
      "FeatureOfInterest@iot.navigationLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Observations(1148627)/FeatureOfInterest",
      "@iot.id": 1148627,
      "@iot.selfLink": "https://st.newmexicowaterdata.org/FROST-Server/v1.1/Observations(1148627)"
    }
  ]
}
This JSON can be easily parsed by a web application for visualization, as it is in the demonstration application, or be returned as a CSV to a user.
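As a rough illustration of that parsing step, the sketch below flattens an Observations response with an expanded Datastream into rows and writes them as CSV. `SAMPLE` is a trimmed copy of the response above. The function names and the output column names are assumptions made for this example and are not the column names used by the demonstration application.

```python
import csv
import io
import json

# Trimmed copy of the TDS response shown above.
SAMPLE = json.loads("""
{"value": [{
  "phenomenonTime": "2014-08-15T00:00:00.000Z",
  "result": 274.0,
  "Datastream": {
    "name": "TDS Water Quality Datastream",
    "unitOfMeasurement": {"name": "Parts Per Million", "symbol": "PPM"}
  },
  "@iot.id": 1148627
}]}
""")

# Hypothetical output column names.
COLUMNS = ["observation_id", "phenomenon_time", "result", "datastream", "unit"]


def observations_to_rows(payload: dict) -> list:
    """Flatten each Observation and its expanded Datastream into one dict."""
    rows = []
    for obs in payload.get("value", []):
        datastream = obs.get("Datastream", {})
        unit = datastream.get("unitOfMeasurement", {})
        rows.append({
            "observation_id": obs.get("@iot.id"),
            "phenomenon_time": obs.get("phenomenonTime"),
            "result": obs.get("result"),
            "datastream": datastream.get("name"),
            "unit": unit.get("symbol"),
        })
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(observations_to_rows(SAMPLE)))
```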
While the API is useful for creating web applications for any given agency’s data, it is most integral to the NMWDI purpose of data integration. If multiple agencies' data are available via the same API standard, the burden on data users of accessing data from each agency is reduced. Consider the complex query above. We can modify it slightly by replacing the first part of the URL, https://st.newmexicowaterdata.org/FROST-Server/v1.1, with https://nm.ngwmn.internetofwater.dev/api/v1.1, which points to an entirely different API written in a different programming language on a separate database stewarded by a different organization. In this case the data is a copy of the New Mexico part of the National Groundwater Monitoring Network being served by the Internet of Water at Duke University. By changing this first part, changing ‘TDS’ to ‘Water Level Below Ground Surface’, and changing the date range to cover calendar year 2019, we construct this query: https://nm.ngwmn.internetofwater.dev/api/v1.1/Observations?$filter=FeatureOfInterest/id eq '32' and Datastream/ObservedProperty/name eq 'Water Level Below Ground Surface' and phenomenonTime gt 2019-01-01T00:00:00.000Z and phenomenonTime lt 2020-01-01T00:00:00.000Z&$expand=Datastream
and receive a similarly formatted result including a time series of water level measurements (in feet below ground surface) conducted by the USGS in calendar year 2019. The JSON response is shown below, with a CSV version here.
{
  "value": [
    {
      "phenomenonTime": "2019-08-21T14:26:00.000Z",
      "resultTime": "2019-08-21T14:26:00.000Z",
      "result": 4722,
      "Datastream@iot.navigationLink": "https://nm.ngwmn.internetofwater.dev/api/v1.1/Observations(95866)/Datastream",
      "Datastream": {
        "name": "Depth Below Surface",
        "description": "Estimated depth to water table below ground surface",
        "observationType": "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement",
        "observedArea": { "type": "Point", "coordinates": [ -106.885, 34.352 ] },
        "phenomenonTime": "1975-07-08T12:00:00.000Z/2020-02-27T16:40:00.000Z",
        "resultTime": "1975-07-08T12:00:00.000Z/2020-02-27T16:40:00.000Z",
        "unitOfMeasurement": {
          "name": "feet",
          "symbol": "ft",
          "definition": "http://www.qudt.org/qudt/owl/1.0.0/unit/Instances.html#Foot"
        },
        "@iot.id": 63,
        "@iot.selfLink": "https://nm.ngwmn.internetofwater.dev/api/v1.1/Datastreams(63)"
      },
      "FeatureOfInterest@iot.navigationLink": "https://nm.ngwmn.internetofwater.dev/api/v1.1/Observations(95866)/FeatureOfInterest",
      "@iot.id": 95866,
      "@iot.selfLink": "https://nm.ngwmn.internetofwater.dev/api/v1.1/Observations(95866)"
    },
    {
      "phenomenonTime": "2019-03-04T16:05:00.000Z",
      "resultTime": "2019-03-04T16:05:00.000Z",
      "result": 4722,
      "Datastream@iot.navigationLink": "https://nm.ngwmn.internetofwater.dev/api/v1.1/Observations(95864)/Datastream",
      "Datastream": {
        "name": "Depth Below Surface",
        "description": "Estimated depth to water table below ground surface",
        "observationType": "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement",
        "observedArea": { "type": "Point", "coordinates": [ -106.885, 34.352 ] },
        "phenomenonTime": "1975-07-08T12:00:00.000Z/2020-02-27T16:40:00.000Z",
        "resultTime": "1975-07-08T12:00:00.000Z/2020-02-27T16:40:00.000Z",
        "unitOfMeasurement": {
          "name": "feet",
          "symbol": "ft",
          "definition": "http://www.qudt.org/qudt/owl/1.0.0/unit/Instances.html#Foot"
        },
        "@iot.id": 63,
        "@iot.selfLink": "https://nm.ngwmn.internetofwater.dev/api/v1.1/Datastreams(63)"
      },
Since these entirely different data sources provide their data using the same API standard, they can be integrated seamlessly into one data discovery, visualization and delivery system, as in the demonstration application.
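Because both services implement the same standard, the two queries shown earlier can be produced by one function that differs only in its arguments. The sketch below is one hedged way to write that. `build_observations_url` is a hypothetical helper name, and the percent-encoding of the filter expression is an addition made here for safe transport; the URLs in the text are shown unencoded. Property names containing a single quote are not handled.

```python
from urllib.parse import quote


def build_observations_url(base_url: str, feature_id: str, property_name: str,
                           start: str, end: str) -> str:
    """Build the Observations query used above for any SensorThings base URL."""
    filter_expr = (
        f"FeatureOfInterest/id eq '{feature_id}' "
        f"and Datastream/ObservedProperty/name eq '{property_name}' "
        f"and phenomenonTime gt {start} and phenomenonTime lt {end}"
    )
    # quote() percent-encodes spaces, quotes, and colons but keeps "/" as is.
    return (f"{base_url.rstrip('/')}/Observations"
            f"?$filter={quote(filter_expr)}&$expand=Datastream")


if __name__ == "__main__":
    print(build_observations_url(
        "https://st.newmexicowaterdata.org/FROST-Server/v1.1",
        "32", "TDS",
        "2014-08-14T00:00:00.000Z", "2014-08-16T00:00:00.000Z"))
    print(build_observations_url(
        "https://nm.ngwmn.internetofwater.dev/api/v1.1",
        "32", "Water Level Below Ground Surface",
        "2019-01-01T00:00:00.000Z", "2020-01-01T00:00:00.000Z"))
```

An application that integrates several agencies could loop over a list of base URLs and call the same function for each one.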
Options for Providing Agency Data via SensorThings API
The most difficult part of providing every agency’s data via SensorThings API is mapping the agency datasets to the SensorThings data model. Assuming this is done, there are three main options to provide SensorThings API data.
Stand up an agency-hosted SensorThings API database and server. This is the option currently being used by the NMBGMR. This option provides the most autonomy to the agency but requires the most effort. It could be an attractive option for agencies that hold geospatially referenced time-series data (including non-water data), do not currently have a public-facing API, and desire one independent of any NMWDI initiatives.
Export data to a templated tabular format and send this tabular data to an NMWDI-administered intermediate data store called an “Icebox” at clowder.newmexicowaterdata.org, from which it will be copied into a centralized NMWDI SensorThings API database and server. This option still requires the agency to provide regular exports of their data, but leaves the responsibilities of the administration of web services to NMWDI.
Collaborate closely with NMWDI to proxy an existing agency API and republish it as a SensorThings API through the NMWDI API management platform. This option is only appropriate for agencies whose APIs are capable of publishing all data relevant to NMWDI and that the agency is willing to make public-facing.
These three options, including how they would likely fit into existing agency workflows, are summarized in Figure 3.
Figure 3. Options for Providing Agency Data via SensorThings API
Option 1: Agency-hosted SensorThings
In this option, the agency will host its own version of a SensorThings database and API server. The following three steps would need to be taken:
Set up a SensorThings database and API instance in a preferred environment, configuring its domain name and security as appropriate to the agency.
Write scripts to Extract, Transform, and Load (ETL) data from existing databases or tabular data sources (e.g., CSV, Excel) and upload them to the SensorThings database using the SensorThings API.
Provide the URL of the agency SensorThings instance to NMWDI for registration in the API Management platform and CKAN.
Step 1: Setting up a SensorThings instance
There are several fully-featured versions of SensorThings that can be hosted:
SensorUp is the reference implementation of STA. It is a private Canadian company offering a license + subscription model with customer support and an integrated dashboard and visualization suite. SensorUp is led by Dr. Steve Liang, who was the lead developer of SensorThings API.
FROST-Server, a free and open-source server written in Java designed for deployment in a Tomcat Java Servlet with a PostGIS database. FROST is developed by the Fraunhofer Society, the main government-supported applied research organization of Germany.
52 North STA, a different free and open-source server written in Java designed as a Spring Boot service with a PostGIS database. 52 North is a non-profit geospatial IT research entity associated with German and EU universities and research consortia that develops open source software as well as provides professional IT consulting.
Geodan GOST, a free and open-source server written in Go designed as a platform-independent Go service connected to a PostGIS database and a basic graphical user interface. Geodan is a private Dutch firm specializing in geospatial business analytics.
The three open-source options can be made to run through cloud-native stacks such as Google App Engine or through any local physical or virtual machine environment through containerization paradigms such as Docker and Kubernetes. The fastest way to set up a simple SensorThings instance is by using Docker and Docker Compose. Docker containers are virtual environments that can run on any host operating system and contain only the software and resources necessary to run a desired application. Docker Compose is a framework to run multiple Docker containers that are linked together in a virtual network on the same host. Below is a simple process for setting up SensorThings API on a local machine or a cloud VM using FROST-Server Docker containers.
Install Docker, using instructions depending on the operating system (can be Windows, macOS, Ubuntu, Debian, CentOS, Fedora).
Install Docker Compose, again using instructions for your operating system.
Download this FROST-Server docker-compose.yaml configuration file into a directory of your choosing (e.g. /User/Home/SensorThings). This file gives the Docker engine instructions on which pre-configured virtual environments to download from Docker Hub (in this case, a CentOS
In the console (Linux), Terminal (macOS) or PowerShell (Windows) navigate to the directory where docker-compose.yaml is, and run docker-compose up, e.g.
cd ~/User/Home/SensorThings
docker-compose up
SensorThings can then be accessed by pointing a browser on the machine to http://localhost:8080/FROST-Server/v1.1/
Detailed documentation on how to interact with and configure FROST-Server can be found on the FROST documentation site. Additional security measures, including authentication and enabling HTTPS, are best handled with a proxy server such as Apache 2, Nginx, or Caddy 2. A common setup is for the SensorThings instance to restrict external users from making any HTTP requests other than GET requests. This allows the public to download data, but not to add, modify, or delete data. Other HTTP requests can be restricted to the agency intranet and/or specifically authorized users with appropriate authentication.
Step 2: Uploading data to SensorThings
Regardless of which particular version of SensorThings API is set up, uploading data to the SensorThings database is the same, and involves three general steps:
Mapping the data to the SensorThings data model
Extracting the data from the Agency data system into a JSON form mapped to the data model
Uploading the extracted and JSON-formatted data into SensorThings via HTTP POST requests. There is helpful documentation regarding this here.
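The third step above can be sketched with the Python standard library alone. The example below only constructs the POST request for a single Observation and does not send it. `build_observation_request` is a hypothetical helper name, the Datastream id `1` is a placeholder, and the base URL is the local instance from the Docker walkthrough. Linking the Observation to an existing Datastream by its `@iot.id` follows the SensorThings convention for referencing related entities.

```python
import json
from urllib.request import Request


def build_observation_request(base_url: str, datastream_id: int,
                              result: float, phenomenon_time: str) -> Request:
    """Create (but do not send) a POST request that adds one Observation."""
    payload = {
        "result": result,
        "phenomenonTime": phenomenon_time,
        # Reference an existing Datastream by id instead of re-creating it.
        "Datastream": {"@iot.id": datastream_id},
    }
    return Request(
        url=f"{base_url.rstrip('/')}/Observations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_observation_request(
        "http://localhost:8080/FROST-Server/v1.1",
        datastream_id=1,  # placeholder id
        result=337.08,
        phenomenon_time="2019-01-31T00:00:00.000Z",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. Any authentication the agency has configured for write requests would need to be added to the headers.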
An example Python workflow (https://github.com/NMBGMR/WDIETL/tree/master/etl) covers the above three steps for the NMBGMR groundwater level and quality database. Similar workflows can be constructed with any scripting language capable of making HTTP requests, running SQL queries over database connections, and/or importing and manipulating data from tabular sources such as CSV, TSV, and Excel files. Common options include Java, JavaScript, PHP, R, and shell scripts. NMWDI can be consulted for assistance in setting up these workflows.
Step 3: Providing the URL and any necessary authentication to NMWDI
In order for NMWDI (and the public) to use the agency SensorThings instance, it must be available from a publicly accessible URL. This will likely require pointing an agency subdomain over DNS to the IP address of the machine the SensorThings instance runs on. It is also advisable to reverse-proxy the service over HTTPS.
Option 2: Exporting data to a tabular template
If it is not feasible for an agency to administer its own instance of SensorThings API, another option is to provide regular exports of data in a tabular format and of metadata in .yaml configuration files, and to upload this data and metadata to the NMWDI Clowder, an intermediate data repository. From this repository, NMWDI will run scheduled automated jobs that upload the tabular data into a centralized SensorThings API instance. The steps involved are similar to Step 2 of Option 1:
Map the data to the SensorThings data model
At a regular interval, extract the data from the Agency data system into a set of tabular data files that meet an NMWDI standard template
Upload the tabular data and/or configuration files into Clowder
Step 1: Mapping the data to the SensorThings data model
See here
Step 2: Extract agency data to tabular template data files
NMWDI will provide template .csv and/or .yaml files to be populated by Agency database SQL exports and/or transformations of existing tabular data files. Some of the fields should be filled with URIs from a controlled vocabulary. Note that this is an extra step beyond the table representing the mapping exercise from agency data to the STA data model. A variety of tools and scripting languages can be used to export data from databases and produce such files. The template will be formatted similarly to this:
Column name | Type | Description | Example row |
---|---|---|---|
location_name | string | name of the Location | A 00008 AS |
location_description | string | description of Location | Point of Diversion |
location_northing | float | northing (e.g. latitude) | 3541774 |
location_easting | float | easting (e.g. longitude) | 139712 |
location_type | CV_string_uri | specification of coordinate system/ units | NAD83 UTM in meters |
thing_name | string | name of Thing | A 00008 AS |
thing_description | string | description of Thing | Well point of diversion |
sensor_name | string | name of Sensor | MCCROMETER-17147 |
sensor_description | string | description of Sensor | Diversion Meter |
sensor_metadata | string | link to metadata sheet | |
datastream_name | string | name of Datastream | Quarterly Meter Readings |
datastream_type | CV_string_uri | URI of datastream type from NMWDI controlled vocabulary | http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement |
datastream_units | CV_string_uri | URI of datastream unit of measurement from NMWDI or other controlled vocabulary | |
datastream_observedproperty | CV_string_uri | URI of observed property from NMWDI or other controlled vocabulary | |
result | float/string/boolean/int | measured/ processed value | 107.948 |
phenomenonTime | ISO 8601 | time measurement is about | 2017-05-04T00:00:00.000Z |
resultTime | ISO 8601 | time measurement is taken | 2017-05-04T12:00:00.000Z |
resultQuality | string | any quality description | |
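As an illustration of populating such a template, the sketch below writes the example row from the table above to CSV with the Python standard library. The column list mirrors the table. Since the final NMWDI template may differ, it should be read as an assumption, and `write_template` is a hypothetical helper name. Columns that have no example value in the table are left empty.

```python
import csv
import io

# Column names copied from the template table above.
TEMPLATE_COLUMNS = [
    "location_name", "location_description", "location_northing",
    "location_easting", "location_type", "thing_name", "thing_description",
    "sensor_name", "sensor_description", "sensor_metadata",
    "datastream_name", "datastream_type", "datastream_units",
    "datastream_observedproperty", "result", "phenomenonTime",
    "resultTime", "resultQuality",
]

# Example values copied from the table; missing keys are written as empty cells.
EXAMPLE_ROW = {
    "location_name": "A 00008 AS",
    "location_description": "Point of Diversion",
    "location_northing": 3541774,
    "location_easting": 139712,
    "location_type": "NAD83 UTM in meters",
    "thing_name": "A 00008 AS",
    "thing_description": "Well point of diversion",
    "sensor_name": "MCCROMETER-17147",
    "sensor_description": "Diversion Meter",
    "datastream_name": "Quarterly Meter Readings",
    "datastream_type": "http://www.opengis.net/def/observationType/OGC-OM/2.0/OM_Measurement",
    "result": 107.948,
    "phenomenonTime": "2017-05-04T00:00:00.000Z",
    "resultTime": "2017-05-04T12:00:00.000Z",
}


def write_template(rows: list) -> str:
    """Return CSV text with the template header followed by one line per row."""
    buffer = io.StringIO()
    # DictWriter raises ValueError if a row contains a column not in the template.
    writer = csv.DictWriter(buffer, fieldnames=TEMPLATE_COLUMNS,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(write_template([EXAMPLE_ROW]))
```

In practice the rows would come from a SQL export or an existing spreadsheet rather than a hand-written dictionary.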
Step 3: Upload template files at a defined frequency to NMWDI Clowder “Icebox”
Clowder has both a graphical user interface and an API that can be used to automate the upload of files. Agencies should coordinate with NMWDI to set up their Agency-specific spaces and credentials on Clowder for such workflows.