Import Toolchain

For this first iteration, assume a single dataset is uploaded, i.e. a time series of water levels from one well or stream gauge. A bulk, multi-point upload will be developed later.

 

1. Starting point. I am a data provider and have created the required upload.yml file (see Upload Formats).

Digression: WDI should potentially provide tools, guidance, or the option to upload a non-normalized xls or csv file. This may be more conducive to agencies' existing workflows. However, the toolchain we are designing here assumes the starting point is upload.yml.

Update (5/17/20): added an upload.csv compatible format. Currently, exact headers are required, but no column order is enforced. Also added a csvextractor to convert the csv file to a set of yaml files.
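As an illustration of the "exact headers, any column order" rule, here is a minimal sketch. It is not the actual csvextractor: the header names in `REQUIRED_HEADERS` and the function `read_upload_csv` are hypothetical stand-ins, since the real header list is defined in the upload format spec.

```python
import csv
import io

# Hypothetical header set; the real upload.csv headers are defined elsewhere.
REQUIRED_HEADERS = {"site_id", "timestamp", "water_level"}


def read_upload_csv(text):
    """Return rows as dicts; raise ValueError if the headers do not match exactly."""
    reader = csv.DictReader(io.StringIO(text))
    found = set(reader.fieldnames or [])
    # Comparing sets enforces the exact header names while ignoring column order.
    if found != REQUIRED_HEADERS:
        missing = sorted(REQUIRED_HEADERS - found)
        extra = sorted(found - REQUIRED_HEADERS)
        raise ValueError(f"bad headers: missing={missing} extra={extra}")
    return list(reader)
```

Because rows come back keyed by header name, a later step that writes the yaml files would not need to care about the order of columns in the uploaded file.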

2. Upload. Log in to the WDI upload app and submit your upload.yml file.

Another option is to provide a web form at this point that will generate a meta.yml file for the user.

The user could select from user-defined meta.yml.templates. If selected, a template would simply prepopulate the entry form, so the user could easily override the existing settings.
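The template idea above could be sketched roughly as follows. The function `prefill_form` and the field names are hypothetical, and treating a blank form field as "keep the template value" is an assumption, not a decided behavior.

```python
def prefill_form(template, user_input):
    """Start from the template values; any field the user fills in wins."""
    merged = dict(template)  # copy so the stored template is not modified
    # Assumption: blank or missing form fields fall back to the template value.
    merged.update({k: v for k, v in user_input.items() if v not in (None, "")})
    return merged


# Hypothetical field names, for illustration only.
template = {"units": "ft", "agency": "Example Agency"}
meta = prefill_form(template, {"units": "m", "agency": ""})
```

Here `meta` keeps the template's agency and takes the user's units; the merged dict is what would then be written out as meta.yml.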

3. upload.yaml is loaded into Clowder Icebox.

Clowder is just another open data portal. It's relatively lightweight but also seems to provide all the functionality we need for a staging area. CKAN will still serve as the public-facing data catalog. Clowder will be used to manage the internal ETL processes.

4. WDI QC? Move data from Icebox to CKAN.

The move would be handled by a dedicated “extractor”, triggered manually or maybe via a tagging event?

5. WDI triggers an ST extraction.

The extractor reads in upload.yaml and uploads records to an ST instance via the ST API.
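A rough sketch of the record-to-payload part of this step, under loud assumptions: that the ST API is a SensorThings-style service in which each observation is sent as a JSON object linked to a datastream, and that parsed records carry `timestamp` and `water_level` fields. The record field names, `build_observations`, and the datastream id are hypothetical. The HTTP call itself is left out.

```python
import json


def build_observations(records, datastream_id):
    """Turn parsed upload records into SensorThings-style Observation payloads."""
    payloads = []
    for rec in records:
        payloads.append({
            "phenomenonTime": rec["timestamp"],       # when the level was measured
            "result": float(rec["water_level"]),      # measured value as a number
            "Datastream": {"@iot.id": datastream_id},  # assumed existing datastream
        })
    return payloads


# Hypothetical record, as it might look after parsing upload.yaml.
records = [{"timestamp": "2020-05-17T00:00:00Z", "water_level": "12.5"}]
body = json.dumps(build_observations(records, datastream_id=1)[0])
```

Each `body` string would then be sent to the ST instance by the extractor. Keeping payload construction separate from the network call makes it easy to check the mapping without a running ST instance.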

Set up a dockerized extractor image. Launch a cluster of extractor containers.