Filter point observations to pre-defined site networks

This notebook showcases functionality of the get_point_data and get_point_metadata functions to filter sites based on a pre-defined site network.

For USGS stream gages, the currently supported site networks include:

For USGS groundwater wells, the currently supported site networks include:

Please see the full point module documentation for information on what data is available, our data collection process, and new features we are working on! Our Metadata Description page itemizes the fields that get returned from get_point_metadata.

[1]:
# Import packages
from hf_hydrodata import register_api_pin, get_point_data, get_point_metadata
import pandas as pd
[ ]:
# You need to register on https://hydrogen.princeton.edu/pin
# and run the following with your registered information
# before you can use the hydrodata utilities
register_api_pin("your_email", "your_pin")

Note that get_point_data and get_point_metadata require the parameters dataset, variable, temporal_resolution, and aggregation (plus depth_level when requesting soil moisture data). Please see the documentation for information about which point observation datasets are available and the parameters used to query them.

The hf_hydrodata API Reference lists the optional filtering parameters that are available, such as filters for a geographic region or a date range. These filters are cumulative: if state and site_ids are both supplied, for example, only the sites in site_ids that are also located in state are returned.
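
One way to picture the cumulative behavior is as a logical AND across the supplied filters. The sketch below does not call hf_hydrodata; it mimics the idea with pandas on a small stand-in table. The three Colorado rows reuse site IDs from the example output in this notebook, while the "WY" row (site 99999999) is a hypothetical site invented for illustration, and filter_sites is a hypothetical helper that is not part of the package.

```python
import pandas as pd

# Stand-in site table. The "CO" rows reuse site IDs shown in this notebook;
# the "WY" row is a hypothetical site invented for illustration.
sites = pd.DataFrame(
    {
        "site_id": ["06614800", "06620000", "06659580", "99999999"],
        "state": ["CO", "CO", "CO", "WY"],
    }
)


def filter_sites(df, state=None, site_ids=None):
    """Hypothetical helper: combine every supplied filter with a logical AND."""
    mask = pd.Series(True, index=df.index)
    if state is not None:
        mask &= df["state"] == state
    if site_ids is not None:
        mask &= df["site_id"].isin(site_ids)
    return df[mask]


# Two sites are requested, but only the one located in Colorado passes both filters.
selected = filter_sites(sites, state="CO", site_ids=["06620000", "99999999"])
print(selected["site_id"].tolist())
```

Adding a filter can only narrow the result, never widen it, so an unexpectedly empty result often means that two filters do not overlap.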

Example: Query stream gage data for GAGES-II sites in Colorado

In this example, we query the stream gages in the GAGES-II network that are located in Colorado (state='CO'). We focus on Water Year 2003, which runs from October 1, 2002 through September 30, 2003, so we set date_start='2002-10-01' and date_end='2003-09-30'. Setting site_networks='gagesii' restricts the results to stream gages that are part of the GAGES-II network.
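
If you query several water years, it can help to compute the date bounds rather than typing them by hand. The following is a minimal sketch, assuming the usual convention that a water year starts on October 1 of the previous calendar year and ends on September 30. The helper name water_year_bounds is introduced here for illustration and is not part of hf_hydrodata.

```python
from datetime import date


def water_year_bounds(water_year):
    """Return (date_start, date_end) as 'YYYY-MM-DD' strings for a water year.

    Assumes the water year runs from October 1 of the previous calendar
    year through September 30 of the named year.
    """
    start = date(water_year - 1, 10, 1)
    end = date(water_year, 9, 30)
    return start.isoformat(), end.isoformat()


date_start, date_end = water_year_bounds(2003)
print(date_start, date_end)  # 2002-10-01 2003-09-30
```

The two strings can then be passed as the date_start and date_end arguments used in the cells below.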

[2]:
# Get point observations data
data_df = get_point_data(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
                         date_start="2002-10-01", date_end="2003-09-30",
                         state="CO", site_networks="gagesii")

# View the first five records
data_df.head(5)
[2]:
         date  06614800  06620000  06659580  06696980  06700000  06701500  06701620  06701900  06707500  ...  09371000  09371010  09371492  09371520  09372000  393109104464500  394308105413800  394839104570300  401733105392404  402114105350101
0  2002-10-01  0.019810   0.97635       NaN  0.116879       NaN    8.4051       NaN    9.5371   11.3766  ...  0.000000   14.1500  0.023489   0.46978  0.274793         0.048110          0.61977          1.26784         0.071316          0.33677
1  2002-10-02  0.021508   1.01031       NaN  0.148009       NaN    8.3485       NaN    9.8767   12.1690  ...  0.066505   15.6782  0.040469   0.53204  0.285830         0.281868          0.81504          2.81019         0.071316          0.39903
2  2002-10-03  0.022357   1.23388       NaN  0.164140       NaN    7.1882       NaN    8.5466   10.2729  ...  0.155650   19.1874  0.091975   1.10936  0.472610         0.249889          0.88862          1.23954         0.069618          0.46695
3  2002-10-04  0.025753   1.81969       NaN  0.146877       NaN    5.3204       NaN    5.9996    8.2919  ...  0.350920   19.6119  0.043582   0.50657  0.894280         0.219325          0.70184          0.68769         0.068203          0.43299
4  2002-10-05  0.024621   1.98100       NaN  0.143198       NaN    4.4997       NaN    5.0374    6.5090  ...  0.060279   22.7249  0.026885   0.47261  0.585810         0.191591          0.64807          0.47827         0.066788          0.43865

5 rows × 290 columns
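
The returned DataFrame is in wide format: a date column followed by one column per site. The sketch below shows one way to count the sites and reshape the table to long format. Rather than calling the API, it uses a small stand-in DataFrame (sample_df, a name introduced here) built from the first two rows and first three site columns of the output above. Storing the dates as strings is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Stand-in for data_df: first two rows and first three site columns
# copied from the output above. Dates are stored as strings here (assumption).
sample_df = pd.DataFrame(
    {
        "date": ["2002-10-01", "2002-10-02"],
        "06614800": [0.019810, 0.021508],
        "06620000": [0.97635, 1.01031],
        "06659580": [np.nan, np.nan],
    }
)

# Every column other than "date" corresponds to one site.
site_ids = [col for col in sample_df.columns if col != "date"]
print(len(site_ids))

# Reshape to one row per (date, site) pair and drop missing observations.
long_df = sample_df.melt(id_vars="date", var_name="site_id", value_name="streamflow")
long_df = long_df.dropna(subset=["streamflow"])
print(long_df)
```

The long layout is often more convenient for grouping or plotting by site, while the wide layout is convenient for comparing sites on the same dates.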

[3]:
# Get site-level attributes for these sites
metadata_df = get_point_metadata(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
                                 date_start="2002-10-01", date_end="2003-09-30",
                                 state="CO", site_networks="gagesii")

# View the first five records
metadata_df.head(5)
[3]:
   site_id   site_name                                      site_type     agency  state  latitude   longitude    first_date_data_available  last_date_data_available  record_count  ...  doi   huc8      conus1_x  conus1_y  conus2_x  conus2_y  gagesii_drainage_area  gagesii_class  gagesii_site_elevation  usgs_drainage_area
0  06614800  MICHIGAN RIVER NEAR CAMERON PASS, CO           stream gauge  USGS    CO     40.496094  -105.865012  1973-10-01                 2023-12-01                18322         ...  None  10180001  1054      818       1481      1764      4.02840                Ref            3188.0                  1.54
1  06620000  NORTH PLATTE RIVER NEAR NORTHGATE, CO          stream gauge  USGS    CO     40.936639  -106.339194  1904-06-01                 2023-12-01                39782         ...  None  10180001  1020      870       1448      1817      3702.63700             Non-ref        2388.0                  1431.00
2  06659580  SAND CREEK AT COLORADO-WYOMING STATE LINE      stream gauge  USGS    CO     40.993650  -105.759703  1968-10-01                 2020-09-01                10075         ...  None  10180010  nan       nan       1496      1814      79.11089               Non-ref        2323.0                  29.20
3  06696980  TARRYALL CREEK AT UPPER STATION NEAR COMO, CO  stream gauge  USGS    CO     39.339433  -105.911681  1978-06-01                 2023-10-13                5420          ...  None  10190001  1036      690       1466      1639      61.90650               Ref            3040.0                  23.90
4  06700000  SOUTH PLATTE RIVER ABOVE CHEESMAN LAKE, CO.    stream gauge  USGS    CO     39.162769  -105.310273  1924-10-01                 2023-09-30                9523          ...  None  10190002  nan       nan       1515      1617      4213.53800             Non-ref        2092.0                  1627.00

5 rows × 23 columns

The data table has 290 columns: one date column plus one column for each of the 289 Colorado GAGES-II sites that have data within the specified date range.
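
Because both functions were called with the same filters, we would expect the site columns in the data to line up with the site_id values in the metadata, which makes it possible to subset the data by any metadata attribute. The sketch below illustrates this with stand-in tables (sample_metadata and sample_data, names introduced here) copied from the first rows of the outputs above, and selects the sites whose gagesii_class is "Ref". The one-to-one match between data columns and site_id values is an assumption worth checking on real results.

```python
import pandas as pd

# Stand-in for metadata_df: two columns from the five rows shown above.
sample_metadata = pd.DataFrame(
    {
        "site_id": ["06614800", "06620000", "06659580", "06696980", "06700000"],
        "gagesii_class": ["Ref", "Non-ref", "Non-ref", "Ref", "Non-ref"],
    }
)

# Stand-in for data_df: the first row of values for the same five sites.
sample_data = pd.DataFrame(
    {
        "date": ["2002-10-01"],
        "06614800": [0.019810],
        "06620000": [0.97635],
        "06659580": [float("nan")],
        "06696980": [0.116879],
        "06700000": [float("nan")],
    }
)

# Check the assumption that data columns and metadata site IDs match.
data_sites = set(sample_data.columns) - {"date"}
assert data_sites == set(sample_metadata["site_id"])

# Keep only the columns for sites classified as reference ("Ref").
is_ref = sample_metadata["gagesii_class"] == "Ref"
ref_ids = sample_metadata.loc[is_ref, "site_id"].tolist()
ref_data = sample_data[["date"] + ref_ids]
print(ref_data.columns.tolist())
```

With the real results, the same pattern applies to data_df and metadata_df, and the number of site columns can be compared against len(metadata_df) as a quick sanity check.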