Filter point observations to pre-defined site networks
This notebook demonstrates how to use the site_networks parameter of the get_point_data and get_point_metadata functions to filter sites to a pre-defined site network.
For USGS stream gages, the currently supported site networks are:
GAGESII (‘gagesii’)
GAGESII reference gages (‘gagesii_reference’)
HCDN-2009 (‘hcdn2009’)
CAMELS (‘camels’)
For USGS groundwater wells, the currently supported site network is:
Climate Response Network (‘climate_response_network’)
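The network names above are passed as strings to the site_networks parameter. One way to catch a misspelled name before sending a request is a small lookup table. The sketch below is a hypothetical helper, not part of hf_hydrodata: the dictionary keys ("stream_gage", "groundwater_well") are made-up labels, and the network strings are copied from the lists above, so the table will go stale if new networks are added.

```python
# Hypothetical helper, not part of hf_hydrodata. The keys are illustrative
# labels; the network strings are copied from the lists above.
SITE_NETWORKS = {
    "stream_gage": {"gagesii", "gagesii_reference", "hcdn2009", "camels"},
    "groundwater_well": {"climate_response_network"},
}


def check_site_network(site_type: str, network: str) -> str:
    """Return `network` unchanged if it is listed for `site_type`, else raise ValueError."""
    allowed = SITE_NETWORKS.get(site_type)
    if allowed is None:
        raise ValueError(f"unknown site type: {site_type!r}")
    if network not in allowed:
        raise ValueError(
            f"{network!r} is not a listed network for {site_type!r}; "
            f"choose one of {sorted(allowed)}"
        )
    return network


print(check_site_network("stream_gage", "gagesii"))  # prints gagesii
```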
Please see the full point module documentation for information on what data is available, our data collection process, and new features we are working on! Our Metadata Description page lists the fields returned by get_point_metadata.
[1]:
# Import packages
from hf_hydrodata import register_api_pin, get_point_data, get_point_metadata
import pandas as pd
[ ]:
# You need to register on https://hydrogen.princeton.edu/pin
# and run the following with your registered information
# before you can use the hydrodata utilities
register_api_pin("your_email", "your_pin")
Note that get_point_data and get_point_metadata require the mandatory parameters dataset, variable, temporal_resolution, and aggregation (plus depth_level when requesting soil moisture data). Please see the documentation for information about which point observation datasets are available and the parameters used to query them.
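Because the same mandatory parameters are needed on every call, it can help to keep them in one dictionary and check it before querying. The helper below is a hypothetical sketch, not part of hf_hydrodata; it only checks that the required keys are present, not that their values are valid, and the soil_moisture flag is an illustrative way to express the extra depth_level requirement.

```python
# Hypothetical helper (not part of hf_hydrodata): checks a query dict for the
# mandatory parameters named in the text before it is sent.
REQUIRED = ("dataset", "variable", "temporal_resolution", "aggregation")


def missing_parameters(query: dict, soil_moisture: bool = False) -> list:
    """List mandatory parameter names absent from `query`.

    Set soil_moisture=True for soil moisture requests, which also need depth_level.
    """
    required = list(REQUIRED)
    if soil_moisture:
        required.append("depth_level")
    return [name for name in required if name not in query]


query = {
    "dataset": "usgs_nwis",
    "variable": "streamflow",
    "temporal_resolution": "daily",
    "aggregation": "mean",
}
print(missing_parameters(query))  # prints []
```

Once the list is empty, the dictionary can be unpacked into a call alongside any optional filters, for example get_point_data(**query, state="CO").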
The hf_hydrodata API Reference describes the available optional filtering parameters, such as filters for a geographic region or a date range. These filters are applied cumulatively: if both state and site_ids are supplied, for example, only the sites in site_ids that are also in state are returned.
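Conceptually, combining filters behaves like a set intersection. The snippet below only illustrates that logic with plain Python sets; it is not how hf_hydrodata applies filters internally. The five Colorado site IDs come from the metadata output later in this notebook, and site_outside_co is a made-up placeholder for a site in another state.

```python
# Illustration only: mimics how combined filters narrow the result set.
# This is not hf_hydrodata's implementation.
sites_in_state = {"06614800", "06620000", "06659580", "06696980", "06700000"}  # as if state="CO"
requested_site_ids = {"06614800", "06700000", "site_outside_co"}  # hypothetical site_ids list

# Each additional filter can only remove sites, never add them.
returned = sites_in_state & requested_site_ids
print(sorted(returned))  # prints ['06614800', '06700000']
```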
Example: Query stream gage data for GAGES-II sites in Colorado
In this example, we query the stream gages in the GAGES-II network within Colorado (state='CO'). We focus on Water Year 2003, which runs from October 1, 2002 through September 30, 2003, so we set date_start='2002-10-01' and date_end='2003-09-30'. Setting site_networks='gagesii' restricts the results to stream gages that are part of the GAGES-II network.
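Because a water year has fixed calendar boundaries, the date_start and date_end strings can be computed rather than typed by hand. The helper below is a hypothetical convenience function, not part of hf_hydrodata; it assumes the usual US convention that water year N starts on October 1 of year N-1 and ends on September 30 of year N, and it reproduces the two strings used in this example.

```python
from datetime import date


def water_year_bounds(water_year: int) -> tuple:
    """Return (date_start, date_end) ISO strings for a US water year.

    Water year N runs from October 1 of year N-1 through September 30 of year N.
    """
    start = date(water_year - 1, 10, 1)
    end = date(water_year, 9, 30)
    return start.isoformat(), end.isoformat()


date_start, date_end = water_year_bounds(2003)
print(date_start, date_end)  # prints 2002-10-01 2003-09-30
```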
[2]:
# Get point observations data
data_df = get_point_data(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
                         date_start="2002-10-01", date_end="2003-09-30",
                         state="CO", site_networks="gagesii")
# View the first five records
data_df.head(5)
[2]:
| | date | 06614800 | 06620000 | 06659580 | 06696980 | 06700000 | 06701500 | 06701620 | 06701900 | 06707500 | ... | 09371000 | 09371010 | 09371492 | 09371520 | 09372000 | 393109104464500 | 394308105413800 | 394839104570300 | 401733105392404 | 402114105350101 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002-10-01 | 0.019810 | 0.97635 | NaN | 0.116879 | NaN | 8.4051 | NaN | 9.5371 | 11.3766 | ... | 0.000000 | 14.1500 | 0.023489 | 0.46978 | 0.274793 | 0.048110 | 0.61977 | 1.26784 | 0.071316 | 0.33677 |
| 1 | 2002-10-02 | 0.021508 | 1.01031 | NaN | 0.148009 | NaN | 8.3485 | NaN | 9.8767 | 12.1690 | ... | 0.066505 | 15.6782 | 0.040469 | 0.53204 | 0.285830 | 0.281868 | 0.81504 | 2.81019 | 0.071316 | 0.39903 |
| 2 | 2002-10-03 | 0.022357 | 1.23388 | NaN | 0.164140 | NaN | 7.1882 | NaN | 8.5466 | 10.2729 | ... | 0.155650 | 19.1874 | 0.091975 | 1.10936 | 0.472610 | 0.249889 | 0.88862 | 1.23954 | 0.069618 | 0.46695 |
| 3 | 2002-10-04 | 0.025753 | 1.81969 | NaN | 0.146877 | NaN | 5.3204 | NaN | 5.9996 | 8.2919 | ... | 0.350920 | 19.6119 | 0.043582 | 0.50657 | 0.894280 | 0.219325 | 0.70184 | 0.68769 | 0.068203 | 0.43299 |
| 4 | 2002-10-05 | 0.024621 | 1.98100 | NaN | 0.143198 | NaN | 4.4997 | NaN | 5.0374 | 6.5090 | ... | 0.060279 | 22.7249 | 0.026885 | 0.47261 | 0.585810 | 0.191591 | 0.64807 | 0.47827 | 0.066788 | 0.43865 |
5 rows × 290 columns
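The returned frame is wide: a date column followed by one column per site ID, which is why 289 sites produce 290 columns. A few pandas operations help summarize it. The sketch below rebuilds a small slice of the printed rows (three dates, three sites) so it runs without an API call; on the real data_df the same lines apply unchanged.

```python
import pandas as pd

# A small slice copied from the printed output above, standing in for data_df.
data_df = pd.DataFrame({
    "date": ["2002-10-01", "2002-10-02", "2002-10-03"],
    "06614800": [0.019810, 0.021508, 0.022357],
    "06659580": [float("nan")] * 3,
    "09372000": [0.274793, 0.285830, 0.472610],
})

# Every column except "date" is a site.
site_columns = data_df.columns.drop("date")
print(len(site_columns))  # prints 3 (289 for the full query)

# Count non-missing daily values per site; 0 means no data in these rows.
valid_counts = data_df[site_columns].notna().sum()
print(valid_counts.to_dict())

# Reshape to long format: one row per (date, site_id) pair, dropping NaN values.
long_df = data_df.melt(id_vars="date", var_name="site_id", value_name="streamflow").dropna()
print(long_df.shape)  # prints (6, 3)
```

The long format is often easier to group by site or join against metadata, at the cost of repeating the date on every row.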
[3]:
# Get site-level attributes for these sites
metadata_df = get_point_metadata(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
                                 date_start="2002-10-01", date_end="2003-09-30",
                                 state="CO", site_networks="gagesii")
# View the first five records
metadata_df.head(5)
[3]:
| | site_id | site_name | site_type | agency | state | latitude | longitude | first_date_data_available | last_date_data_available | record_count | ... | doi | huc8 | conus1_x | conus1_y | conus2_x | conus2_y | gagesii_drainage_area | gagesii_class | gagesii_site_elevation | usgs_drainage_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 06614800 | MICHIGAN RIVER NEAR CAMERON PASS, CO | stream gauge | USGS | CO | 40.496094 | -105.865012 | 1973-10-01 | 2023-12-01 | 18322 | ... | None | 10180001 | 1054 | 818 | 1481 | 1764 | 4.02840 | Ref | 3188.0 | 1.54 |
| 1 | 06620000 | NORTH PLATTE RIVER NEAR NORTHGATE, CO | stream gauge | USGS | CO | 40.936639 | -106.339194 | 1904-06-01 | 2023-12-01 | 39782 | ... | None | 10180001 | 1020 | 870 | 1448 | 1817 | 3702.63700 | Non-ref | 2388.0 | 1431.00 |
| 2 | 06659580 | SAND CREEK AT COLORADO-WYOMING STATE LINE | stream gauge | USGS | CO | 40.993650 | -105.759703 | 1968-10-01 | 2020-09-01 | 10075 | ... | None | 10180010 | nan | nan | 1496 | 1814 | 79.11089 | Non-ref | 2323.0 | 29.20 |
| 3 | 06696980 | TARRYALL CREEK AT UPPER STATION NEAR COMO, CO | stream gauge | USGS | CO | 39.339433 | -105.911681 | 1978-06-01 | 2023-10-13 | 5420 | ... | None | 10190001 | 1036 | 690 | 1466 | 1639 | 61.90650 | Ref | 3040.0 | 23.90 |
| 4 | 06700000 | SOUTH PLATTE RIVER ABOVE CHEESMAN LAKE, CO. | stream gauge | USGS | CO | 39.162769 | -105.310273 | 1924-10-01 | 2023-09-30 | 9523 | ... | None | 10190002 | nan | nan | 1515 | 1617 | 4213.53800 | Non-ref | 2092.0 | 1627.00 |
5 rows × 23 columns
This gives us the data for the 289 Colorado GAGES-II sites that have data within the specified date range (the 290 columns of data_df are the date column plus one column per site).
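The metadata can be used to subset the data further. For instance, the gagesii_class column distinguishes reference ("Ref") from non-reference ("Non-ref") gages. The sketch below uses small stand-in frames copied from the printed outputs (first three sites only) and assumes, as the printed tables suggest, that site IDs are strings that match the data column names exactly, leading zeros included.

```python
import pandas as pd

# Small stand-ins copied from the printed outputs above (first three sites only).
metadata_df = pd.DataFrame({
    "site_id": ["06614800", "06620000", "06659580"],
    "gagesii_class": ["Ref", "Non-ref", "Non-ref"],
    "gagesii_drainage_area": [4.02840, 3702.63700, 79.11089],
})
data_df = pd.DataFrame({
    "date": ["2002-10-01", "2002-10-02"],
    "06614800": [0.019810, 0.021508],
    "06620000": [0.97635, 1.01031],
    "06659580": [float("nan"), float("nan")],
})

# Site IDs of the reference gages, taken from the metadata.
ref_ids = metadata_df.loc[metadata_df["gagesii_class"] == "Ref", "site_id"].tolist()

# Keep the date column plus only the reference-gage columns.
ref_data = data_df[["date"] + ref_ids]
print(ref_data.columns.tolist())  # prints ['date', '06614800']
```

Alternatively, passing site_networks='gagesii_reference' in the original query should restrict the results to reference gages without this post-processing step.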