Filter sites by USGS HUC boundaries
To launch this notebook interactively in a Jupyter notebook-like browser interface, please click the “Launch Binder” button below. Note that Binder may take several minutes to launch.
This notebook provides an example of how to download a HUC boundary shapefile from the USGS and filter observations to only sites that are located within that HUC.
Please see the full point module documentation for information on what data is available, our data collection process, and new features we are working on! Our Metadata Description page itemizes the fields that get returned from get_point_metadata.
[1]:
# Import packages
from hf_hydrodata import register_api_pin, get_point_data, get_point_metadata
import requests
from zipfile import ZipFile
from io import BytesIO
import shapefile
from shapely.geometry import shape
[ ]:
# You need to register on https://hydrogen.princeton.edu/pin
# and run the following with your registered information
# before you can use the hydrodata utilities
register_api_pin("your_email", "your_pin")
Step 1: Download HUC-02 regional shapefile from USGS
The USGS has HUC shapefiles available for download from The National Map (to see what is available, select the Watershed Boundary Dataset). The National Map provides a visual interface for viewing, selecting, and downloading various datasets. These files are available either for the entire nation or by HUC-02 regions. For the purposes of this notebook, we will download a HUC-02 regional shapefile programmatically to allow for full reproducibility. This link provides more detail about the specific files we will be downloading.
[2]:
# Send request for data
url = 'https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/WBD/HU2/Shape/WBD_02_HU2_Shape.zip'
url_response = requests.get(url) # note this might take a minute or so to run
[3]:
# See the names of the files available in the .zip
myzipfile = ZipFile(BytesIO(url_response.content))
print(myzipfile.namelist())
['Shape/ExternalCrosswalk.dbf', 'Shape/FeatureToMetadata.dbf', 'Shape/HUMod.dbf', 'Shape/MetaProcessDetail.dbf', 'Shape/MetaSourceDetail.dbf', 'Shape/NonContributingDrainageArea.dbf', 'Shape/NonContributingDrainageArea.prj', 'Shape/NonContributingDrainageArea.shp', 'Shape/NonContributingDrainageArea.shx', 'Shape/NonContributingDrainageLine.dbf', 'Shape/NonContributingDrainageLine.prj', 'Shape/NonContributingDrainageLine.shp', 'Shape/NonContributingDrainageLine.shx', 'Shape/NWISDrainageArea.dbf', 'Shape/NWISDrainageArea.prj', 'Shape/NWISDrainageArea.shp', 'Shape/NWISDrainageArea.shx', 'Shape/NWISDrainageLine.dbf', 'Shape/NWISDrainageLine.prj', 'Shape/NWISDrainageLine.shp', 'Shape/NWISDrainageLine.shx', 'Shape/ProcessingParameters.dbf', 'Shape/UpdateStatus.dbf', 'Shape/WBDHU10.dbf', 'Shape/WBDHU10.prj', 'Shape/WBDHU10.shp', 'Shape/WBDHU10.shx', 'Shape/WBDHU12.dbf', 'Shape/WBDHU12.prj', 'Shape/WBDHU12.shp', 'Shape/WBDHU12.shx', 'Shape/WBDHU14.dbf', 'Shape/WBDHU14.prj', 'Shape/WBDHU14.shp', 'Shape/WBDHU14.shx', 'Shape/WBDHU16.dbf', 'Shape/WBDHU16.prj', 'Shape/WBDHU16.shp', 'Shape/WBDHU16.shx', 'Shape/WBDHU2.dbf', 'Shape/WBDHU2.prj', 'Shape/WBDHU2.shp', 'Shape/WBDHU2.shx', 'Shape/WBDHU4.dbf', 'Shape/WBDHU4.prj', 'Shape/WBDHU4.shp', 'Shape/WBDHU4.shx', 'Shape/WBDHU6.dbf', 'Shape/WBDHU6.prj', 'Shape/WBDHU6.shp', 'Shape/WBDHU6.shx', 'Shape/WBDHU8.dbf', 'Shape/WBDHU8.prj', 'Shape/WBDHU8.shp', 'Shape/WBDHU8.shx', 'Shape/WBDLine.dbf', 'Shape/WBDLine.prj', 'Shape/WBDLine.shp', 'Shape/WBDLine.shx', 'WBD_02_HU2_Shape.xml', 'WBD_02_HU2_Shape.jpg']
[4]:
# In this example, we will extract only the files with the HUC8 level watersheds
# This code saves these files to the local directory where this notebook is being run
myzipfile.extractall(members=['Shape/WBDHU8.shp', 'Shape/WBDHU8.shx', 'Shape/WBDHU8.dbf', 'Shape/WBDHU8.prj'])
[5]:
# Read in shapefile
huc02_shp = shapefile.Reader('Shape/WBDHU8.shp')
[6]:
# Read in projection file
with open('Shape/WBDHU8.prj') as f:
usgs_huc_crs = f.readlines()[0]
print(f"CRS: {usgs_huc_crs}")
CRS: GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]
Step 2: Subset shapefile to desired HUC-08 watershed
[7]:
# Let's explore what HUC8 watersheds are within this HUC2 region
for i in range(len(huc02_shp.shapeRecords())):
print(huc02_shp.shapeRecord(i=i, fields=['states', 'huc8', 'name']).record)
Record #0: ['DC,MD,VA,WV', '02070008', 'Middle Potomac-Catoctin']
Record #1: ['DC,MD,VA', '02070010', 'Middle Potomac-Anacostia-Occoquan']
Record #2: ['MD', '02060006', 'Patuxent']
Record #3: ['VA', '02040304', 'Eastern Lower Delmarva']
Record #4: ['NJ,NY', '02040301', 'Mullica-Toms']
Record #5: ['DE,NJ', '02040204', 'Delaware Bay']
Record #6: ['NY', '02020002', 'Sacandaga']
Record #7: ['MD,PA', '02050306', 'Lower Susquehanna']
Record #8: ['MD,PA', '02070009', 'Monocacy']
Record #9: ['VA', '02080103', 'Rapidan-Upper Rappahannock']
Record #10: ['VA', '02080105', 'Mattaponi']
Record #11: ['DE,MD,VA', '02080110', 'Tangier']
Record #12: ['MD,VA,WV', '02070001', 'South Branch Potomac']
Record #13: ['NJ,NY', '02030104', 'Sandy Hook-Staten Island']
Record #14: ['NJ,NY,RI', '02030202', 'Southern Long Island']
Record #15: ['NJ,NY', '02030103', 'Hackensack-Passaic']
Record #16: ['NJ,NY,PA', '02040104', 'Middle Delaware-Mongaup-Brodhead']
Record #17: ['DE,NJ', '02040206', 'Cohansey-Maurice']
Record #18: ['PA', '02050106', 'Upper Susquehanna-Tunkhannock']
Record #19: ['PA', '02050201', 'Upper West Branch Susquehanna']
Record #20: ['PA', '02050202', 'Sinnemahoning']
Record #21: ['PA', '02050203', 'Middle West Branch Susquehanna']
Record #22: ['PA', '02050204', 'Bald Eagle']
Record #23: ['PA', '02050205', 'Pine']
Record #24: ['PA', '02050206', 'Lower West Branch Susquehanna']
Record #25: ['PA', '02050303', 'Raystown']
Record #26: ['PA', '02050304', 'Lower Juniata']
Record #27: ['PA', '02050305', 'Lower Susquehanna-Swatara']
Record #28: ['NJ,PA', '02040201', 'Crosswicks-Neshaminy']
Record #29: ['NJ,PA', '02040105', 'Middle Delaware-Musconetcong']
Record #30: ['NY', '02030201', 'Northern Long Island']
Record #31: ['NY', '02040102', 'East Branch Delaware']
Record #32: ['NY,PA', '02050104', 'Tioga']
Record #33: ['DE,MD,VA', '02080111', 'Pokomoke-Western Lower Delmarva']
Record #34: ['DE,MD,NJ,VA', '02040303', 'Chincoteague']
Record #35: ['NJ', '02040302', 'Great Egg Harbor']
Record #36: ['VA,WV', '02070005', 'South Fork Shenandoah']
Record #37: ['VA', '02080104', 'Lower Rappahannock']
Record #38: ['VA', '02080106', 'Pamunkey']
Record #39: ['VA', '02080107', 'York']
Record #40: ['VA', '02080204', 'Rivanna']
Record #41: ['VA', '02080205', 'Middle James-Willis']
Record #42: ['MD', '02060001', 'Upper Chesapeake Bay']
Record #43: ['MD,PA', '02060003', 'Gunpowder-Patapsco']
Record #44: ['MD', '02060004', 'Severn']
Record #45: ['DE,MD', '02060005', 'Choptank']
Record #46: ['DE,MD', '02080109', 'Nanticoke']
Record #47: ['MD,PA,WV', '02070002', 'North Branch Potomac']
Record #48: ['DE', '02040207', 'Broadkill-Smyrna']
Record #49: ['MD,VA,WV', '02070007', 'Shenandoah']
Record #50: ['MD,PA,VA,WV', '02070004', 'Conococheague-Opequon']
Record #51: ['MD,VA', '02080102', 'Great Wicomico-Piankatank']
Record #52: ['MD,VA', '02070011', 'Lower Potomac']
Record #53: ['DE,NJ,PA', '02040202', 'Lower Delaware']
Record #54: ['PA', '02050302', 'Upper Juniata']
Record #55: ['VA,WV', '02070006', 'North Fork Shenandoah']
Record #56: ['VA,WV', '02080201', 'Upper James']
Record #57: ['VA', '02080202', 'Maury']
Record #58: ['VA', '02080203', 'Middle James-Buffalo']
Record #59: ['VA', '02080207', 'Appomattox']
Record #60: ['MD,PA,VA,WV', '02070003', 'Cacapon-Town']
Record #61: ['VA', '02080208', 'Hampton Roads']
Record #62: ['MD,VA', '02080101', 'Lower Chesapeake Bay']
Record #63: ['NJ,NY', '02020007', 'Rondout']
Record #64: ['NY', '02050102', 'Chenango']
Record #65: ['MA,NY,VT', '02020003', 'Hudson-Hoosic']
Record #66: ['NY', '02020004', 'Mohawk']
Record #67: ['NY', '02020001', 'Upper Hudson']
Record #68: ['NY', '02020008', 'Hudson-Wappinger']
Record #69: ['CT,NJ,NY', '02030101', 'Lower Hudson']
Record #70: ['MA,NY', '02020006', 'Middle Hudson']
Record #71: ['NY', '02020005', 'Schoharie']
Record #72: ['NJ', '02030105', 'Raritan']
Record #73: ['PA', '02040106', 'Lehigh']
Record #74: ['VA', '02080206', 'Lower James']
Record #75: ['VA', '02080108', 'Lynnhaven-Poquoson']
Record #76: ['NY,PA', '02050101', 'Upper Susquehanna']
Record #77: ['DE,MD,PA', '02040205', 'Brandywine-Christina']
Record #78: ['DE,MD,PA', '02060002', 'Chester-Sassafras']
Record #79: ['PA', '02050301', 'Lower Susquehanna-Penns']
Record #80: ['PA', '02050107', 'Upper Susquehanna-Lackawanna']
Record #81: ['PA', '02040103', 'Lackawaxen']
Record #82: ['CT,NY', '02030102', 'Bronx']
Record #83: ['PA', '02040203', 'Schuylkill']
Record #84: ['NY,PA', '02040101', 'Upper Delaware']
Record #85: ['NY,PA', '02050103', 'Owego-Wappasening']
Record #86: ['NY,PA', '02050105', 'Chemung']
Record #87: ['CT,NY,RI', '02030203', 'Long Island Sound']
[8]:
# We want to use the Raritan watershed, HUC8='02030105'. This is at index 72
print(huc02_shp.shapeRecord(i=72, fields=['states', 'huc8', 'name']).record)
# Extract the shape and record information for this index
raritan_shape = huc02_shp.shapeRecord(i=72).shape
raritan_record = huc02_shp.shapeRecord(i=72).record
Record #72: ['NJ', '02030105', 'Raritan']
[9]:
# Display the shape of the selected watershed
raritan_geo = huc02_shp.shapeRecord(i=72).shape.__geo_interface__
shape(raritan_geo)
[9]:
[10]:
# Save as shapefile, to be passed in to hf_hydrodata functions below
with shapefile.Writer('Shape/raritan_watershed') as w:
w.fields = huc02_shp.fields[1:]
w.record(raritan_record)
w.shape(raritan_shape)
Step 3: Request streamflow data for sites within the watershed
[11]:
# Request point observations data
data_df = get_point_data(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
date_start="2021-10-01", date_end="2022-09-30",
polygon="Shape/raritan_watershed.shp", polygon_crs=usgs_huc_crs)
# View first five records
data_df.head(5)
[11]:
| date | 01396500 | 01396582 | 01396660 | 01396800 | 01397000 | 01398000 | 01398500 | 01399100 | 01399500 | ... | 01402000 | 01403060 | 01403150 | 01403400 | 01403540 | 01403900 | 01405030 | 01405400 | 01406050 | 01402630 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-10-01 | 2.37437 | 0.067354 | 0.238569 | 0.98201 | 4.6978 | 0.260926 | 1.16596 | 0.196685 | 1.46311 | ... | 3.9054 | 11.9143 | 0.006792 | 0.050657 | 0.086032 | 0.94805 | 0.58015 | 0.75844 | 0.194704 | 0.053204 |
| 1 | 2021-10-02 | 2.26400 | 0.062260 | 0.231777 | 0.90560 | 4.4997 | 0.238569 | 1.17162 | 0.157348 | 1.32161 | ... | NaN | 11.2068 | 0.007075 | 0.047827 | 0.085749 | 0.91409 | 0.57449 | 0.71316 | 0.179422 | 0.050940 |
| 2 | 2021-10-03 | 2.20174 | 0.057449 | 0.229513 | 0.89145 | 4.3582 | 0.227532 | 1.17445 | 0.153669 | 1.27067 | ... | 3.3960 | 10.1314 | 0.006226 | 0.047261 | 0.087730 | 0.89711 | 0.55751 | 0.68486 | 0.168668 | 0.051223 |
| 3 | 2021-10-04 | 4.24500 | 1.044270 | 0.444310 | 1.03012 | 5.2355 | 0.240833 | 2.66586 | 0.461290 | 1.79705 | ... | NaN | 11.8294 | 0.035658 | 0.164989 | 0.245361 | 2.39135 | 0.56883 | 0.80938 | 0.223287 | 0.128765 |
| 4 | 2021-10-05 | 4.44310 | 0.506570 | 0.328280 | 1.05276 | 8.2636 | 0.258662 | 1.74328 | 0.399030 | 1.94421 | ... | 3.7073 | 22.2438 | 0.018395 | 0.105276 | 0.151122 | 2.11684 | 0.62826 | 1.12351 | 0.549020 | 0.102446 |
5 rows × 25 columns
[12]:
# Request site-level attributes for these sites
metadata_df = get_point_metadata(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
date_start="2021-10-01", date_end="2022-09-30",
polygon="Shape/raritan_watershed.shp", polygon_crs=usgs_huc_crs)
# View first five records
metadata_df.head(5)
[12]:
| site_id | site_name | site_type | agency | state | latitude | longitude | first_date_data_available | last_date_data_available | record_count | ... | doi | huc8 | conus1_x | conus1_y | conus2_x | conus2_y | gagesii_drainage_area | gagesii_class | gagesii_site_elevation | usgs_drainage_area | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01396500 | South Branch Raritan River near High Bridge NJ | stream gauge | USGS | NJ | 40.677778 | -74.879167 | 1918-10-01 | 2023-12-02 | 38414 | ... | None | 02030105 | nan | nan | 3993 | 1995 | 162.52470 | Non-ref | 93.0 | 65.3 |
| 1 | 01396582 | Spruce Run at Main Street at Glen Gardner NJ | stream gauge | USGS | NJ | 40.691389 | -74.936944 | 1978-03-24 | 2023-12-02 | 15555 | ... | None | 02030105 | nan | nan | 3986 | 1993 | 32.08680 | Non-ref | 117.0 | 12.3 |
| 2 | 01396660 | Mulhockaway Creek at Van Syckel NJ | stream gauge | USGS | NJ | 40.647500 | -74.968889 | 1977-07-29 | 2023-12-02 | 16927 | ... | None | 02030105 | nan | nan | 3987 | 1985 | 30.44061 | Non-ref | 85.0 | 11.8 |
| 3 | 01396800 | Spruce Run at Clinton NJ | stream gauge | USGS | NJ | 40.640000 | -74.915556 | 1960-10-01 | 2023-12-02 | 23071 | ... | None | 02030105 | nan | nan | 3990 | 1985 | 107.84340 | Non-ref | 67.0 | 41.3 |
| 4 | 01397000 | South Branch Raritan River at Stanton NJ | stream gauge | USGS | NJ | 40.572222 | -74.868056 | 1903-10-01 | 2023-12-02 | 39145 | ... | None | 02030105 | nan | nan | 3997 | 1979 | 388.62720 | Non-ref | 40.0 | 147.0 |
5 rows × 23 columns