{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Save and wrangle point observations data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To launch this notebook interactively in a Jupyter notebook-like browser interface, please click the \"Launch Binder\" button below. Note that Binder may take several minutes to launch.\n", "\n", "[Launch Binder](https://mybinder.org/v2/gh/hydroframe/subsettools-binder/HEAD?labpath=hf_hydrodata/point/example_pandas.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hf_hydrodata `get_point_data` and `get_point_metadata` functions return data in [pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). This notebook goes through some common tasks using pandas, such as saving to a .csv file, saving to a NetCDF file, creating a new variable, and slicing out a particular value. For a more comprehensive introduction to working with data in pandas, please see their [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html#min) introduction or [Coming From..](https://pandas.pydata.org/docs/getting_started/index.html#coming-from) documentation to see comparisons to working in R, SQL, Excel, Stata, or SAS.\n", "\n", "Please see the [hf_hydrodata](https://hf-hydrodata.readthedocs.io) documentation for information on what data is available, our data collection process, and new features we are working on! Our [Metadata Description](https://hf-hydrodata.readthedocs.io/en/latest/available_metadata.html#point-observations-metadata) page itemizes the fields that get returned from `get_point_metadata`." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import packages\n", "import pandas as pd\n", "import xarray as xr\n", "import numpy as np\n", "from hf_hydrodata import register_api_pin, get_point_data, get_point_metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You need to register on https://hydrogen.princeton.edu/pin \n", "# and run the following with your registered information\n", "# before you can use the hydrodata utilities\n", "register_api_pin(\"your_email\", \"your_pin\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1: Working with pandas DataFrames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this first example, we will showcase several common pandas commands that can be used to inspect a DataFrame.\n", "\n", "Note that `get_point_data` and `get_point_metadata` require mandatory parameters of `dataset`, `variable`, `temporal_resolution`, and `aggregation` (and `depth_level` if asking for soil moisture data). Please see [the documentation](https://hf-hydrodata.readthedocs.io/en/latest/available_data.html) for information about what point observation datasets are available and the parameters used to query them. \n", "\n", "The [hf_hydrodata API Reference](https://hf-hydrodata.readthedocs.io/en/latest/hf_hydrodata.point.html) includes information on what optional filtering parameters are available. These include filters for things like a geographic region or date range. Those parameters work cumulatively, so if `state` and `site_ids` are both supplied, for example, then only sites within `site_ids` that are *also* in `state` will be returned." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Let's explore daily streamflow data with optional filters for a date range and bounding box. 
\n", "\n", "# Get observations data and site-level metadata\n", "data_df = get_point_data(dataset=\"usgs_nwis\", variable=\"streamflow\", temporal_resolution=\"daily\", aggregation=\"mean\",\n", " date_start=\"2002-01-01\", date_end=\"2002-01-05\",\n", " latitude_range=(45, 50), longitude_range=(-75, -50))\n", "\n", "metadata_df = get_point_metadata(dataset=\"usgs_nwis\", variable=\"streamflow\", temporal_resolution=\"daily\", aggregation=\"mean\",\n", " date_start=\"2002-01-01\", date_end=\"2002-01-05\",\n", " latitude_range=(45, 50), longitude_range=(-75, -50))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we will explore pandas' [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) method for DataFrames. `head` will display the first n rows of the DataFrame, with the default to show the first 5 rows." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | site_id | site_name | site_type | agency | state | latitude | longitude | first_date_data_available | last_date_data_available | record_count | ... | doi | huc8 | conus1_x | conus1_y | conus2_x | conus2_y | gagesii_drainage_area | gagesii_class | gagesii_site_elevation | usgs_drainage_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01011000 | Allagash River near Allagash, Maine | stream gauge | USGS | ME | 47.069722 | -69.079444 | 1910-07-01 | 2023-11-30 | 34028 | ... | None | 01010002 | nan | nan | 4210 | 2783 | 3186.8440 | Non-ref | 187.0 | 1478.00 |
| 1 | 01013500 | Fish River near Fort Kent, Maine | stream gauge | USGS | ME | 47.237500 | -68.582778 | 1903-07-29 | 2023-12-01 | 36507 | ... | None | 01010003 | nan | nan | 4237 | 2810 | 2252.6960 | Ref | 157.0 | 873.00 |
| 2 | 01015800 | Aroostook River near Masardis, Maine | stream gauge | USGS | ME | 46.523056 | -68.371667 | 1957-09-14 | 2023-12-01 | 24185 | ... | None | 01010004 | nan | nan | 4276 | 2747 | 2313.7550 | Non-ref | 166.0 | 892.00 |
| 3 | 01017000 | Aroostook River at Washburn, Maine | stream gauge | USGS | ME | 46.777222 | -68.157222 | 1930-08-01 | 2023-12-01 | 34091 | ... | None | 01010004 | nan | nan | 4281 | 2773 | 4278.9070 | Non-ref | 131.0 | 1654.00 |
| 4 | 01017550 | Williams Brook at Phair, Maine | stream gauge | USGS | ME | 46.628056 | -67.953056 | 1999-11-01 | 2023-12-01 | 8797 | ... | None | 01010005 | nan | nan | 4300 | 2762 | 10.0323 | Ref | 176.0 | 3.82 |
5 rows × 23 columns
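
As a minimal sketch of the preview shown above, the snippet below calls `head` with and without an explicit row count. The small `metadata_df` here is a hypothetical stand-in built by hand from a few values in this notebook's outputs; it is not the result of a real `get_point_metadata` call.

```python
import pandas as pd

# Hypothetical stand-in for metadata_df (hand-copied values, only two columns).
metadata_df = pd.DataFrame(
    {
        "site_id": ["01011000", "01013500", "01015800",
                    "01017000", "01017550", "01018000"],
        "usgs_drainage_area": [1478.0, 873.0, 892.0, 1654.0, 3.82, 175.0],
    }
)

# With no argument, head() returns the first 5 rows.
first_five = metadata_df.head()

# Passing n changes how many rows are returned.
first_three = metadata_df.head(3)

print(first_five)
print(first_three)
```

Because `head` returns a new DataFrame rather than printing, the result can be assigned and reused, as done here.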
| | site_id | site_name | site_type | agency | state | latitude | longitude | first_date_data_available | last_date_data_available | record_count | ... | doi | huc8 | conus1_x | conus1_y | conus2_x | conus2_y | gagesii_drainage_area | gagesii_class | gagesii_site_elevation | usgs_drainage_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01011000 | Allagash River near Allagash, Maine | stream gauge | USGS | ME | 47.069722 | -69.079444 | 1910-07-01 | 2023-11-30 | 34028 | ... | None | 01010002 | nan | nan | 4210 | 2783 | 3186.844 | Non-ref | 187.0 | 1478.0 |
| 1 | 01013500 | Fish River near Fort Kent, Maine | stream gauge | USGS | ME | 47.237500 | -68.582778 | 1903-07-29 | 2023-12-01 | 36507 | ... | None | 01010003 | nan | nan | 4237 | 2810 | 2252.696 | Ref | 157.0 | 873.0 |
| 2 | 01015800 | Aroostook River near Masardis, Maine | stream gauge | USGS | ME | 46.523056 | -68.371667 | 1957-09-14 | 2023-12-01 | 24185 | ... | None | 01010004 | nan | nan | 4276 | 2747 | 2313.755 | Non-ref | 166.0 | 892.0 |
3 rows × 23 columns
| | site_id | usgs_drainage_area |
|---|---|---|
| 0 | 01011000 | 1478.00 |
| 1 | 01013500 | 873.00 |
| 2 | 01015800 | 892.00 |
| 3 | 01017000 | 1654.00 |
| 4 | 01017550 | 3.82 |
| 5 | 01018000 | 175.00 |
| 6 | 01019000 | 228.30 |
| 7 | 01027200 | 232.00 |
| 8 | 01029200 | 173.00 |
| 9 | 01029500 | 837.00 |
| 10 | 01030500 | 1418.00 |
| 11 | 01031300 | 118.00 |
| 12 | 01031450 | 95.40 |
| 13 | 01031500 | 298.00 |
| 14 | 01033000 | 326.00 |
| 15 | 01034000 | 1162.00 |
| 16 | 01034500 | 6422.00 |
| 17 | 01042500 | 1590.00 |
| 18 | 01043500 | 516.00 |
| 19 | 01044550 | 193.00 |
| 20 | 01046000 | 90.00 |
| 21 | 01046500 | 2715.00 |
| 22 | 01129200 | 254.00 |
| 23 | 01010000 | 1341.00 |
| 24 | 01010070 | 171.00 |
| 25 | 01010500 | 2680.00 |
| 26 | 01014000 | 5929.00 |
| 27 | 01018500 | 413.00 |
| 28 | 01021000 | 1374.00 |
| 29 | 04264331 | 298800.00 |
| 30 | 04294300 | 34.50 |
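
A two-column view like the one above can be produced by selecting columns by name. The sketch below assumes a hand-built stand-in for `metadata_df` (three rows copied from the table, not a real query result). It also shows one way to slice out a single value, and why `dtype` matters when reading a saved .csv back: site IDs with leading zeros are otherwise parsed as integers. An in-memory buffer is used in place of a file path so the example has no side effects.

```python
import io

import pandas as pd

# Hypothetical stand-in for metadata_df (hand-copied values).
metadata_df = pd.DataFrame(
    {
        "site_id": ["01011000", "01013500", "01015800"],
        "site_type": ["stream gauge"] * 3,
        "usgs_drainage_area": [1478.0, 873.0, 892.0],
    }
)

# Select a subset of columns by passing a list of column names.
subset = metadata_df[["site_id", "usgs_drainage_area"]]

# Slice out one value: the drainage area for a single site.
area = subset.loc[subset["site_id"] == "01013500", "usgs_drainage_area"].iloc[0]

# Write to CSV (a real workflow would pass a file path instead of a buffer).
buffer = io.StringIO()
subset.to_csv(buffer, index=False)

# Reading back without a dtype turns "01011000" into the integer 1011000.
buffer.seek(0)
naive = pd.read_csv(buffer)

# Reading back with dtype=str for site_id keeps the leading zero.
buffer.seek(0)
preserved = pd.read_csv(buffer, dtype={"site_id": str})

print(area)
print(naive["site_id"].iloc[0], preserved["site_id"].iloc[0])
```

Passing `index=False` keeps the row index out of the file, so the columns read back match the columns written.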
| | date | 01011000 | 01013500 | 01015800 | 01017000 | 01017550 | 01018000 | 01019000 | 01027200 | 01029200 | ... | 01046500 | 01129200 | 01010000 | 01010070 | 01010500 | 01014000 | 01018500 | 01021000 | 04264331 | 04294300 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002-01-01 | 9.7069 | 13.8104 | 12.9048 | 21.3099 | 0.013301 | NaN | 3.0847 | 1.98666 | 2.43663 | ... | 46.129 | 23.9984 | 11.9143 | 1.48292 | 24.0550 | 61.411 | 9.1126 | 21.9042 | 6084.5 | 0.2547 |
| 1 | 2002-01-02 | 9.5371 | 13.4142 | 12.0558 | 20.0364 | 0.012169 | NaN | 3.0564 | 1.91874 | 2.39135 | ... | 46.695 | 23.8286 | 11.6879 | 1.41500 | 23.4890 | 59.713 | 9.0277 | 21.9042 | 6056.2 | 0.2547 |
| 2 | 2002-01-03 | 9.3390 | 13.0746 | 11.5181 | 19.0742 | 0.011886 | NaN | 3.0281 | 1.88195 | 2.36305 | ... | 46.978 | 23.8286 | 11.5181 | 1.35840 | 23.0645 | 58.581 | 8.9145 | 21.9042 | 6084.5 | 0.2547 |
| 3 | 2002-01-04 | 9.1692 | 12.6501 | 11.0936 | 26.4322 | 0.011320 | NaN | 3.0564 | 1.83667 | 2.34890 | ... | 51.506 | 23.6305 | 11.2917 | 1.31312 | 22.6400 | 57.449 | 8.8579 | 21.9042 | 6056.2 | 0.2547 |
| 4 | 2002-01-05 | 8.9994 | 12.2822 | 10.6691 | 25.1870 | 0.010754 | NaN | 3.0281 | 1.79139 | 2.32060 | ... | 37.639 | 23.6022 | 11.0936 | 1.27633 | 22.2155 | 56.317 | 8.7447 | 21.9042 | 5546.8 | 0.2830 |
5 rows × 32 columns
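
The table above is in wide format: one row per date and one column per site. As a sketch of how such a table might be reshaped and summarized, the snippet below builds a three-site stand-in for `data_df` by hand (values copied from the table, including the all-`NaN` site `01018000`), melts it to long format, and computes a per-site mean. The names `long_df`, `observed`, and `site_means` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_df (three of the site columns, hand-copied).
data_df = pd.DataFrame(
    {
        "date": ["2002-01-01", "2002-01-02", "2002-01-03",
                 "2002-01-04", "2002-01-05"],
        "01011000": [9.7069, 9.5371, 9.3390, 9.1692, 8.9994],
        "01013500": [13.8104, 13.4142, 13.0746, 12.6501, 12.2822],
        "01018000": [np.nan] * 5,
    }
)

# Wide -> long: one row per (date, site) pair.
long_df = data_df.melt(id_vars="date", var_name="site_id", value_name="streamflow")

# Keep only rows that actually have an observation.
observed = long_df.dropna(subset=["streamflow"])

# Per-site mean over the date range; NaN values are skipped by default,
# so a site with no observations at all yields NaN.
site_means = data_df.set_index("date").mean()

print(observed.head())
print(site_means)
```

Long format tends to be easier to filter and group, while wide format is convenient for column-wise statistics such as the mean computed here.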
\n", "<xarray.Dataset>\n",
"Dimensions: (date: 5, site: 31)\n",
"Coordinates:\n",
" * site (site) <U8 '01011000' '01013500' ... '04264331' '04294300'\n",
" * date (date) <U10 '2002-01-01' '2002-01-02' ... '2002-01-05'\n",
"Data variables:\n",
" streamflow (date, site) float64 9.707 13.81 12.9 ... 21.9 5.547e+03 0.283