hf_hydrodata.gridded module

Functions to access gridded data from the data catalog index of the GPFS files.

hf_hydrodata.gridded.get_date_range(*args, **kwargs) → Tuple[datetime, datetime]

Get the date range of the dataset specified by the options.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function. You can pass any parameters used by get_numpy() or get_data_catalog_entry(), but only the dataset option is used.

Parameters:

dataset -- A dataset name (see Gridded Data documentation).

Returns:

A tuple with (dataset_start_date, dataset_end_date), or None if no date range is available.

Example:

import datetime
import hf_hydrodata as hf

options = {"dataset": "NLDAS2", "temporal_resolution": "daily", "variable": "precipitation",
           "start_time":"2005-09-30", "end_time":"2005-10-03",
           "grid_bounds":[200, 200, 300, 250]
}
date_range = hf.get_date_range(options)
assert date_range[0] == datetime.datetime(2002, 10, 1)
assert date_range[1] == datetime.datetime(2006, 9, 30)

hf_hydrodata.gridded.get_file_path(entry, *args, **kwargs) → str

This function is deprecated.

Use the function get_path() instead.

hf_hydrodata.gridded.get_file_paths(entry, *args, **kwargs) → List[str]

This function is deprecated.

Use the function get_paths() instead.

hf_hydrodata.gridded.get_gridded_data(*args, **kwargs) → ndarray

Get a numpy ndarray from files in /hydroframe with the applied data filters.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.
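Accepting both calling styles amounts to merging an optional positional dict with keyword arguments. The sketch below is illustrative only: `merge_options` is a hypothetical helper, not part of hf_hydrodata, and letting named parameters override dict entries is an assumption about precedence.

```python
from typing import Any, Dict


def merge_options(*args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Combine an optional positional options dict with keyword arguments.

    Hypothetical helper: keyword arguments are applied last and therefore
    override dict entries (an assumption; hf_hydrodata may differ).
    """
    options: Dict[str, Any] = {}
    if args:
        if len(args) > 1 or not isinstance(args[0], dict):
            raise ValueError("Expected at most one positional dict of options.")
        options.update(args[0])
    # Named parameters are applied after the dict (assumed precedence).
    options.update(kwargs)
    return options


# Both calling styles produce the same options.
from_dict = merge_options({"dataset": "NLDAS2", "variable": "precipitation"})
from_kwargs = merge_options(dataset="NLDAS2", variable="precipitation")
assert from_dict == from_kwargs
```

In practice this means `hf.get_gridded_data(options)` and `hf.get_gridded_data(**options)` should select the same data.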

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- Time resolution of the dataset variable. Must be hourly, daily, weekly, or monthly.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.
start_time -- A time as either a datetime object or a string in the form YYYY-MM-DD or YYYY-MM-DD HH:MM:SS. Start of the date range for data.
end_time -- A time as either a datetime object or a string in the form YYYY-MM-DD or YYYY-MM-DD HH:MM:SS. End of the date range for data.
grid_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in xy grid coordinates in the grid of the data.
latlng_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in lat/lng coordinates, mapped to the grid of the data.
grid_point -- An array (or string representing an array) [x, y] with the grid coordinates of a point in the grid.
latlng_point -- An array (or string representing an array) [lat, lon] with the lat/lng coordinates of a point in the grid.
huc_id -- A comma-separated list of HUC ids that specifies the grid_bounds using the HUC bounding box.
z -- A value of the z dimension used to filter this dimension when loading data.
level -- A HUC level integer when reading HUC boundary files. Must be 2, 4, 6, 8, or 10.
site_id -- Used when reading data associated with an observation site.
time_values -- Optional. An empty array that will be populated with time dimension values of returned data.

Returns:

A numpy ndarray containing the data loaded from the files identified by the entry and sliced by the data filter options.

Raises:

ValueError -- If both grid_bounds and latlng_bounds are specified as data filters.
ValueError -- If no data catalog entry is found associated with the filter parameters.
ValueError -- If any filter parameters are invalid.

For gridded results, the returned numpy array has these dimensions:

[hour, y, x] -- hourly temporal_resolution, without z dimension
[day, y, x] -- daily temporal_resolution, without z dimension
[month, y, x] -- monthly temporal_resolution, without z dimension
[y, x] -- static or blank temporal_resolution, without z dimension
[hour, z, y, x] -- hourly temporal_resolution, with z dimension
[day, z, y, x] -- daily temporal_resolution, with z dimension
[month, z, y, x] -- monthly temporal_resolution, with z dimension
[z, y, x] -- static or blank temporal_resolution, with z dimension

If the dataset has ensembles then there is an ensemble dimension at the beginning.
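The ordering rules above (time first, then z, then y and x, with an ensemble dimension prepended when present) can be captured in a small sketch for checking result shapes. `expected_shape` is a hypothetical helper, not part of hf_hydrodata; it treats any temporal_resolution other than static or blank as having a time dimension.

```python
from typing import Optional, Tuple


def expected_shape(
    temporal_resolution: Optional[str],
    n_times: int,
    ny: int,
    nx: int,
    nz: Optional[int] = None,
    n_ensembles: Optional[int] = None,
) -> Tuple[int, ...]:
    """Hypothetical helper: shape of a gridded result following the list above."""
    shape: Tuple[int, ...] = (ny, nx)
    if nz is not None:
        # The z dimension sits just before y and x.
        shape = (nz,) + shape
    # Static (or blank) temporal_resolution has no time dimension.
    if temporal_resolution not in (None, "", "static"):
        shape = (n_times,) + shape
    # Ensembles, when present, are the first dimension.
    if n_ensembles is not None:
        shape = (n_ensembles,) + shape
    return shape


print(expected_shape("daily", 3, 50, 100))  # (3, 50, 100)
```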

Both start_time and end_time must be in the form "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DD" or a datetime object.

If only start_time is specified, then only that month/day/hour is returned. The start_time is inclusive and the end_time is exclusive (only data before end_time is returned).
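The accepted time formats and the inclusive/exclusive rule for daily data can be illustrated with a short sketch. `to_datetime` and `daily_steps` are hypothetical helpers written for this explanation, not hf_hydrodata functions.

```python
import datetime
from typing import List, Optional, Union

TimeLike = Union[str, datetime.datetime]


def to_datetime(value: TimeLike) -> datetime.datetime:
    """Accept a datetime or a 'YYYY-MM-DD HH:MM:SS' / 'YYYY-MM-DD' string."""
    if isinstance(value, datetime.datetime):
        return value
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported time format: {value!r}")


def daily_steps(start_time: TimeLike, end_time: Optional[TimeLike] = None) -> List[datetime.date]:
    """Days selected for daily data: start inclusive, end exclusive (hypothetical helper)."""
    start = to_datetime(start_time)
    if end_time is None:
        # Only start_time given: just that one day.
        return [start.date()]
    end = to_datetime(end_time)
    days = []
    current = start
    while current < end:
        days.append(current.date())
        current += datetime.timedelta(days=1)
    return days


print(len(daily_steps("2005-09-30", "2005-10-03")))  # 3
```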

If either grid_bounds or latlng_bounds is specified, the result is sliced by the x,y values in the bounds. If grid_point or latlng_point is specified, it is mapped to a grid_bounds of size 1x1 at that point.
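The mapping from a point to 1x1 bounds, and from bounds to slices of the trailing [y, x] axes, can be sketched as follows. Both helpers are hypothetical, and treating right and top as exclusive is an assumption chosen to match the example in this section, where grid_bounds [200, 200, 300, 250] gives 50 rows and 100 columns.

```python
from typing import List, Sequence, Tuple


def point_to_bounds(grid_point: Sequence[int]) -> List[int]:
    """Map [x, y] to a 1x1 grid_bounds [left, bottom, right, top]."""
    x, y = grid_point
    return [x, y, x + 1, y + 1]


def bounds_to_slices(grid_bounds: Sequence[int]) -> Tuple[slice, slice]:
    """Return (y_slice, x_slice) for the trailing [y, x] axes.

    Assumes right and top are exclusive (half-open bounds).
    """
    left, bottom, right, top = grid_bounds
    return slice(bottom, top), slice(left, right)


# A plain nested list stands in for one 2D time slice of gridded data.
grid = [[10 * y + x for x in range(400)] for y in range(300)]
y_slice, x_slice = bounds_to_slices([200, 200, 300, 250])
subset = [row[x_slice] for row in grid[y_slice]]
print(len(subset), len(subset[0]))  # 50 100
```

With a numpy array the same slices would apply as `data[..., y_slice, x_slice]`.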

If z is specified then the result is sliced by the z dimension.

For example, the following gets data from the 3 daily files between 9/30/2005 and 10/3/2005 (the end date is excluded):

Example:

import hf_hydrodata as hf

options = {
    "dataset": "NLDAS2", "temporal_resolution": "daily", "variable": "precipitation",
    "start_time":"2005-09-30", "end_time":"2005-10-03",
    "grid_bounds":[200, 200, 300, 250]
}
# The result has 3 days in the time dimension.
# The result is sliced to x,y size 100x50 in the conus1 grid.
data = hf.get_gridded_data(options)
assert data.shape == (3, 50, 100)

# Get the data catalog entry (metadata) selected by the same options.
metadata = hf.get_catalog_entry(options)

hf_hydrodata.gridded.get_gridded_files(options: dict, filename_template: str = None, variables=None, verbose=False)

Get data from the hydrodata catalog and save into multiple files in the current directory. This allows you to perform large downloads using multiple threads into multiple files with one function call.

Files are saved to the current directory. A separate file is created for each day of data downloaded for daily or hourly temporal_resolution, and for each water year of monthly data. For static temporal_resolution, one file is created per variable.
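The one-file-per-day and one-file-per-water-year rule can be sketched as splitting the requested [start, end) range into per-file periods. `file_chunks` and `water_year` are hypothetical helpers, not hf_hydrodata functions; the water year convention (October 1 through September 30, named for the year it ends in) matches the NLDAS2_WY2006.nc example in this section.

```python
import datetime
from typing import List, Tuple

Chunk = Tuple[datetime.datetime, datetime.datetime]


def water_year(day: datetime.datetime) -> int:
    """A water year runs Oct 1 through Sep 30 and is named for the year it ends in."""
    return day.year + 1 if day.month >= 10 else day.year


def file_chunks(start: datetime.datetime, end: datetime.datetime, temporal_resolution: str) -> List[Chunk]:
    """Split [start, end) into per-file periods (hypothetical helper)."""
    if temporal_resolution == "static":
        return [(start, end)]
    chunks: List[Chunk] = []
    current = start
    while current < end:
        if temporal_resolution in ("hourly", "daily"):
            # One file per calendar day: the next file starts at midnight.
            day_start = datetime.datetime(current.year, current.month, current.day)
            boundary = day_start + datetime.timedelta(days=1)
        elif temporal_resolution == "monthly":
            # One file per water year: the next file starts on Oct 1.
            boundary = datetime.datetime(water_year(current), 10, 1)
        else:
            raise ValueError(f"Unsupported temporal_resolution: {temporal_resolution}")
        chunk_end = min(boundary, end)
        chunks.append((current, chunk_end))
        current = chunk_end
    return chunks


print(len(file_chunks(datetime.datetime(2005, 10, 1), datetime.datetime(2005, 10, 4), "hourly")))  # 3
```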

The extension of the filename_template determines the file format. Only extensions .pfb, .tiff, or .nc are supported at this time. For tiff files only the first time period of the selected data is saved in the file since a tiff is 2D.

The default filename_template saves data as pfb files:

hourly: {dataset}.{dataset_var}.{hour_start:06d}_to_{hour_end:06d}.pfb
daily: {dataset}.{dataset_var}.daily.{aggregation}.{daynum:03d}.pfb
monthly: {dataset}.{dataset_var}.monthly.{aggregation}.WY{wy}.pfb
static: {dataset}.{variable}.pfb

If you explicitly specify a filename_template, it is used instead. If the filename_template contains a directory path, that directory is created if it does not exist.

Parameters:

options -- A dict containing data filters to be passed to get_gridded_data().
filename_template -- A template used to create the file name(s) to store the data downloaded.
variables -- A list of variable names to download. If provided, this overrides the variable defined in the options dict.
verbose -- If True, prints progress of downloaded data while downloading.

Raises:

ValueError -- If an error occurs while downloading and creating files.

The following parameters are substituted into the filename_template.

dataset: The dataset name from the options.
variable: The variable being downloaded from options or the variables list.
dataset_var: The data catalog entry dataset_var of the entry determined by options.
hour_start: The starting hour of the data in the saved file. Starting with 0.
hour_end: The ending hour of the data in the saved file.
daynum: The day number of the data in the saved file. Starting with 0.
wy: The water year of the data in the saved file.
wy_daynum: The day number of the water year of the data in the saved file.
wy_start_24hr: The 24-hour start hour of the water year of the data in the saved file.
mdy: The date as month day year of data in the saved file.
ymd: The date as year month day of data in the saved file.
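Because the templates use Python format-string syntax, filling one in is a call to `str.format` with a dict of substitution values. The sketch below computes a subset of those values for the file holding one day of data. `template_values` is hypothetical, and several details are assumptions: daynum counts days since the requested start (0-based), each daily file spans 24 hours, ymd and mdy use the %Y%m%d and %m%d%Y formats, and dataset_var is set to "precipitation" only to match the example file names in this section.

```python
import datetime
from typing import Dict, Union

# Default hourly template, copied from the list of default templates.
HOURLY_TEMPLATE = "{dataset}.{dataset_var}.{hour_start:06d}_to_{hour_end:06d}.pfb"


def water_year(day: datetime.date) -> int:
    """Water years run Oct 1 to Sep 30 and are named for the year they end in."""
    return day.year + 1 if day.month >= 10 else day.year


def template_values(dataset: str, variable: str, dataset_var: str,
                    start: datetime.date, day: datetime.date) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: substitution values for the file holding one day.

    Assumptions: daynum counts days since `start` (0-based), each daily file
    spans 24 hours, and ymd/mdy use the %Y%m%d and %m%d%Y formats.
    """
    daynum = (day - start).days
    return {
        "dataset": dataset,
        "variable": variable,
        "dataset_var": dataset_var,
        "daynum": daynum,
        "hour_start": daynum * 24,
        "hour_end": (daynum + 1) * 24,
        "wy": water_year(day),
        "ymd": day.strftime("%Y%m%d"),
        "mdy": day.strftime("%m%d%Y"),
    }


values = template_values("NLDAS2", "precipitation", "precipitation",
                         datetime.date(2005, 10, 1), datetime.date(2005, 10, 2))
# str.format ignores keys the template does not use.
print(HOURLY_TEMPLATE.format(**values))  # NLDAS2.precipitation.000024_to_000048.pfb
print("{dataset}_WY{wy}.nc".format(**values))  # NLDAS2_WY2006.nc
```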

Example:

import hf_hydrodata as hf

options = {
    "dataset": "NLDAS2", "temporal_resolution": "hourly", "variable": "precipitation",
    "start_time":"2005-10-01", "end_time":"2005-10-04",
    "grid_bounds":[200, 200, 300, 250]
}

# By default this creates pfb files named with the time dimension starting at 0.
hf.get_gridded_files(options)

# The above function call will create files in the current directory named:
#    NLDAS2.precipitation.000000_to_000024.pfb
#    NLDAS2.precipitation.000024_to_000048.pfb
#    NLDAS2.precipitation.000048_to_000072.pfb
# Each pfb file is shape (24, 50, 100) with dimensions time, y, x.

# To download data into a NetCDF file, specify a filename_template ending with .nc
hf.get_gridded_files(
    options,
    filename_template="{dataset}_WY{wy}.nc",
    variables=["precipitation", "air_temp"])

# The above function call will create a NetCDF file named:
#    NLDAS2_WY2006.nc
# The .nc file will have two variables: precipitation and air_temp.
# The .nc file will have a time dimension with coordinates from 2005-10-01 up to (but not including) 2005-10-04.
# The .nc file will have x dimension 100 and y dimension 50, defined by the grid_bounds.

# To download data into a GeoTIFF file, specify a filename_template ending with .tiff
hf.get_gridded_files(
    options,
    filename_template="{dataset}_{variable}.tiff",
    variables=["precipitation", "air_temp"])

# The above function call will create two GeoTIFF files:
#   NLDAS2_precipitation.tiff
#   NLDAS2_air_temp.tiff
# Only the first hour of the period, "2005-10-01 00:00:00", is used to create the files.
# The files will contain projection information suitable for viewing in a GIS.

For long downloads if the function execution is aborted before completion it can be restarted and will continue where it left off by skipping files that already exist. To re-download data, remember to delete previously created files first.
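The restart behavior amounts to checking for each output file before downloading it. The sketch below is a hypothetical helper, not hf_hydrodata code; the `download` callable stands in for whatever writes one file, and `exists` defaults to `os.path.exists` but can be swapped out for testing.

```python
import os
from typing import Callable, Iterable, List


def download_missing(
    filenames: Iterable[str],
    download: Callable[[str], None],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Call download(name) only for files that do not exist yet (hypothetical helper).

    Returns the names that were downloaded in this run.
    """
    downloaded = []
    for name in filenames:
        if exists(name):
            # Left over from an earlier, interrupted run: skip it.
            continue
        download(name)
        downloaded.append(name)
    return downloaded
```

To force a fresh download, the existing files have to be deleted first, since anything already on disk is skipped.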

hf_hydrodata.gridded.get_huc_bbox(grid: str, huc_id_list: List[str]) → List[int]

Get the grid bounding box containing all the HUC ids.

Parameters:

grid -- A grid id from the data catalog (e.g. conus1 or conus2)
huc_id_list -- A list of HUC id strings of HUCs in the grid.

Returns:

A bounding box in grid coordinates as a list of int [i_min, j_min, i_max, j_max].

Raises:

ValueError -- If the HUC ids are not all at the same level (same length).
ValueError -- If grid is not valid.

Example:

import hf_hydrodata as hf

bbox = hf.get_huc_bbox("conus1", ["181001"])
assert bbox == [1, 167, 180, 378]
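Conceptually, get_huc_bbox finds every grid cell that belongs to one of the listed HUCs and takes the extent of those cells. The pure-Python sketch below is hypothetical and not the library's implementation: it assumes a 2D grid indexed huc_grid[j][i] that stores a fine-level HUC id per cell (empty string for no data), and that i_max and j_max are exclusive. It relies on HUC codes being hierarchical, so a coarser HUC id is a prefix of the finer ids it contains.

```python
from typing import List, Sequence


def huc_bbox(huc_grid: Sequence[Sequence[str]], huc_id_list: List[str]) -> List[int]:
    """Bounding box [i_min, j_min, i_max, j_max] of cells in any listed HUC.

    Hypothetical sketch: huc_grid[j][i] holds a fine-level HUC id per cell
    and i_max/j_max are exclusive (assumption).
    """
    levels = {len(huc_id) for huc_id in huc_id_list}
    if len(levels) != 1:
        raise ValueError("All HUC ids must be at the same level (same length).")
    level = levels.pop()
    wanted = set(huc_id_list)
    i_vals, j_vals = [], []
    for j, row in enumerate(huc_grid):
        for i, cell in enumerate(row):
            # Truncating a finer HUC id to `level` digits gives the enclosing HUC.
            if cell and cell[:level] in wanted:
                i_vals.append(i)
                j_vals.append(j)
    if not i_vals:
        raise ValueError("None of the HUC ids were found in the grid.")
    return [min(i_vals), min(j_vals), max(i_vals) + 1, max(j_vals) + 1]


grid = [["1810", "1810", "1709"], ["1709", "1810", "1709"]]
print(huc_bbox(grid, ["1810"]))  # [0, 0, 2, 2]
```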

hf_hydrodata.gridded.get_huc_from_latlon(grid: str, level: int, lat: float, lon: float) → str

Get a HUC id at a lat/lon point for a given grid and level.

Parameters:

grid -- grid name (e.g. conus1 or conus2)
level -- HUC level (length of HUC id to be returned). Must be 2, 4, 6, 8, or 10.
lat -- latitude of point
lon -- longitude of point

Returns:

The HUC id string containing the lat/lon point or None.

Example:

import hf_hydrodata as hf

huc_id = hf.get_huc_from_latlon("conus1", 6, 34.48, -115.63)
assert huc_id == "181001"

hf_hydrodata.gridded.get_huc_from_xy(grid: str, level: int, x: int, y: int) → str

Get a HUC id at an xy point for a given grid and level.

Parameters:

grid -- grid name (e.g. conus1 or conus2)
level -- HUC level (length of HUC id to be returned). Must be 2, 4, 6, 8, or 10.
x -- x coordinate in the grid
y -- y coordinate in the grid

Returns:

The HUC id string containing the x/y point, or None.

Example:

import hf_hydrodata as hf

huc_id = hf.get_huc_from_xy("conus1", 6, 300, 100)
assert huc_id == "181001"
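The xy lookup can be pictured as indexing a 2D grid of HUC ids and truncating the id to the requested level; the lat/lon variant would first convert the point to grid coordinates, which is not shown here. The sketch is hypothetical: it assumes huc_grid[y][x] stores the 10-digit HUC id of each cell, with an empty string outside the mapped area.

```python
from typing import Optional, Sequence

VALID_LEVELS = (2, 4, 6, 8, 10)


def huc_at_xy(huc_grid: Sequence[Sequence[str]], level: int, x: int, y: int) -> Optional[str]:
    """Hypothetical sketch: HUC id of the given level at grid cell (x, y)."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}")
    if not (0 <= y < len(huc_grid) and 0 <= x < len(huc_grid[y])):
        return None
    cell = huc_grid[y][x]
    # HUC codes are hierarchical, so the first `level` digits name the enclosing HUC.
    return cell[:level] if cell else None


grid = [["1810010001", ""], ["1709000101", "1810020304"]]
print(huc_at_xy(grid, 6, 0, 0))  # 181001
```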

hf_hydrodata.gridded.get_ndarray(entry, *args, **kwargs) → ndarray

Deprecated.

Use get_numpy() instead.

hf_hydrodata.gridded.get_numpy(*args, **kwargs) → ndarray

Deprecated. Use get_gridded_data() instead.

Get a numpy ndarray from files in /hydroframe with the applied data filters.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- Time resolution of the dataset variable. Must be hourly, daily, weekly, or monthly.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.
start_time -- A time as either a datetime object or a string in the form YYYY-MM-DD or YYYY-MM-DD HH:MM:SS. Start of the date range for data.
end_time -- A time as either a datetime object or a string in the form YYYY-MM-DD or YYYY-MM-DD HH:MM:SS. End of the date range for data.
grid_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in xy grid coordinates in the grid of the data.
latlng_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in lat/lng coordinates, mapped to the grid of the data.
grid_point -- An array (or string representing an array) [x, y] with the grid coordinates of a point in the grid.
latlng_point -- An array (or string representing an array) [lat, lon] with the lat/lng coordinates of a point in the grid.
huc_id -- A comma-separated list of HUC ids that specifies the grid_bounds using the HUC bounding box.
z -- A value of the z dimension used to filter this dimension when loading data.
level -- A HUC level integer when reading HUC boundary files. Must be 2, 4, 6, 8, or 10.
site_id -- Used when reading data associated with an observation site.
time_values -- Optional. An empty array that will be populated with time dimension values of returned data.

Returns:

A numpy ndarray containing the data loaded from the files identified by the entry and sliced by the data filter options.

Raises:

ValueError -- If both grid_bounds and latlng_bounds are specified as data filters.
ValueError -- If no data catalog entry is found associated with the filter parameters.
ValueError -- If any filter parameters are invalid.

For gridded results, the returned numpy array has these dimensions:

[hour, y, x] -- hourly temporal_resolution, without z dimension
[day, y, x] -- daily temporal_resolution, without z dimension
[month, y, x] -- monthly temporal_resolution, without z dimension
[y, x] -- static or blank temporal_resolution, without z dimension
[hour, z, y, x] -- hourly temporal_resolution, with z dimension
[day, z, y, x] -- daily temporal_resolution, with z dimension
[month, z, y, x] -- monthly temporal_resolution, with z dimension
[z, y, x] -- static or blank temporal_resolution, with z dimension

If the dataset has ensembles then there is an ensemble dimension at the beginning.

Both start_time and end_time must be in the form "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DD" or a datetime object.

If only start_time is specified, then only that month/day/hour is returned. The start_time is inclusive and the end_time is exclusive (only data before end_time is returned).

If either grid_bounds or latlng_bounds is specified, the result is sliced by the x,y values in the bounds. If grid_point or latlng_point is specified, it is mapped to a grid_bounds of size 1x1 at that point.

If z is specified then the result is sliced by the z dimension.

For example, the following gets data from the 3 daily files between 9/30/2005 and 10/3/2005 (the end date is excluded):

Example:

import hf_hydrodata as hf

options = {
    "dataset": "NLDAS2", "temporal_resolution": "daily", "variable": "precipitation",
    "start_time":"2005-09-30", "end_time":"2005-10-03",
    "grid_bounds":[200, 200, 300, 250]
}
# The result has 3 days in the time dimension
# The result is sliced to x,y size 100x50 in the conus1 grid.
data = hf.get_numpy(options)
assert data.shape == (3, 50, 100)

# Get the data catalog entry (metadata) selected by the same options.
metadata = hf.get_catalog_entry(options)

hf_hydrodata.gridded.get_path(*args, **kwargs) → str

Get the file path within the data catalog for the filter options.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- Time resolution of the dataset variable. Must be hourly, daily, weekly, or monthly.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.
start_time -- A time as either a datetime object or a string in the form YYYY-MM-DD. Start of the date range for data.
end_time -- A time as either a datetime object or a string in the form YYYY-MM-DD. End of the date range for data.
grid_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in xy grid coordinates in the grid of the data.
latlng_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in lat/lng coordinates, mapped to the grid of the data.
grid_point -- An array (or string representing an array) [x, y] with the grid coordinates of a point in the grid.
latlng_point -- An array (or string representing an array) [lat, lon] with the lat/lng coordinates of a point in the grid.
huc_id -- A comma-separated list of HUC ids that specifies the grid_bounds using the HUC bounding box.
z -- A value of the z dimension used to filter this dimension when loading data.
level -- A HUC level integer when reading HUC boundary files. Must be 2, 4, 6, 8, or 10.
site_id -- Used when reading data associated with an observation site.

Returns:

An absolute path name to the file location on the GPFS file system.

Raises:

ValueError -- If no data catalog entry is found for the filter options provided.

Example:

import hf_hydrodata as hf

options = {
    "dataset": "NLDAS2", "temporal_resolution": "daily", "variable": "precipitation",
    "start_time":"2005-09-30"
}
path = hf.get_path(options)

hf_hydrodata.gridded.get_paths(*args, **kwargs) → List[str]

Get the file paths within the data catalog for the filter options.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- Time resolution of the dataset variable. Must be hourly, daily, weekly, or monthly.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.
start_time -- A time as either a datetime object or a string in the form YYYY-MM-DD. Start of the date range for data.
end_time -- A time as either a datetime object or a string in the form YYYY-MM-DD. End of the date range for data.
grid_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in xy grid coordinates in the grid of the data.
latlng_bounds -- An array (or string representing an array) with the bounds [left, bottom, right, top] in lat/lng coordinates, mapped to the grid of the data.
grid_point -- An array (or string representing an array) [x, y] with the grid coordinates of a point in the grid.
latlng_point -- An array (or string representing an array) [lat, lon] with the lat/lng coordinates of a point in the grid.
huc_id -- A comma-separated list of HUC ids that specifies the grid_bounds using the HUC bounding box.
z -- A value of the z dimension used to filter this dimension when loading data.
level -- A HUC level integer when reading HUC boundary files. Must be 2, 4, 6, 8, or 10.
site_id -- Used when reading data associated with an observation site.

Returns:

A list of absolute path names to the file locations on the GPFS file system.

Raises:

ValueError -- If no data catalog entry is found for the filter options provided.

Example:

import hf_hydrodata as hf

options = {
    "dataset": "NLDAS2", "temporal_resolution": "daily", "variable": "precipitation",
    "start_time":"2005-09-30", "end_time": "2005-10-03"
}
paths = hf.get_paths(options)
assert len(paths) == 5    # 5 days

hf_hydrodata.gridded.get_raw_file(filepath, *args, **kwargs)

Get the hydroframe file selected by the options and save it to the given filepath.

Parameters:

filepath -- The file path where the selected file is saved.
options -- Optional positional parameter that must be a dict with data filter options.

Returns:

None

Raises:

ValueError -- If there are multiple paths selected from hydroframe.

Example:

import hf_hydrodata as hf

options = {
    "dataset": "huc_mapping", "grid": "conus2", "level": "4"
}
hf.get_raw_file("huc4.tiff", options)