hf_hydrodata.data_catalog module

Functions to access data_catalog metadata.

hf_hydrodata.data_catalog.get_catalog_entries(*args, **kwargs) → List[ModelTableRow]

Get data catalog entry rows selected by filter options.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.
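The two calling conventions can be pictured with a small stand-alone sketch. The helper collect_options below is hypothetical and is not part of hf_hydrodata; it only illustrates how a single positional dict and named parameters could be folded into one set of filter options. The precedence shown (named parameters override the dict) is an assumption, since the behavior for a key supplied both ways is not documented here.

```python
def collect_options(*args, **kwargs):
    """Merge an optional positional dict with named parameters (illustrative only)."""
    options = {}
    if args:
        if len(args) != 1 or not isinstance(args[0], dict):
            raise TypeError("expected at most one positional argument, a dict of options")
        options.update(args[0])
    # Assumption: named parameters override keys that also appear in the dict.
    options.update(kwargs)
    return options


by_dict = collect_options({"dataset": "NLDAS2", "temporal_resolution": "daily"})
by_name = collect_options(dataset="NLDAS2", temporal_resolution="daily")
print(by_dict == by_name)
```

Both calls describe the same filter, which is why either form can be used interchangeably in the examples in this module.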

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- The temporal_resolution (e.g. hourly, daily, weekly, monthly) of a dataset variable.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.
date_start -- A time as either a datetime object or a string in the form YYYY-MM-DD. Start of the date range for data. (start_time is also accepted for backward compatibility)
date_end -- A time as either a datetime object or a string in the form YYYY-MM-DD. End of the date range for data. (end_time is also accepted for backward compatibility)
grid_bounds -- An array (or string representing an array) of points [left, bottom, right, top] in xy grid coordinates in the grid of the data.
latlng_bounds -- An array (or string representing an array) of points [left, bottom, right, top] in lat/lng coordinates mapped with the grid of the data.
grid_point -- An array (or string representing an array) of points [x, y] in grid coordinates of a point in the grid.
latlng_point -- An array (or string representing an array) of points [lat, long] in lat/lng coordinates of a point in the grid.
z -- A value of the z dimension to be used as a filter for this dimension when loading data.
level -- A HUC level integer when reading HUC boundary files.
site_id -- Used when reading data associated with an observation site.
data_catalog_entry_id -- Optional. The id of an entry in the data catalog to identify an entry.
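Because date_start and date_end accept either a datetime object or a YYYY-MM-DD string, a caller may want to normalize dates before building filter options. The helper to_date_string below is a hypothetical convenience, not part of hf_hydrodata; it is a minimal sketch that relies only on the standard library.

```python
from datetime import date, datetime


def to_date_string(value):
    """Return value as a YYYY-MM-DD string (illustrative helper, not a library function)."""
    if isinstance(value, (datetime, date)):
        return value.strftime("%Y-%m-%d")
    # Parse and re-format strings so that unpadded input such as
    # "2005-7-1" is rewritten in the documented YYYY-MM-DD form.
    return datetime.strptime(value, "%Y-%m-%d").strftime("%Y-%m-%d")


options = {
    "dataset": "NLDAS2",
    "date_start": to_date_string(datetime(2005, 7, 1)),
    "date_end": to_date_string("2005-7-31"),
}
print(options)
```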

Returns:

A list of ModelTableRow entries that match the filter options.

A ModelTableRow contains the attributes of the hf_hydrodata model of a data catalog entry. The attributes can be accessed by indexing by the attribute name (e.g. entry["dataset"]). You can get the attribute names of an entry using column_names() (e.g. entry.column_names()).

ModelTableRow metadata attributes:

dataset: A dataset name (see Gridded Data documentation).
variable: A variable from a dataset.
temporal_resolution: The temporal_resolution (e.g. hourly, daily, weekly, monthly) of a dataset variable.
grid: A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation: One of mean, max, min. Normally, only needed for temperature variables.
entry_start_date: Earliest available date of data.
entry_end_date: Latest available date of data.
units: Units of the data.
file_type: Type of file in hf_hydrodata GPFS.
dataset_type: A classification type of the dataset.
paper_dois: A space-separated list of DOI references to published papers.
structure_type: Structure of the data: gridded or point.
description: Short description of the dataset containing the data.
summary: Longer summary of the dataset containing the data.
id: The unique id of the entry in the data catalog.

Example:

import hf_hydrodata as hf

entries = hf.get_catalog_entries(dataset="NLDAS2", temporal_resolution="daily")

options = {"dataset": "NLDAS2", "temporal_resolution": "daily"}
entries = hf.get_catalog_entries(options)
assert len(entries) == 20
entry = entries[0]
assert entry["dataset"] == "NLDAS2"
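Since a ModelTableRow supports indexing by attribute name and reports its attribute names through column_names(), a row can be copied into a plain dict with those two operations alone. The sketch below uses StubRow, a hypothetical stand-in class written only so the example runs without access to the data catalog; row_to_dict is likewise an illustrative helper, not a library function.

```python
class StubRow:
    """Hypothetical stand-in exposing the two ModelTableRow features used here."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, name):
        return self._values[name]

    def column_names(self):
        return list(self._values)


def row_to_dict(row):
    """Copy every attribute of a row into a plain dict."""
    return {name: row[name] for name in row.column_names()}


entry = StubRow({"dataset": "NLDAS2", "temporal_resolution": "daily"})
print(row_to_dict(entry))
```

With a real entry returned by get_catalog_entries, the same row_to_dict call should apply, assuming the row behaves as described above.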

hf_hydrodata.data_catalog.get_catalog_entry(*args, **kwargs) → ModelTableRow

Get a single data catalog entry row selected by filter options.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- The temporal_resolution (e.g. hourly, daily, weekly, monthly) of a dataset variable.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.
date_start -- A time as either a datetime object or a string in the form YYYY-MM-DD. Start of the date range for data. (start_time is also accepted for backward compatibility)
date_end -- A time as either a datetime object or a string in the form YYYY-MM-DD. End of the date range for data. (end_time is also accepted for backward compatibility)
grid_bounds -- An array (or string representing an array) of points [left, bottom, right, top] in xy grid coordinates in the grid of the data.
latlng_bounds -- An array (or string representing an array) of points [left, bottom, right, top] in lat/lng coordinates mapped with the grid of the data.
grid_point -- An array (or string representing an array) of points [x, y] in grid coordinates of a point in the grid.
latlng_point -- An array (or string representing an array) of points [lat, lon] in lat/lng coordinates of a point in the grid.
z -- A value of the z dimension to be used as a filter for this dimension when loading data.
level -- A HUC level integer when reading HUC boundary files.
site_id -- Used when reading data associated with an observation site.
data_catalog_entry_id -- Optional. The id of an entry in the data catalog to identify an entry.
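The bounds and point parameters accept either an array or a string representing an array. The exact string syntax is not spelled out here; the sketch below assumes a JSON-style form such as "[10, 20, 30, 40]". The helper parse_bounds is hypothetical and only shows how a caller might normalize both forms before passing them on.

```python
import json


def parse_bounds(value):
    """Return bounds as a list of four numbers [left, bottom, right, top]."""
    # Assumption: a string value is JSON-style, e.g. "[10, 20, 30, 40]".
    bounds = json.loads(value) if isinstance(value, str) else list(value)
    if len(bounds) != 4:
        raise ValueError("bounds must have four values: [left, bottom, right, top]")
    return bounds


print(parse_bounds("[10, 20, 30, 40]"))
print(parse_bounds((10, 20, 30, 40)))
```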

Returns:

A single ModelTableRow entry that matches the filter options, or None if no entry is found.

Raises:

ValueError -- If the filter options do not uniquely identify a single entry.
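The three outcomes described above (one match, no match, ambiguous match) can be illustrated with a small sketch over plain dicts. The function find_one and the sample rows are hypothetical and made up for illustration; this is not the library's implementation, only a model of the documented behavior.

```python
def find_one(rows, **filters):
    """Return the single row matching filters, None if none match, else raise ValueError."""
    matches = [
        row for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(f"filters match {len(matches)} rows; add options to narrow the search")
    return matches[0]


# Made-up rows for illustration only.
rows = [
    {"dataset": "A", "variable": "x"},
    {"dataset": "A", "variable": "y"},
]

print(find_one(rows, variable="x"))   # exactly one match
print(find_one(rows, dataset="B"))    # no match
try:
    find_one(rows, dataset="A")       # two matches
except ValueError as err:
    print("ambiguous:", err)
```

In practice this means a caller should check for None and be prepared to catch ValueError when the filter options may be too broad.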

A ModelTableRow contains the attributes of the hf_hydrodata model of a data catalog entry. The attributes can be accessed by indexing by the attribute name (e.g. entry["dataset"]). You can get the attribute names of an entry using column_names() (e.g. entry.column_names()).

ModelTableRow metadata attributes:

dataset: A dataset name (see Gridded Data documentation).
variable: A variable from a dataset.
temporal_resolution: The temporal_resolution (e.g. hourly, daily, weekly, monthly) of a dataset variable.
grid: A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation: One of mean, max, min. Normally, only needed for temperature variables.
entry_start_date: Earliest available date of data.
entry_end_date: Latest available date of data.
units: Units of the data.
file_type: Type of file in hf_hydrodata GPFS.
dataset_type: A classification type of the dataset.
paper_dois: A space-separated list of DOI references to published papers.
structure_type: Structure of the data: gridded or point.
description: Short description of the dataset containing the data.
summary: Longer summary of the dataset containing the data.
id: The unique id of the entry in the data catalog.

Example:

import hf_hydrodata as hf

options = {
    "dataset": "NLDAS2", "temporal_resolution": "daily",
    "variable": "precipitation", "date_start": "2005-07-01"
}
entry = hf.get_catalog_entry(options)
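Because paper_dois is a space-separated list, it usually needs to be split before use. The sketch below is a hypothetical helper, not part of hf_hydrodata. It assumes each item is either a bare DOI or already a URL, and it prefixes bare DOIs with the doi.org resolver address. The identifiers in the example are placeholders, not real references.

```python
def doi_links(paper_dois):
    """Split a space-separated DOI string and return one link per DOI."""
    links = []
    for doi in paper_dois.split():
        # Assumption: items that start with "http" are already full URLs.
        links.append(doi if doi.startswith("http") else "https://doi.org/" + doi)
    return links


# Placeholder identifiers, not real references.
print(doi_links("10.0000/example.1 10.0000/example.2"))
```

With a real entry, the argument would be entry["paper_dois"].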

hf_hydrodata.data_catalog.get_citations(*args, **kwargs) → str

Get citation references for a dataset.

Parameters:

dataset -- The name of a dataset.

Returns:

A string containing citation references of the dataset.

The citation references consist of a description of the dataset with relevant URL references to papers or websites.

The dataset parameter can be passed as a named or un-named parameter or as a dict containing a dataset option.

Example:

import hf_hydrodata as hf

citations = hf.get_citations("NLDAS2")
print(citations)

citations = hf.get_citations(dataset="NLDAS2")
print(citations)

options = {"dataset": "NLDAS2", "temporal_resolution": "daily"}
citations = hf.get_citations(options)

hf_hydrodata.data_catalog.get_datasets(*args, **kwargs) → List[str]

Get available datasets.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- The temporal_resolution (e.g. hourly, daily, weekly, monthly) of a dataset variable.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.

Returns:

A list of dataset names that contain a data catalog entry filtered by the parameters. If no options are provided returns all available datasets.

Examples:

import hf_hydrodata as hf

datasets = hf.get_datasets()
assert len(datasets) == 13
assert datasets[0] == "CW3E"

datasets = hf.get_datasets(variable="air_temp")
assert len(datasets) == 5
assert datasets[0] == "CW3E"

datasets = hf.get_datasets(grid="conus2")
assert len(datasets) == 5
assert datasets[0] == "CW3E"

options = {"variable": "air_temp", "grid": "conus1"}
datasets = hf.get_datasets(options)
assert len(datasets) == 3
assert datasets[0] == "NLDAS2"

hf_hydrodata.data_catalog.get_table_names() → List[str]

Get the list of table names in the data model.

Returns:

A list of all the table names in the hf_hydrodata data catalog model.

Example:

import hf_hydrodata as hf

names = hf.get_table_names()

hf_hydrodata.data_catalog.get_table_row(table_name: str, *args, **kwargs) → ModelTableRow

Get one row of a data model table filtered by columns from that table.

Parameters:

table_name -- The name of a table in the data model.
args -- Optional positional parameter that must be a dict with filter options.
kwargs -- Supports multiple named parameters with filter option values.

Returns:

A single ModelTableRow entry of the specified table_name that matches the filter options, or None if no row is found.

Raises:

ValueError -- If the filter options are ambiguous and this matches more than one row.

Example:

import hf_hydrodata as hf

row = hf.get_table_row("variable", variable_type="atmospheric", unit_type="pressure")
assert row["id"] == "atmospheric_pressure"

hf_hydrodata.data_catalog.get_table_rows(table_name: str, *args, **kwargs) → List[ModelTableRow]

Get rows of a data model table filtered by columns from that table.

Parameters:

table_name -- The name of a table in the data model.
args -- Optional positional parameter that must be a dict with filter options.
kwargs -- Supports multiple named parameters with filter option values.

Returns:

A list of ModelTableRow entries of the specified table_name that match the filter options.

Example:

import hf_hydrodata as hf

rows = hf.get_table_rows("variable", variable_type="atmospheric")
assert len(rows) == 8
assert rows[0]["id"] == "air_temp"

hf_hydrodata.data_catalog.get_variables(*args, **kwargs) → List[str]

Get available variables.

The parameters to the function can be specified either by passing a dict with the parameter values or by passing named parameters to the function.

Parameters:

dataset -- A dataset name (see Gridded Data documentation).
variable -- A variable from a dataset.
temporal_resolution -- The temporal_resolution (e.g. hourly, daily, weekly, monthly) of a dataset variable.
grid -- A grid supported by a dataset (e.g. conus1 or conus2). Normally this is determined by the dataset.
aggregation -- One of mean, max, min. Normally, only needed for temperature variables.

Returns:

A list of variable names that contain a data catalog entry filtered by the parameters. If no options are provided returns all available variables.

Examples:

import hf_hydrodata as hf

variables = hf.get_variables()
assert len(variables) == 63
assert variables[0] == "air_temp"

variables = hf.get_variables(dataset="CW3E")
assert len(variables) == 8
assert variables[0] == "air_temp"

variables = hf.get_variables(grid="conus2")
assert len(variables) == 30
assert variables[0] == "air_temp"

options = {"dataset": "NLDAS2", "grid": "conus1"}
variables = hf.get_variables(options)
assert len(variables) == 8
assert variables[0] == "air_temp"

hf_hydrodata.data_catalog.register_api_pin(email: str, pin: str)

Register the email and pin that were created on the website, storing them in the user's home directory.

Parameters:

email -- Email address used to create an API pin.
pin -- The 4-digit pin registered to be able to use the API.

This only needs to be executed once per machine to register the pin. You can sign up for an account at https://hydrogen.princeton.edu/signup. You can create a pin at https://hydrogen.princeton.edu/pin.

Example:

import hf_hydrodata as hf

hf.register_api_pin("dummy@gmail.com", "1234")
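Since the pin is passed as a string and is described as 4 digits, a script may want to check its shape before attempting registration. The helper looks_like_pin below is hypothetical and not part of hf_hydrodata; it only checks the format and says nothing about whether the pin is valid on the server.

```python
import re


def looks_like_pin(pin):
    """Return True if pin is a string of exactly four ASCII digits."""
    return isinstance(pin, str) and re.fullmatch(r"[0-9]{4}", pin) is not None


print(looks_like_pin("1234"))
print(looks_like_pin("123"))
print(looks_like_pin(1234))
```

Note that an integer such as 1234 fails the check, because the pin is expected to be a string.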