API Reference#
Classes#
Query#
Note
Use the initialise() class method to create an instance of the Query object, as this method assembles metadata relevant to nemseer cache searching.
- class nemseer.query.Query(run_start: str, run_end: str, forecasted_start: str, forecasted_end: str, forecast_type: str, tables: Union[str, List[str]], metadata: Dict[str, str], raw_cache, processed_cache=None, processed_queries: Optional[Union[Dict[str, Path], Dict]] = None)[source]#
Query validates user inputs and dispatches data downloaders and compilers.
Construct Query using the Query.initialise() constructor. This ensures query metadata is constructed appropriately.
Query:
- Validates user input data
  - Checks datetimes fit yyyy/mm/dd HH:MM format
  - Checks datetime chronology (e.g. end is after start)
  - Checks requested datetimes are valid for each forecast type
- Validates forecast type
- Validates user-requested tables against what is available on NEMWeb
- Retains query metadata (via constructor class method nemseer.query.Query.initialise())
- Can check raw_cache and processed_cache contents to streamline query compilation
- Parameters:
- run_start#
Forecast runs at or after this datetime are queried.
- Type:
str
- run_end#
Forecast runs before or at this datetime are queried.
- Type:
str
- forecasted_start#
Forecasts pertaining to times at or after this datetime are retained.
- Type:
str
- forecasted_end#
Forecasts pertaining to times before or at this datetime are retained.
- Type:
str
- forecast_type#
One of nemseer.forecast_types.
- Type:
str
- tables#
Table or tables required. A single table can be supplied as a string. Multiple tables can be supplied as a list of strings.
- Type:
List[str]
- metadata#
Metadata dictionary. Constructed by
Query.initialise()
.
- processed_cache#
Path to build or reuse processed_cache. Should be distinct from raw_cache.
- Type:
optional
- processed_queries#
Defaults to None on initialisation. Populated once Query.find_table_queries_in_processed_cache() is called.
- Type:
Union[Dict[str, pathlib.Path], Dict]
- check_all_raw_data_in_cache() bool [source]#
Checks whether all requested data is already in the raw_cache as parquet. nemseer.downloader.ForecastTypeDownloader.download_csv() handles partial raw_cache completeness.
If all requested data is already in the raw_cache as parquet, returns True. Otherwise returns False.
- Return type:
bool
- find_table_queries_in_processed_cache(data_format: str) None [source]#
Determines which tables already have queries saved in the processed_cache.
If data_format=df, this function will sieve through the metadata of all parquet files in the processed_cache. Note that parquet metadata is UTF-8 encoded. Similarly, data_format=xr will check the metadata of all netCDF files.
Modifies Query.processed_queries from None to a dict. The dict is empty if:
- processed_cache is None
- No portion of the query has been saved in the processed_cache
If a portion of the queries are saved in the processed_cache, then Query.processed_queries will be equal to a dict that maps the saved query’s table name to the saved query’s filename.
- Parameters:
data_format (str) – As per nemseer.compile_data()
- Return type:
None
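The datetime format and chronology checks that Query performs can be sketched with the standard library. This is an illustrative sketch of the validation style, not nemseer's actual implementation; the helper name `parse_and_check` is hypothetical, while the format string matches nemseer.data.DATETIME_FORMAT.

```python
from datetime import datetime

DATETIME_FORMAT = "%Y/%m/%d %H:%M"  # nemseer's date format (yyyy/mm/dd HH:MM)

def parse_and_check(run_start: str, run_end: str) -> tuple:
    """Parse query datetimes and check chronology (illustrative sketch)."""
    try:
        start = datetime.strptime(run_start, DATETIME_FORMAT)
        end = datetime.strptime(run_end, DATETIME_FORMAT)
    except ValueError as err:
        raise ValueError(f"Datetimes must be in yyyy/mm/dd HH:MM format: {err}")
    if end < start:
        raise ValueError("run_end must be at or after run_start")
    return start, end
```

For example, `parse_and_check("2021/02/01 00:00", "2021/02/28 00:00")` succeeds, while reversing the two arguments raises a ValueError.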
Downloader#
Note
Use the from_Query() class method to create an instance of the ForecastTypeDownloader object.
- class nemseer.downloader.ForecastTypeDownloader(*, run_start: datetime, run_end: datetime, forecast_type: str, tables: List[str], raw_cache: Path)[source]#
ForecastTypeDownloader can initiate csv downloads and convert raw_cache csvs to the parquet format.
- Parameters:
- run_start#
Forecast runs at or after this datetime are queried.
- Type:
datetime
- run_end#
Forecast runs before or at this datetime are queried.
- Type:
datetime
- forecast_type#
One of nemseer.forecast_types.
- Type:
str
- tables#
Table or tables required. A single table can be supplied as a string. Multiple tables can be supplied as a list of strings.
- Type:
List[str]
- convert_to_parquet(keep_csv=False) None [source]#
Converts all CSVs in the raw_cache to parquet.
Warning
A warning is printed if the filesize is greater than half of available memory, as pandas.DataFrame consumes more than the file size in memory.
- Return type:
None
- download_csv() None [source]#
Downloads and unzips zip files given the query loaded into ForecastTypeDownloader.
This method will only download and unzip the relevant zip/csv if the corresponding .parquet file is not located in the specified raw_cache.
- Return type:
None
- classmethod from_Query(query: Query) ForecastTypeDownloader [source]#
Constructor method for ForecastTypeDownloader from Query
- Parameters:
query (Query) –
- Return type:
ForecastTypeDownloader
DataCompiler#
Note
Use the from_Query() class method to create an instance of the DataCompiler object.
- class nemseer.data_compilers.DataCompiler(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime, forecast_type: str, metadata: Dict[str, str], raw_cache: Path, processed_cache: Union[None, Path], processed_queries: Union[Dict[str, Path], Dict], raw_tables: List[str], compiled_data: Union[None, Dict[str, DataFrame], Dict[str, Dataset]] = None)[source]#
DataCompiler compiles data from the raw_cache or processed_cache.
- Parameters:
- run_start#
Forecast runs at or after this datetime are queried.
- Type:
datetime
- run_end#
Forecast runs before or at this datetime are queried.
- Type:
datetime
- forecasted_start#
Forecasts pertaining to times at or after this datetime are retained.
- Type:
datetime
- forecasted_end#
Forecasts pertaining to times before or at this datetime are retained.
- Type:
datetime
- forecast_type#
One of nemseer.forecast_types.
- Type:
str
- tables#
Table or tables required. A single table can be supplied as a string. Multiple tables can be supplied as a list of strings.
- metadata#
Metadata dictionary. Constructed by
Query.initialise()
.
- processed_cache#
Path to build or reuse processed_cache. Should be distinct from raw_cache.
- Type:
optional
- processed_queries#
Defaults to None on initialisation.
- Type:
Union[Dict[str, pathlib.Path], Dict]
- raw_tables#
Populated via DataCompiler.from_Query()
- compiled_data#
Defaults to None on initialisation. Populated once data is compiled by methods.
- Type:
Union[None, Dict[str, pandas.core.frame.DataFrame], Dict[str, xarray.core.dataset.Dataset]]
- compile_processed_data(data_format: str = 'df') None [source]#
Compiles data from the processed_cache, as per entries in processed_queries, to a pandas.DataFrame (default) or to an xarray.Dataset.
This method will update compiled_data.
- Parameters:
data_format (str) – Default “df” (pandas.DataFrame). Other valid input is “xr”, which compiles xarray.Dataset.
- Return type:
None
- compile_raw_data(data_format: str = 'df') None [source]#
Compiles data from the raw_cache to a pandas.DataFrame (default) or to an xarray.Dataset.
This compiler will:
- Skip invalid/corrupted files as recorded in .invalid_aemo_files.txt
- Read raw_cache parquet files and apply datetime filtering
- Convert DataFrame to xarray.Dataset (if data_format = “xr”)
- Update compiled_data
- Parameters:
data_format (str) – Default “df” (pandas.DataFrame). Other valid input is “xr”, which returns xarray.Dataset.
- Return type:
None
Warning
Skips any files previously found to be invalid/corrupted and prints a warning
- classmethod from_Query(query: Query) DataCompiler [source]#
Constructor method for DataCompiler from Query.
- Parameters:
query (Query) –
- Return type:
DataCompiler
- invalid_or_corrupted_files() List[str] [source]#
A list of invalid/corrupted files as per files in .invalid_aemo_files.txt. Returns an empty list if the stubfile does not exist.
- write_to_processed_cache() None [source]#
Writes netCDF4 for xarray.Dataset and parquet for pandas.DataFrame to the processed_cache with associated query metadata.
Note that parquet metadata needs to be UTF-8 encoded.
- Raises:
ValueError – If processed_cache is None, or if compiled_data contains data that is neither all pandas.DataFrame nor all xarray.Dataset
IOError – If compiled_data is None
- Return type:
None
Functions#
Query handlers#
- nemseer.query.generate_sqlloader_filenames(run_start: datetime, run_end: datetime, forecast_type: str, tables: List[str]) Dict[Tuple[int, int, str], str] [source]#
Generates MMSDM Historical Data SQLLoader file names based on provided query data
Returns query metadata (year, month, table) mapped to each filename
- Parameters:
- Returns:
Query metadata (year, month, table) mapped to each format-agnostic (SQLLoader) filename
- Return type:
Dict[Tuple[int, int, str], str]
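The metadata side of this function — enumerating the (year, month, table) keys spanned by the run time range — can be sketched as follows. This is an illustrative sketch under stated assumptions; the helper name `year_month_table_keys` is hypothetical, and the AEMO-specific filename construction is omitted.

```python
from datetime import datetime
from itertools import product
from typing import List, Tuple

def year_month_table_keys(
    run_start: datetime, run_end: datetime, tables: List[str]
) -> List[Tuple[int, int, str]]:
    """Enumerate (year, month, table) keys for every month touched by the
    run time range (illustrative sketch)."""
    months = []
    year, month = run_start.year, run_start.month
    while (year, month) <= (run_end.year, run_end.month):
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    # One key per (month, table) combination
    return [(y, m, t) for (y, m), t in product(months, tables)]
```

For example, a query from November 2020 to February 2021 for one table yields four keys, one per month.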
Scrapers and downloaders#
Scrapers: These functions scrape the MMSDM Historical Data SQLLoader repository to assist nemseer in validating inputs and providing feedback to users.
Downloaders: Used to download and unzip a .zip file.
- nemseer.downloader.get_sqlloader_forecast_tables(year: int, month: int, forecast_type: str, actual: bool = False) List[str] [source]#
Requestable tables of a particular forecast type on the MMSDM Historical Data SQLLoader
If actual = False, provides a list of tables that can be requested via nemseer. If actual = True, returns actual tables available via NEMWeb, including all tables that are enumerated.
- N.B.:
Removes numbering from enumerated tables for P5MIN - e.g. CONSTRAINTSOLUTION(x) are all reduced to CONSTRAINTSOLUTION
- nemseer.downloader.get_sqlloader_years_and_months() Dict[int, List[int]] [source]#
Years and months with data on NEMWeb MMSDM Historical Data SQLLoader
- nemseer.downloader.get_unzipped_csv(url: str, raw_cache: Path) None [source]#
Unzips a (single) csv file downloaded from url into raw_cache
This function:
- Downloads the zip file in chunks to limit memory use and enable a progress bar
- Validates that the zip contains a single file that has the same name as the zip
- If the zip file is invalid, writes the file stub to .invalid_aemo_files.txt
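The single-file validation step can be sketched with the standard library. This is an illustrative sketch of the check described above, not nemseer's implementation; the helper name `zip_contains_single_matching_csv` is hypothetical, and the chunked HTTP download itself is omitted.

```python
import zipfile
from pathlib import Path

def zip_contains_single_matching_csv(zip_path: Path) -> bool:
    """Check that the zip holds exactly one file named after the zip itself,
    e.g. FOO.zip -> FOO.CSV (case-insensitive). Illustrative sketch."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    if len(names) != 1:
        return False  # AEMO zips of interest contain a single csv
    return Path(names[0]).stem.upper() == zip_path.stem.upper()
```

A zip failing this check would be treated as invalid, and its file stub recorded in .invalid_aemo_files.txt.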
Data handlers#
Functions for handling various data states.
Valid inputs for clean_forecast_csv are the same as those for pandas.read_csv.
- nemseer.data_handlers.apply_run_and_forecasted_time_filters(df: DataFrame, run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime, forecast_type: str) DataFrame [source]#
Applies filtering for run times (i.e. run_start and run_end) and forecasted times (i.e. forecasted_start and forecasted_end).
Datetime filtering is applied to a column fetched from lookup tables that map the relevant column name to each forecast type. If the run time/forecasted column obtained from the lookup is not present in the DataFrame, the respective filter is not applied.
- Parameters:
run_start (datetime) – Forecast runs at or after this datetime are queried.
run_end (datetime) – Forecast runs before or at this datetime are queried.
forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.
forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.
forecast_type (str) – One of nemseer.forecast_types.
df (DataFrame) –
- Returns:
DataFrame with appropriate datetime filtering applied.
- Return type:
DataFrame
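The conditional column lookup described above can be sketched using the RUNTIME_COL mapping documented in the Data section. This is an illustrative sketch in which plain dicts stand in for DataFrame rows; the helper name `filter_run_times` is hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Run time column per forecast type (from nemseer.data.RUNTIME_COL)
RUNTIME_COL = {
    "MTPASA": "RUN_DATETIME", "P5MIN": "RUN_DATETIME", "PDPASA": "RUN_DATETIME",
    "PREDISPATCH": "PREDISPATCH_RUN_DATETIME", "STPASA": "RUN_DATETIME",
}

def filter_run_times(
    rows: List[Dict], forecast_type: str, run_start: datetime, run_end: datetime
) -> List[Dict]:
    """Filter rows on the forecast type's run time column, if present.
    If the looked-up column is absent, the filter is not applied."""
    col = RUNTIME_COL[forecast_type]
    if not rows or col not in rows[0]:
        return rows  # column absent: skip filtering, as described above
    return [r for r in rows if run_start <= r[col] <= run_end]
```

The same pattern applies to FORECASTED_COL for forecasted-time filtering.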
- nemseer.data_handlers.clean_forecast_csv(filepath_or_buffer: Union[str, Path]) DataFrame [source]#
Given a forecast csv filepath or buffer, reads and cleans the forecast csv.
Cleans artefacts in the forecast csv files, including AEMO metadata at start of file and end of report line. Also removes any duplicate rows.
- Parameters:
filepath_or_buffer (Union[str, Path]) – As for pandas.read_csv()
- Returns:
Cleaned pandas.DataFrame with forecast data
- Return type:
DataFrame
Warning
Removes duplicate rows. Raises a warning when doing so.
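The cleaning steps described above can be sketched in pure Python on raw csv lines. This is an illustrative sketch, not nemseer's implementation (which works on pandas objects); it assumes the common AEMO MMS csv convention of a leading metadata line and a trailing "C,"-prefixed end-of-report line, and the helper name `clean_forecast_lines` is hypothetical.

```python
from typing import List

def clean_forecast_lines(lines: List[str]) -> List[str]:
    """Strip leading AEMO metadata and the trailing end-of-report line, then
    drop duplicate rows while preserving order (illustrative sketch)."""
    body = lines[1:]                      # first line: AEMO file metadata
    if body and body[-1].startswith("C,"):
        body = body[:-1]                  # last line: end-of-report marker
    seen, deduped = set(), []
    for line in body:
        if line not in seen:              # remove duplicate rows
            seen.add(line)
            deduped.append(line)
    return deduped
```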
- nemseer.data_handlers.to_xarray(df: DataFrame, forecast_type: str)[source]#
Converts a pandas.DataFrame to an xarray.Dataset using nemseer definitions to determine Dataset dimensions.
The conversion is processed in chunks. If system memory usage exceeds 95%, the conversion is terminated with a MemoryError. This is more informative than the system killing the Python process.
- Parameters:
df (DataFrame) – pandas.DataFrame to be converted.
forecast_type (str) – One of nemseer.forecast_types.
- Returns:
xarray.Dataset
Warning
Raises a warning when attempting to convert high-dimensional data.
Forecast-specific helpers#
Datetime validators#
These validators are specific to each forecast type. They are used prior to initiating data compilation, and check that user-supplied datetime inputs are valid for the relevant forecast type.
- nemseer.forecast_type.validators.validate_MTPASA_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None [source]#
From AEMO PASA Outputs:
[MT PASA] is produced weekly (on Tuesdays) and lists the medium-term supply/demand prospects for the period two years in advance. The information is provided for each day within the report period.
Noting that:
- MT PASA is actually run at half-hourly resolution, but results are aggregated and reported for each day
- Timing of “RUN_DATETIME” appears to be inconsistent on inspection, so no validation is applied to run_start and run_end. The compiler will instead collect all forecasts between the provided forecast datetimes
Validation checks:
- Check 1:
forecasted_start and forecasted_end are at 00:00 for each supplied date. This is because results are reported for a day.
- Check 2:
forecasted_end is within 2 years and 16 days of run_end. A 16 day offset appears to be in older data; newer data appears to have a 6 day offset.
Todo
Handle MTPASA DUID Availability
- Parameters:
run_start (datetime) – Forecast runs at or after this datetime are queried.
run_end (datetime) – Forecast runs before or at this datetime are queried.
forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.
forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.
- Raises:
ValueError – If any validation conditions are failed.
- Return type:
None
- nemseer.forecast_type.validators.validate_P5MIN_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None [source]#
Validates P5MIN forecast datetime inputs
From AEMO MMS Data Model Report:
The 5-minute Predispatch cycle runs every 5-minutes to produce a dispatch and pricing schedule to a 5-minute resolution covering the next hour, a total of twelve periods.
Validation checks:
- Check 1:
Minute component of datetime inputs is on a 5 minute basis
- Check 2:
forecasted_end is not more than 55 minutes (12 cycles) from run_end. These 12 dispatch cycles include the immediate interval (i.e. where RUN_DATETIME = INTERVAL_DATETIME)
- Parameters:
run_start (datetime) – Forecast runs at or after this datetime are queried.
run_end (datetime) – Forecast runs before or at this datetime are queried.
forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.
forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.
- Raises:
ValueError – If any validation conditions are failed.
- Return type:
None
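The two P5MIN checks above can be sketched with the standard library. This is an illustrative sketch of the checks as stated, not nemseer's implementation; the function name `validate_p5min_inputs` is hypothetical.

```python
from datetime import datetime, timedelta

def validate_p5min_inputs(
    run_start: datetime, run_end: datetime,
    forecasted_start: datetime, forecasted_end: datetime,
) -> None:
    """Illustrative sketch of the P5MIN checks.
    Check 1: minute component of all inputs is on a 5 minute basis.
    Check 2: forecasted_end is no more than 55 minutes (12 cycles) after run_end."""
    for dt in (run_start, run_end, forecasted_start, forecasted_end):
        if dt.minute % 5 != 0:
            raise ValueError(f"{dt} is not on a 5 minute basis")
    if forecasted_end > run_end + timedelta(minutes=55):
        raise ValueError("forecasted_end is more than 55 minutes after run_end")
```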
- nemseer.forecast_type.validators.validate_PDPASA_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None [source]#
Validates PDPASA forecast datetime inputs
Points to validate_PREDISPATCH_datetime_inputs(), as validation for PREDISPATCH and PDPASA is the same.
- Parameters:
run_start (datetime) – Forecast runs at or after this datetime are queried.
run_end (datetime) – Forecast runs before or at this datetime are queried.
forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.
forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.
- Raises:
ValueError – If any validation conditions are failed.
- Return type:
None
- nemseer.forecast_type.validators.validate_PREDISPATCH_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None [source]#
Validates PREDISPATCH forecast datetime inputs
From AEMO Pre-dispatch SOP:
Currently AEMO runs pre-dispatch every half hour, on the half hour for each 30-minute period up to and including the last 30-minute period of the last trading day for which bid band prices have closed. As changes to bid band prices for the next trading day close at 1230 hours EST, AEMO will at 1230 hours, publish pre-dispatch for all 30-minute periods up to the end of the next trading day.
Noting that:
A market/trading day extends from 0400 to 0400 on the next day.
Pre-dispatch executed at 1230 hours is associated with the 1300 hours run time. That is, PREDISPATCHSEQ corresponding to 13:00 contains bids for the next trading day.
Validation checks:
- Check 1:
Minute component of datetime inputs is on a 30 minute basis
- Check 2:
forecasted_end is no later than the end of the last trading day for which bid band prices have closed (the end of that day being 04:00) by run_end
- Parameters:
run_start (datetime) – Forecast runs at or after this datetime are queried.
run_end (datetime) – Forecast runs before or at this datetime are queried.
forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.
forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.
- Raises:
ValueError – If any validation conditions are failed.
- Return type:
None
- nemseer.forecast_type.validators.validate_STPASA_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None [source]#
Validates STPASA forecast datetime inputs
From AEMO PASA Outputs:
[ST PASA] is published every 2 hours and provides detailed disclosure of short-term power-system supply/demand balance prospects for six days following the next trading day. The information is provided for each half-hour within the report period
Noting that:
A market/trading day extends from 0400 to 0400 on the next day.
ST PASA is the “reverse” of PREDISPATCH - ST PASA starts after the end of the next trading day for which bids have been submitted
The National Electricity Rules and some of AEMO’s procedures state that ST PASA is run every two hours. The frequency was increased to hourly. See Spot Market Operations Timetable.
Validation checks:
- Check 1:
Minute component of forecast datetimes is on an hourly basis (i.e. 0 minutes)
- Check 2:
Minute component of forecasted datetimes is on a 30 minute basis
- Check 3:
forecasted_start is not equal to or earlier than the end of the last trading day for which bid band prices have closed (the end of that day being 04:00) by run_start
- Check 4:
forecasted_end is no later than 6 days from the end of the last trading day for which bid band prices have closed by run_end
- Parameters:
run_start (datetime) – Forecast runs at or after this datetime are queried.
run_end (datetime) – Forecast runs before or at this datetime are queried.
forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.
forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.
- Raises:
ValueError – If any validation conditions are failed.
- Return type:
None
Run time generators#
Run time generators produce the widest valid run time range for a particular forecast type given forecasted_start and forecasted_end.
- nemseer.forecast_type.run_time_generators._generate_MTPASA_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime] [source]#
Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.
Calls validation function to ensure that user-supplied forecasted times are valid.
- Parameters:
- Returns:
Tuple of datetimes containing the widest range of possible run times
- Return type:
Tuple[datetime, datetime]
- nemseer.forecast_type.run_time_generators._generate_P5MIN_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime] [source]#
Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.
Calls validation function to ensure that user-supplied forecasted times are valid.
- Parameters:
- Returns:
Tuple of datetimes containing the widest range of possible run times
- Return type:
Tuple[datetime, datetime]
- nemseer.forecast_type.run_time_generators._generate_PDPASA_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime] [source]#
Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.
Calls validation function to ensure that user-supplied forecasted times are valid.
- Parameters:
- Returns:
Tuple of datetimes containing the widest range of possible run times
- Return type:
Tuple[datetime, datetime]
- nemseer.forecast_type.run_time_generators._generate_PREDISPATCH_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime] [source]#
Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.
Calls validation function to ensure that user-supplied forecasted times are valid.
- Parameters:
- Returns:
Tuple of datetimes containing the widest range of possible run times
- Return type:
Tuple[datetime, datetime]
- nemseer.forecast_type.run_time_generators._generate_STPASA_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime] [source]#
Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.
Calls validation function to ensure that user-supplied forecasted times are valid.
- Parameters:
- Returns:
Tuple of datetimes containing the widest range of possible run times
- Return type:
Tuple[datetime, datetime]
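For P5MIN, inverting Check 2 of the corresponding validator gives the widest run time range. A sketch under that assumption (the function name `generate_p5min_runtimes` is hypothetical; nemseer's private generators may differ in detail):

```python
from datetime import datetime, timedelta
from typing import Tuple

def generate_p5min_runtimes(
    forecasted_start: datetime, forecasted_end: datetime
) -> Tuple[datetime, datetime]:
    """Widest run time range for P5MIN (illustrative sketch). A run can
    forecast up to 55 minutes ahead (12 five-minute cycles including the
    immediate interval), so the earliest useful run_start is 55 minutes
    before forecasted_start, and the latest useful run_end is forecasted_end
    itself (the run whose immediate interval is forecasted_end)."""
    return forecasted_start - timedelta(minutes=55), forecasted_end
```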
Data#
- nemseer.data.DATETIME_FORMAT = '%Y/%m/%d %H:%M'#
nemseer date format
- nemseer.data.DEPRECATED_TABLES = {'MTPASA': ['CASESOLUTION']}#
Deprecated tables
- nemseer.data.ENUMERATED_TABLES = {'P5MIN': [('CONSTRAINTSOLUTION', 4)], 'PREDISPATCH': [('CONSTRAINT', 2), ('LOAD', 2)]}#
Enumerated tables for each forecast type. The first element of each tuple is the table name; the second is the number up to which the table is enumerated.
- nemseer.data.FORECASTED_COL = {'MTPASA': 'DAY', 'P5MIN': 'INTERVAL_DATETIME', 'PDPASA': 'INTERVAL_DATETIME', 'PREDISPATCH': 'DATETIME', 'STPASA': 'INTERVAL_DATETIME'}#
If it exists, nemseer uses the corresponding column for forecasted time filtering.
- nemseer.data.FORECAST_TYPES = ('P5MIN', 'PREDISPATCH', 'PDPASA', 'STPASA', 'MTPASA')#
Forecast types requestable through nemseer. See also forecast types, and pre-dispatch and PASA.
- nemseer.data.INVALID_STUBS_FILE = '.invalid_aemo_files.txt'#
File in raw_cache that contains invalid/corrupted AEMO files
- nemseer.data.MMSDM_ARCHIVE_URL = 'http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/'#
Wholesale electricity data archive base URL
- nemseer.data.MTPASA_DUID_URL = 'http://nemweb.com.au/Reports/Current/MTPASA_DUIDAvailability/'#
MTPASA DUID Availability
- nemseer.data.PREDISP_ALL_DATA = ('CONSTRAINT', 'INTERCONNECTORRES', 'PRICE', 'LOAD', 'REGIONSUM')#
Tables which should be directed to the PREDISP_ALL_DATA URL. The corresponding tables in the DATA folder (which end with “_D”) only contain the latest forecasted value
- nemseer.data.RUNTIME_COL = {'MTPASA': 'RUN_DATETIME', 'P5MIN': 'RUN_DATETIME', 'PDPASA': 'RUN_DATETIME', 'PREDISPATCH': 'PREDISPATCH_RUN_DATETIME', 'STPASA': 'RUN_DATETIME'}#
If it exists, nemseer will use the corresponding column for run time filtering.