API Reference#

Classes#

Query#

Note

Use the initialise() class method to create an instance of the Query object, as this method assembles metadata relevant to NEMSEER cache searching.

class nemseer.query.Query(run_start: str, run_end: str, forecasted_start: str, forecasted_end: str, forecast_type: str, tables: Union[str, List[str]], metadata: Dict[str, str], raw_cache, processed_cache=None, processed_queries: Optional[Union[Dict[str, Path], Dict]] = None)[source]#

Query validates user inputs and dispatches data downloaders and compilers

Construct Query using the Query.initialise() constructor. This ensures query metadata is constructed appropriately.

Query:

  • Validates user input data
    • Checks datetimes fit yyyy/mm/dd HH:MM format

    • Checks datetime chronology (e.g. end is after start)

    • Checks requested datetimes are valid for each forecast type

    • Validates forecast type

    • Validates user-requested tables against what is available on NEMWeb

  • Retains query metadata (via constructor class method nemseer.query.Query.initialise())

  • Can check raw_cache and processed_cache contents to streamline query compilation
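The first two input checks listed above can be sketched with the standard library alone. This is an illustrative sketch rather than nemseer's own code: parse_query_datetime and check_chronology are hypothetical helper names, and the strict "end after start" rule is an assumption based on the example given in the list.

```python
from datetime import datetime
from typing import Tuple

# Same value as nemseer.data.DATETIME_FORMAT listed in this reference.
DATETIME_FORMAT = "%Y/%m/%d %H:%M"


def parse_query_datetime(value: str) -> datetime:
    """Parse a 'yyyy/mm/dd HH:MM' string, raising ValueError if it does not fit."""
    try:
        return datetime.strptime(value, DATETIME_FORMAT)
    except ValueError as err:
        raise ValueError(f"{value!r} is not in yyyy/mm/dd HH:MM format") from err


def check_chronology(start: str, end: str) -> Tuple[datetime, datetime]:
    """Hypothetical helper: parse both datetimes and require end to be after start."""
    start_dt, end_dt = parse_query_datetime(start), parse_query_datetime(end)
    if end_dt <= start_dt:
        raise ValueError("end must be after start")
    return start_dt, end_dt


run_start, run_end = check_chronology("2021/02/01 00:00", "2021/02/02 00:00")
```

A string such as "2021-02-01 00:00" fails the format check, and swapping the two arguments fails the chronology check.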

Parameters:
run_start#

Forecast runs at or after this datetime are queried.

Type:

datetime.datetime

run_end#

Forecast runs before or at this datetime are queried.

Type:

datetime.datetime

forecasted_start#

Forecasts pertaining to times at or after this datetime are retained.

Type:

datetime.datetime

forecasted_end#

Forecasts pertaining to times before or at this datetime are retained.

Type:

datetime.datetime

forecast_type#

One of nemseer.forecast_types.

Type:

str

tables#

Table or tables required. A single table can be supplied as a string. Multiple tables can be supplied as a list of strings.

Type:

List[str]

metadata#

Metadata dictionary. Constructed by Query.initialise().

Type:

Dict[str, str]

raw_cache#

Path to build or reuse raw_cache.

Type:

pathlib.Path

processed_cache#

Path to build or reuse processed_cache. Should be distinct from raw_cache

Type:

optional

processed_queries#

Defaults to None on initialisation. Populated once Query.find_table_queries_in_processed_cache() is called.

Type:

Union[Dict[str, pathlib.Path], Dict]

check_all_raw_data_in_cache() bool[source]#

Checks whether all requested data is already in the raw_cache as parquet

nemseer.downloader.ForecastTypeDownloader.download_csv() handles partial raw_cache completeness

If all requested data is already in the raw_cache as parquet, returns True. Otherwise returns False.

Return type:

bool

find_table_queries_in_processed_cache(data_format: str) None[source]#

Determines which tables already have queries saved in the processed_cache.

If data_format=df, this function will sieve through the metadata of all parquet files in the processed_cache. Note that parquet metadata is UTF-8 encoded. Similarly, data_format=xr will check the metadata of all netCDF files.

Modifies Query.processed_queries from None to a dict.

The dict is empty if:

  1. processed_cache is None

  2. No portion of the query has been saved in the processed_cache

If a portion of the queries are saved in the processed_cache, then Query.processed_queries will be equal to a dict that maps the saved query’s table name to the saved query’s filename.
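The shape of the resulting dict can be illustrated without reading any files. The sketch below is not nemseer's implementation: the metadata keys, the b"table" key and the filenames are all hypothetical, and stored file metadata is represented as plain dicts of UTF-8 byte strings.

```python
from typing import Dict, List


def encode_metadata(metadata: Dict[str, str]) -> Dict[bytes, bytes]:
    """Encode string metadata as UTF-8 byte strings."""
    return {k.encode("utf-8"): v.encode("utf-8") for k, v in metadata.items()}


def find_saved_queries(
    query_metadata: Dict[str, str],
    tables: List[str],
    stored: Dict[str, Dict[bytes, bytes]],
) -> Dict[str, str]:
    """Map each requested table to a stored filename whose metadata matches the query."""
    wanted = encode_metadata(query_metadata)
    matches: Dict[str, str] = {}
    for filename, file_meta in stored.items():
        # Hypothetical convention: the table name is stored under b"table".
        table = file_meta.get(b"table", b"").decode("utf-8")
        same_query = all(file_meta.get(k) == v for k, v in wanted.items())
        if table in tables and same_query:
            matches[table] = filename
    return matches


query = {"forecast_type": "P5MIN", "run_start": "2021/02/01 00:00"}
other = {"forecast_type": "PDPASA", "run_start": "2021/02/01 00:00"}
stored = {
    "a.parquet": {**encode_metadata(query), b"table": b"REGIONSOLUTION"},
    "b.parquet": {**encode_metadata(other), b"table": b"REGIONSOLUTION"},
}
processed_queries = find_saved_queries(
    query, ["REGIONSOLUTION", "CONSTRAINTSOLUTION"], stored
)
```

Only the table with a matching saved query appears in the result; a query with no matches yields an empty dict.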

Parameters:

data_format (str) – As per nemseer.compile_data()

Return type:

None

classmethod initialise(run_start: str, run_end: str, forecasted_start: str, forecasted_end: str, forecast_type: str, tables: Union[str, List[str]], raw_cache: str, processed_cache: Optional[str] = None) Query[source]#

Constructor method for Query. Assembles query metadata.

Return type:

Query

Downloader#

Note

Use the from_Query() class method to create an instance of the ForecastTypeDownloader object.

class nemseer.downloader.ForecastTypeDownloader(*, run_start: datetime, run_end: datetime, forecast_type: str, tables: List[str], raw_cache: Path)[source]#

ForecastTypeDownloader can initiate csv downloads and convert raw_cache csvs to the parquet format.

Parameters:
run_start#

Forecast runs at or after this datetime are queried.

Type:

datetime.datetime

run_end#

Forecast runs before or at this datetime are queried.

Type:

datetime.datetime

forecast_type#

One of nemseer.forecast_types

Type:

str

tables#

Table or tables required. A single table can be supplied as a string. Multiple tables can be supplied as a list of strings.

Type:

List[str]

raw_cache#

Path to download raw data to. Can reuse or build a new raw_cache.

Type:

pathlib.Path

convert_to_parquet(keep_csv=False) None[source]#

Converts all CSVs in the raw_cache to parquet

Warning

A warning is printed if the filesize is greater than half of available memory as pandas.DataFrame consumes more than the file size in memory.

Return type:

None

download_csv() None[source]#

Downloads and unzips zip files given query loaded into ForecastTypeDownloader

This method will only download and unzip the relevant zip/csv if the corresponding .parquet file is not located in the specified raw_cache.
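The "skip if the parquet file already exists" behaviour can be sketched as a simple existence check. This is a minimal sketch under assumptions: stubs_needing_download is a hypothetical helper, and the filename stubs are placeholders rather than real SQLLoader names.

```python
import tempfile
from pathlib import Path
from typing import List


def stubs_needing_download(raw_cache: Path, filename_stubs: List[str]) -> List[str]:
    """Return the stubs that have no '<stub>.parquet' file in raw_cache."""
    return [
        stub
        for stub in filename_stubs
        if not (raw_cache / f"{stub}.parquet").exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # Pretend one table was downloaded and converted on an earlier run.
    (cache / "TABLE_A_202101.parquet").touch()
    to_download = stubs_needing_download(
        cache, ["TABLE_A_202101", "TABLE_B_202101"]
    )
```

Here only TABLE_B_202101 would be downloaded, which is how a partially complete raw_cache can be reused.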

Return type:

None

classmethod from_Query(query: Query) ForecastTypeDownloader[source]#

Constructor method for ForecastTypeDownloader from Query

Parameters:

query (Query) –

Return type:

ForecastTypeDownloader

DataCompiler#

Note

Use the from_Query() class method to create an instance of the DataCompiler object.

class nemseer.data_compilers.DataCompiler(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime, forecast_type: str, metadata: Dict[str, str], raw_cache: Path, processed_cache: Union[None, Path], processed_queries: Union[Dict[str, Path], Dict], raw_tables: List[str], compiled_data: Union[None, Dict[str, DataFrame], Dict[str, Dataset]] = None)[source]#

DataCompiler compiles data from the raw_cache or processed_cache.

Parameters:
run_start#

Forecast runs at or after this datetime are queried.

Type:

datetime.datetime

run_end#

Forecast runs before or at this datetime are queried.

Type:

datetime.datetime

forecasted_start#

Forecasts pertaining to times at or after this datetime are retained.

Type:

datetime.datetime

forecasted_end#

Forecasts pertaining to times before or at this datetime are retained.

Type:

datetime.datetime

forecast_type#

One of nemseer.forecast_types.

Type:

str

tables#

Table or tables required. A single table can be supplied as a string. Multiple tables can be supplied as a list of strings.

metadata#

Metadata dictionary. Constructed by Query.initialise().

Type:

Dict[str, str]

raw_cache#

Path to build or reuse raw_cache.

Type:

pathlib.Path

processed_cache#

Path to build or reuse processed_cache. Should be distinct from raw_cache

Type:

optional

processed_queries#

Defaults to None on initialisation.

Type:

Union[Dict[str, pathlib.Path], Dict]

raw_tables#

Populated via DataCompiler.from_Query()

compiled_data#

Defaults to None on initialisation. Populated once data is compiled by methods.

Type:

Union[None, Dict[str, pandas.core.frame.DataFrame], Dict[str, xarray.core.dataset.Dataset]]

compile_processed_data(data_format: str = 'df') None[source]#

Compiles data from the processed_cache, as per entries in processed_queries, to a pandas.DataFrame (default) or to a xarray.Dataset.

This method will update compiled_data.

Parameters:

data_format (str) – Default “df” (pandas.DataFrame). Other valid input is “xr”, which compiles xarray.Dataset.

Return type:

None

compile_raw_data(data_format: str = 'df') None[source]#

Compiles data from raw_cache to a pandas.DataFrame (default) or to a xarray.Dataset.

This compiler will:

Parameters:

data_format (str) – Default “df” (pandas.DataFrame). Other valid input is “xr”, which returns xarray.Dataset.

Return type:

None

Warning

Skips any files previously found to be invalid/corrupted and prints a warning

classmethod from_Query(query: Query) DataCompiler[source]#

Constructor method for DataCompiler from Query.

Parameters:

query (Query) –

Return type:

DataCompiler

invalid_or_corrupted_files() List[str][source]#

A list of invalid/corrupted files as per files in .invalid_aemo_files.txt. Returns an empty list if the stubfile does not exist.

Return type:

List[str]

write_to_processed_cache() None[source]#

Writes netCDF4 for xarray.Dataset and parquet for pandas.DataFrame to the processed_cache with associated query metadata.

Note that parquet metadata needs to be UTF-8 encoded.

Return type:

None

Functions#

Query handlers#

nemseer.query.generate_sqlloader_filenames(run_start: datetime, run_end: datetime, forecast_type: str, tables: List[str]) Dict[Tuple[int, int, str], str][source]#

Generates MMSDM Historical Data SQLLoader file names based on provided query data

Returns a dict that maps each tuple of query metadata (year, month, table) to a filename

Parameters:
  • run_start (datetime) – Forecast runs at or after this datetime are queried.

  • run_end (datetime) – Forecast runs before or at this datetime are queried.

  • forecast_type (str) – One of nemseer.forecast_types.

  • tables (List[str]) – Table or tables required, provided as a List.

Returns:

A dict that maps each tuple of query metadata (year, month, table) to a format-agnostic (SQLLoader) filename

Return type:

Dict[Tuple[int, int, str], str]
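The structure of the returned dict can be sketched as follows. This is an illustration only: the key order (year, month, table) is inferred from the return type, the filename pattern is a placeholder rather than the real SQLLoader naming scheme, and months_between and sketch_filenames are hypothetical names.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def months_between(start: datetime, end: datetime) -> List[Tuple[int, int]]:
    """List every (year, month) pair from start's month to end's month inclusive."""
    pairs = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        pairs.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return pairs


def sketch_filenames(
    run_start: datetime, run_end: datetime, forecast_type: str, tables: List[str]
) -> Dict[Tuple[int, int, str], str]:
    """Build one entry per (year, month, table) combination."""
    # Placeholder pattern, NOT the real SQLLoader filename format.
    return {
        (year, month, table): f"{forecast_type}_{table}_{year}{month:02d}"
        for year, month in months_between(run_start, run_end)
        for table in tables
    }


filenames = sketch_filenames(
    datetime(2020, 12, 15), datetime(2021, 1, 15), "P5MIN", ["REGIONSOLUTION"]
)
```

A run window that spans a month boundary produces one entry per month and table.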

Scrapers and downloaders#

Scrapers: These functions scrape the MMSDM Historical Data SQLLoader repository to assist nemseer in validating inputs and providing feedback to users.

Downloaders: Used to download and unzip a .zip file.

nemseer.downloader.get_sqlloader_forecast_tables(year: int, month: int, forecast_type: str, actual: bool = False) List[str][source]#

Requestable tables of particular forecast type on MMSDM Historical Data SQLLoader

If actual = False, provides a list of tables that can be requested via nemseer.

If actual = True, returns actual tables available via NEMWeb, including all tables that are enumerated.

N.B.:
  • Removes numbering from enumerated tables for P5MIN - e.g. CONSTRAINTSOLUTION(x) are all reduced to CONSTRAINTSOLUTION
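The reduction of enumerated tables can be sketched with a regular expression. This sketch assumes that enumerated tables are named as the base table name followed by an integer suffix; strip_enumeration is a hypothetical helper, not part of nemseer.

```python
import re
from typing import List


def strip_enumeration(tables: List[str], enumerated: List[str]) -> List[str]:
    """Collapse e.g. NAME1..NAMEn into a single NAME, preserving first-seen order."""
    collapsed: List[str] = []
    for table in tables:
        for base in enumerated:
            # Only an exact "base + digits" name is treated as enumerated.
            if re.fullmatch(rf"{re.escape(base)}\d+", table):
                table = base
                break
        if table not in collapsed:
            collapsed.append(table)
    return collapsed


actual = ["CONSTRAINTSOLUTION1", "CONSTRAINTSOLUTION2", "REGIONSOLUTION"]
requestable = strip_enumeration(actual, ["CONSTRAINTSOLUTION"])
```

The two numbered tables collapse into one requestable CONSTRAINTSOLUTION entry, while REGIONSOLUTION is left untouched.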

Examples

See querying table availability

Returns:

List of tables associated with that forecast type for that period

Return type:

List[str]

nemseer.downloader.get_sqlloader_years_and_months() Dict[int, List[int]][source]#

Years and months with data on NEMWeb MMSDM Historical Data SQLLoader

Examples

See querying date ranges

Returns:

Months mapped to each year. Data is available for each of these months.

Return type:

Dict[int, List[int]]

nemseer.downloader.get_unzipped_csv(url: str, raw_cache: Path) None[source]#

Unzipped (single) csv file downloaded from url to raw_cache

This function:

  1. Downloads zip file in chunks to limit memory use and enable progress bar

  2. Validates that the zip contains a single file that has the same name as the zip

  3. If the zip file is invalid, writes the file stub to .invalid_aemo_files.txt
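Step 2 above can be sketched with the standard zipfile module. This is a minimal sketch, not nemseer's code: zip_has_single_matching_file and make_zip are hypothetical helpers, and comparing names case-insensitively by stem is an assumption.

```python
import io
import zipfile
from pathlib import Path
from typing import List


def zip_has_single_matching_file(zip_bytes: bytes, zip_name: str) -> bool:
    """True if the archive holds exactly one member named like the zip itself."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = archive.namelist()
    if len(members) != 1:
        return False
    # Compare names without extensions, ignoring case (an assumption).
    return Path(members[0]).stem.lower() == Path(zip_name).stem.lower()


def make_zip(member_names: List[str]) -> bytes:
    """Build a small in-memory zip for demonstration purposes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, "C,placeholder\n")
    return buffer.getvalue()


good = zip_has_single_matching_file(make_zip(["TABLE_A.CSV"]), "TABLE_A.zip")
bad = zip_has_single_matching_file(
    make_zip(["TABLE_A.CSV", "EXTRA.CSV"]), "TABLE_A.zip"
)
```

An archive that fails this check is the kind of file whose stub would be recorded in .invalid_aemo_files.txt.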

Parameters:
  • url (str) – URL of zip

  • raw_cache (Path) – Path to save zip. See raw_cache.

Returns:

None. Extracts csvs to raw_cache.

Return type:

None

Data handlers#

Functions for handling various data states.

Valid inputs for clean_forecast_csv are the same as those for pandas.read_csv.

nemseer.data_handlers.apply_run_and_forecasted_time_filters(df: DataFrame, run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime, forecast_type: str) DataFrame[source]#

Applies filtering for run times (i.e. run_start and run_end) and forecasted times (i.e. forecasted_start and forecasted_end).

Datetime filtering is applied to a column fetched from lookup tables that map the relevant column name to each forecast type. If the run time/forecasted column obtained from the lookup is not present in the DataFrame, the respective filter is not applied.
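The filtering logic can be sketched without pandas by treating each row as a dict. This is an illustration under assumptions: filter_rows is a hypothetical stand-in for the DataFrame-based function, and both windows are treated as inclusive, following the parameter descriptions.

```python
from datetime import datetime
from typing import Dict, List

# Column lookups as listed under nemseer.data in this reference (P5MIN only).
RUNTIME_COL = {"P5MIN": "RUN_DATETIME"}
FORECASTED_COL = {"P5MIN": "INTERVAL_DATETIME"}


def filter_rows(
    rows: List[Dict[str, datetime]],
    run_start: datetime,
    run_end: datetime,
    forecasted_start: datetime,
    forecasted_end: datetime,
    forecast_type: str,
) -> List[Dict[str, datetime]]:
    """Keep rows inside both inclusive windows; skip a filter if its column is absent."""
    windows = [
        (RUNTIME_COL[forecast_type], run_start, run_end),
        (FORECASTED_COL[forecast_type], forecasted_start, forecasted_end),
    ]
    for column, start, end in windows:
        if rows and column in rows[0]:
            rows = [row for row in rows if start <= row[column] <= end]
    return rows


rows = [
    {"RUN_DATETIME": datetime(2021, 2, 1, 0, 0), "INTERVAL_DATETIME": datetime(2021, 2, 1, 0, 5)},
    {"RUN_DATETIME": datetime(2021, 2, 1, 0, 0), "INTERVAL_DATETIME": datetime(2021, 2, 1, 0, 30)},
    {"RUN_DATETIME": datetime(2021, 2, 1, 1, 0), "INTERVAL_DATETIME": datetime(2021, 2, 1, 1, 5)},
]
kept = filter_rows(
    rows,
    datetime(2021, 2, 1, 0, 0), datetime(2021, 2, 1, 0, 30),
    datetime(2021, 2, 1, 0, 0), datetime(2021, 2, 1, 0, 10),
    "P5MIN",
)
```

The third row falls outside the run window and the second outside the forecasted window, so only the first row is kept.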

Parameters:
  • run_start (datetime) – Forecast runs at or after this datetime are queried.

  • run_end (datetime) – Forecast runs before or at this datetime are queried.

  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

  • forecast_type (str) – One of nemseer.forecast_types.

  • df (DataFrame) –

Returns:

DataFrame with appropriate datetime filtering applied.

Return type:

DataFrame

nemseer.data_handlers.clean_forecast_csv(filepath_or_buffer: Union[str, Path]) DataFrame[source]#

Given a forecast csv filepath or buffer, reads and cleans the forecast csv.

Cleans artefacts in the forecast csv files, including AEMO metadata at start of file and end of report line. Also removes any duplicate rows.
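The cleaning steps can be sketched with the csv module. This sketch assumes AEMO's row-type convention, in which the first column is C for metadata and end-of-report lines, I for the header row and D for data rows. The sample text is simplified and made up, and clean_rows is a hypothetical, pandas-free stand-in for the real function.

```python
import csv
import io
from typing import Dict, List


def clean_rows(text: str) -> List[Dict[str, str]]:
    """Drop 'C' lines, use the 'I' line as the header and keep unique 'D' rows."""
    header: List[str] = []
    seen = set()
    cleaned: List[Dict[str, str]] = []
    for record in csv.reader(io.StringIO(text)):
        if not record or record[0] == "C":
            continue  # metadata at start of file or end-of-report line
        if record[0] == "I":
            header = record
        elif record[0] == "D" and tuple(record) not in seen:
            seen.add(tuple(record))
            # Skip the leading row-type marker in both header and data.
            cleaned.append(dict(zip(header[1:], record[1:])))
    return cleaned


sample = (
    "C,EXAMPLE HEADER METADATA\n"
    "I,RUN_DATETIME,REGIONID,RRP\n"
    "D,2021/02/01 00:05:00,NSW1,40.0\n"
    "D,2021/02/01 00:05:00,NSW1,40.0\n"
    "D,2021/02/01 00:05:00,VIC1,35.5\n"
    "C,END OF REPORT,5\n"
)
rows = clean_rows(sample)
```

The duplicated NSW1 row is kept once, and neither of the C lines appears in the result.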

Parameters:

filepath_or_buffer (Union[str, Path]) – As for pandas.read_csv()

Returns:

Cleaned pandas.DataFrame with forecast data

Return type:

DataFrame

Warning

Removes duplicate rows. Raises a warning when doing so.

nemseer.data_handlers.to_xarray(df: DataFrame, forecast_type: str)[source]#

Converts a pandas.DataFrame to a xarray.Dataset using nemseer definitions to determine Dataset dimensions.

The conversion is processed in chunks. If system memory usage exceeds 95%, the conversion is terminated with a MemoryError. This is more informative than the system killing the Python process.

Returns:

xarray.Dataset.

Warning

Raises a warning when attempting to convert high-dimensional data.

Forecast-specific helpers#

Datetime validators#

These validators are specific to each forecast type. They are used prior to initiating data compilation, and check that user-supplied datetime inputs are valid for the relevant forecast type.

nemseer.forecast_type.validators.validate_MTPASA_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None[source]#

From AEMO PASA Outputs:

[MT PASA] is produced weekly (on Tuesdays) and lists the medium-term supply/demand prospects for the period two years in advance. The information is provided for each day within the report period.

Noting that:

  • MT PASA is actually run at half-hourly resolution
    • But results are aggregated and reported for each day

  • Timing of “RUN_DATETIME” appears to be inconsistent on inspection
    • No validation on run_start and run_end

    • Compiler will instead collect all forecasts between provided forecast datetimes

Validation checks:

Check 1:

forecasted_start and forecasted_end are at 00:00 for each supplied date. This is because results are reported for a day.

Check 2:

forecasted_end is within 2 years and 16 days of run_end. A 16 day offset appears to be in older data. Newer data appears to have a 6 day offset.
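The two MT PASA checks can be sketched as follows. This is not nemseer's implementation: check_mtpasa_inputs is a hypothetical name, and "2 years and 16 days" is approximated as 2 × 365 + 16 days, which ignores leap years.

```python
from datetime import datetime, timedelta

# Assumption: "2 years and 16 days" approximated as a fixed number of days.
MAX_HORIZON = timedelta(days=2 * 365 + 16)


def check_mtpasa_inputs(
    run_end: datetime, forecasted_start: datetime, forecasted_end: datetime
) -> None:
    """Raise ValueError if either MT PASA check fails."""
    # Check 1: forecasted datetimes must be at 00:00.
    for name, value in (
        ("forecasted_start", forecasted_start),
        ("forecasted_end", forecasted_end),
    ):
        if (value.hour, value.minute) != (0, 0):
            raise ValueError(f"{name} must be at 00:00")
    # Check 2: forecasted_end must be within the horizon of run_end.
    if forecasted_end - run_end > MAX_HORIZON:
        raise ValueError("forecasted_end is more than 2 years and 16 days after run_end")


result = check_mtpasa_inputs(
    datetime(2021, 1, 5, 12, 0), datetime(2021, 2, 1), datetime(2022, 2, 1)
)
```

Valid inputs return None, matching the validators' documented return type.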

Todo

Handle MTPASA DUID Availability

Parameters:
  • run_start (datetime) – Forecast runs at or after this datetime are queried.

  • run_end (datetime) – Forecast runs before or at this datetime are queried.

  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Raises:

ValueError – If any validation conditions are failed.

Return type:

None

nemseer.forecast_type.validators.validate_P5MIN_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None[source]#

Validates P5MIN forecast datetime inputs

From AEMO MMS Data Model Report:

The 5-minute Predispatch cycle runs every 5-minutes to produce a dispatch and pricing schedule to a 5-minute resolution covering the next hour, a total of twelve periods.

Validation checks:

Check 1:

Minute component of datetime inputs is on a 5 minute basis

Check 2:

forecasted_end is not more than 55 minutes (12 cycles) from run_end

These 12 dispatch cycles include the immediate interval (i.e. where RUN_DATETIME = INTERVAL_DATETIME)
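Both P5MIN checks reduce to simple datetime arithmetic. The following is a minimal sketch, assuming the checks apply exactly as worded above; check_p5min_inputs is a hypothetical name and not the nemseer function.

```python
from datetime import datetime, timedelta


def check_p5min_inputs(
    run_start: datetime,
    run_end: datetime,
    forecasted_start: datetime,
    forecasted_end: datetime,
) -> None:
    """Raise ValueError if either P5MIN check fails."""
    # Check 1: every datetime must fall on a 5 minute boundary.
    for value in (run_start, run_end, forecasted_start, forecasted_end):
        if value.minute % 5 != 0:
            raise ValueError(f"{value} is not on a 5 minute basis")
    # Check 2: 12 cycles including the immediate interval span 55 minutes.
    if forecasted_end - run_end > timedelta(minutes=55):
        raise ValueError("forecasted_end is more than 55 minutes after run_end")


result = check_p5min_inputs(
    datetime(2021, 2, 1, 0, 0),
    datetime(2021, 2, 1, 1, 0),
    datetime(2021, 2, 1, 0, 5),
    datetime(2021, 2, 1, 1, 55),
)
```

A forecasted_end of 02:00 with the same run_end would fail Check 2, because it lies 60 minutes after the last run.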

Parameters:
  • run_start (datetime) – Forecast runs at or after this datetime are queried.

  • run_end (datetime) – Forecast runs before or at this datetime are queried.

  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Raises:

ValueError – If any validation conditions are failed.

Return type:

None

nemseer.forecast_type.validators.validate_PDPASA_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None[source]#

Validates PDPASA forecast datetime inputs

Points to validate_PREDISPATCH_datetime_inputs() as validation for PREDISPATCH and PDPASA are the same.

Parameters:
  • run_start (datetime) – Forecast runs at or after this datetime are queried.

  • run_end (datetime) – Forecast runs before or at this datetime are queried.

  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Raises:

ValueError – If any validation conditions are failed.

Return type:

None

nemseer.forecast_type.validators.validate_PREDISPATCH_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None[source]#

Validates PREDISPATCH forecast datetime inputs

From AEMO Pre-dispatch SOP:

Currently AEMO runs pre-dispatch every half hour, on the half hour for each 30-minute period up to and including the last 30-minute period of the last trading day for which bid band prices have closed. As changes to bid band prices for the next trading day close at 1230 hours EST, AEMO will at 1230 hours, publish pre-dispatch for all 30-minute periods up to the end of the next trading day.

Noting that:

  • A market/trading day extends from 0400 to 0400 on the next day.

  • Pre-dispatch executed at 1230 hours is associated with the 1300 hours run time. That is, PREDISPATCHSEQ corresponding to 13:00 contains bids for the next trading day.

Validation checks:

Check 1:

Minute component of datetime inputs is on a 30 minute basis

Check 2:

forecasted_end is no later than the end of the last trading day for which bid band prices have closed (the end of that day being 04:00) by run_end
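Check 2 depends on working out where the last closed trading day ends for a given run_end. The sketch below is one reading of the notes above and should be checked against the nemseer source before being relied on: it assumes runs stamped 13:00 or later already include the next trading day, and latest_forecasted_end is a hypothetical name.

```python
from datetime import datetime, time, timedelta


def latest_forecasted_end(run_end: datetime) -> datetime:
    """04:00 at the end of the last trading day covered by a run at run_end."""
    # Assumption: from the 13:00 run time onwards, bids for the next
    # trading day have closed, so the horizon extends one more day.
    days_ahead = 2 if run_end.time() >= time(13, 0) else 1
    return datetime.combine(run_end.date() + timedelta(days=days_ahead), time(4, 0))


before_close = latest_forecasted_end(datetime(2021, 2, 1, 12, 30))
after_close = latest_forecasted_end(datetime(2021, 2, 1, 13, 0))
```

Under this reading, a run at 12:30 on 1 February can forecast up to 04:00 on 2 February, while a run at 13:00 can forecast up to 04:00 on 3 February.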

Parameters:
  • run_start (datetime) – Forecast runs at or after this datetime are queried.

  • run_end (datetime) – Forecast runs before or at this datetime are queried.

  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Raises:

ValueError – If any validation conditions are failed.

Return type:

None

nemseer.forecast_type.validators.validate_STPASA_datetime_inputs(run_start: datetime, run_end: datetime, forecasted_start: datetime, forecasted_end: datetime) None[source]#

Validates STPASA forecast datetime inputs

From AEMO PASA Outputs:

[ST PASA] is published every 2 hours and provides detailed disclosure of short-term power-system supply/demand balance prospects for six days following the next trading day. The information is provided for each half-hour within the report period.

Noting that:

  • A market/trading day extends from 0400 to 0400 on the next day.

  • ST PASA is the “reverse” of PREDISPATCH - ST PASA starts after the end of the next trading day for which bids have been submitted

The National Electricity Rules and some of AEMO’s procedures state that ST PASA is run every two hours. The frequency was increased to hourly. See Spot Market Operations Timetable.

Validation checks:

Check 1:

Minute component of run datetimes (run_start and run_end) is on an hourly basis (i.e. 0 minutes)

Check 2:

Minute component of forecasted datetimes is on a 30 minute basis

Check 3:

forecasted_start is not equal to or earlier than the end of the last trading day for which bid band prices have closed (the end of that day being 04:00) by run_start

Check 4:

forecasted_end is no later than 6 days from the end of the last trading day for which bid band prices have closed by run_end

Parameters:
  • run_start (datetime) – Forecast runs at or after this datetime are queried.

  • run_end (datetime) – Forecast runs before or at this datetime are queried.

  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Raises:

ValueError – If any validation conditions are failed.

Return type:

None

Run time generators#

Run time generators produce the widest valid run time range for a particular forecast type given forecasted_start and forecasted_end.
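As an illustration of the idea, the P5MIN case can be derived from the 55 minute horizon described for validate_P5MIN_datetime_inputs. This is a sketch under that assumption, not the private nemseer generator; sketch_p5min_runtimes is a hypothetical name and it skips the input validation that the real generators perform.

```python
from datetime import datetime, timedelta
from typing import Tuple


def sketch_p5min_runtimes(
    forecasted_start: datetime, forecasted_end: datetime
) -> Tuple[datetime, datetime]:
    """Widest run window whose forecasts can cover the forecasted window."""
    # A run at time R covers intervals R to R + 55 minutes, so the earliest
    # run that reaches forecasted_start began 55 minutes before it.
    earliest_run_start = forecasted_start - timedelta(minutes=55)
    # The latest useful run is the one made at forecasted_end itself.
    latest_run_end = forecasted_end
    return earliest_run_start, latest_run_end


run_start, run_end = sketch_p5min_runtimes(
    datetime(2021, 2, 1, 1, 0), datetime(2021, 2, 1, 2, 0)
)
```

For a forecasted window of 01:00 to 02:00, this gives a run window of 00:05 to 02:00.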

nemseer.forecast_type.run_time_generators._generate_MTPASA_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime][source]#

Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.

Calls validation function to ensure that user-supplied forecasted times are valid.

Parameters:
  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Returns:

Tuple of datetimes (run_start, run_end) containing the widest range of possible run times

Return type:

Tuple[datetime, datetime]

nemseer.forecast_type.run_time_generators._generate_P5MIN_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime][source]#

Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.

Calls validation function to ensure that user-supplied forecasted times are valid.

Parameters:
  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Returns:

Tuple of datetimes (run_start, run_end) containing the widest range of possible run times

Return type:

Tuple[datetime, datetime]

nemseer.forecast_type.run_time_generators._generate_PDPASA_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime][source]#

Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.

Calls validation function to ensure that user-supplied forecasted times are valid.

Parameters:
  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Returns:

Tuple of datetimes (run_start, run_end) containing the widest range of possible run times

Return type:

Tuple[datetime, datetime]

nemseer.forecast_type.run_time_generators._generate_PREDISPATCH_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime][source]#

Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.

Calls validation function to ensure that user-supplied forecasted times are valid.

Parameters:
  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Returns:

Tuple of datetimes (run_start, run_end) containing the widest range of possible run times

Return type:

Tuple[datetime, datetime]

nemseer.forecast_type.run_time_generators._generate_STPASA_runtimes(forecasted_start: datetime, forecasted_end: datetime) Tuple[datetime, datetime][source]#

Generates the earliest run_start and latest run_end for a set of user-supplied forecasted_start and forecasted_end times.

Calls validation function to ensure that user-supplied forecasted times are valid.

Parameters:
  • forecasted_start (datetime) – Forecasts pertaining to times at or after this datetime are retained.

  • forecasted_end (datetime) – Forecasts pertaining to times before or at this datetime are retained.

Returns:

Tuple of datetimes (run_start, run_end) containing the widest range of possible run times

Return type:

Tuple[datetime, datetime]

Data#

nemseer.data.DATETIME_FORMAT = '%Y/%m/%d %H:%M'#

nemseer date format

nemseer.data.DEPRECATED_TABLES = {'MTPASA': ['CASESOLUTION']}#

Deprecated tables

nemseer.data.ENUMERATED_TABLES = {'P5MIN': [('CONSTRAINTSOLUTION', 4)], 'PREDISPATCH': [('CONSTRAINT', 2), ('LOAD', 2)]}#

Enumerated tables for each forecast type. The first element of each tuple is the table name. The second element is the number up to which the table is enumerated.

nemseer.data.FORECASTED_COL = {'MTPASA': 'DAY', 'P5MIN': 'INTERVAL_DATETIME', 'PDPASA': 'INTERVAL_DATETIME', 'PREDISPATCH': 'DATETIME', 'STPASA': 'INTERVAL_DATETIME'}#

If it exists, nemseer uses the corresponding column for forecasted time filtering.

nemseer.data.FORECAST_TYPES = ('P5MIN', 'PREDISPATCH', 'PDPASA', 'STPASA', 'MTPASA')#

Forecast types requestable through nemseer. See also forecast types, and pre-dispatch and PASA.

nemseer.data.INVALID_STUBS_FILE = '.invalid_aemo_files.txt'#

File in raw_cache that contains invalid/corrupted AEMO files

nemseer.data.MMSDM_ARCHIVE_URL = 'http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/'#

Wholesale electricity data archive base URL

nemseer.data.MTPASA_DUID_URL = 'http://nemweb.com.au/Reports/Current/MTPASA_DUIDAvailability/'#

MTPASA DUID Availability

nemseer.data.PREDISP_ALL_DATA = ('CONSTRAINT', 'INTERCONNECTORRES', 'PRICE', 'LOAD', 'REGIONSUM')#

Tables which should be directed to the PREDISP_ALL_DATA URL. The corresponding tables in the DATA folder (which end with “_D”) only contain the latest forecasted value

nemseer.data.RUNTIME_COL = {'MTPASA': 'RUN_DATETIME', 'P5MIN': 'RUN_DATETIME', 'PDPASA': 'RUN_DATETIME', 'PREDISPATCH': 'PREDISPATCH_RUN_DATETIME', 'STPASA': 'RUN_DATETIME'}#

If it exists, nemseer will use the corresponding column for run time filtering.