Timeseries
Many eta_utility functions and classes operate on timeseries data and on pandas.DataFrame objects containing timeseries data. The timeseries module in eta_utility provides additional functionality for both. For example, it can find random time slices in DataFrames, or import timeseries data from multiple CSV files and map a (possibly random) section of it into a single DataFrame.
Scenario Data Loader
Scenario data is often required to perform optimizations and simulations of factory systems. The import function can import data from multiple files and returns a cleaned Dataframe.
- eta_utility.timeseries.scenario_from_csv(paths: Path | Sequence[Path], data_prefixes: Sequence[str] | None = None, *, start_time: datetime, end_time: datetime | None = None, total_time: TimeStep | None = None, random: np.random.Generator | bool | None = False, resample_time: TimeStep | None = None, interpolation_method: Sequence[str | None] | str | None = None, rename_cols: Mapping[str, str] | None = None, prefix_renamed: bool = True, infer_datetime_from: str | Sequence[Sequence[int]] | Sequence[str] = 'string', time_conversion_str: str | Sequence[str] = '%Y-%m-%d %H:%M', scaling_factors: Sequence[Mapping[str, SupportsFloat] | None] | Mapping[str, SupportsFloat] | None = None) → pd.DataFrame
Import scenario data from one or more CSV files and return it as a single pandas.DataFrame. The import function supports column renaming and will slice and resample the data as specified.
- Raises:
ValueError – If start and/or end times are outside the scope of the imported scenario files.
Note
The ValueError will only be raised when this is true for all files. If only one file is outside the range, an empty series will be returned for that file.
- Parameters:
paths – Path(s) to one or more CSV data files. The paths should be fully qualified.
data_prefixes – If more than one file is imported, a list of data_prefixes must be supplied such that ambiguity of column names between the files can be avoided. There must be one prefix for every imported file, such that a distinct prefix can be prepended to all columns of a file.
start_time – Starting time for the scenario import.
end_time – Latest ending time for the scenario import (default: inferred from start_time and total_time).
total_time – Total duration of the imported scenario. If given as int this will be interpreted as seconds (default: inferred from start_time and end_time).
random – Set to True, or supply a numpy random generator, to choose a random starting point within the interval determined by start_time and end_time. By default, the environment's random generator is used.
resample_time – Resample the scenario data to the specified interval. If this is specified, interpolation_method should be supplied as well to determine how the new data points are filled. If given as an int, this will be interpreted as seconds (default: no resampling).
interpolation_method – Method for interpolating missing data values. Pandas missing data handling methods are supported. If a list with one value per file is given, the specified method will be selected according to the order of paths.
rename_cols –
Rename columns of the imported data. Maps the columns as they appear in the data files to new names. Format: {old_name: new_name}.
Note
The column names are normalized to lowercase and spaces are replaced with underscores. Additionally, everything from the first special character onward is removed. For example, “Water Temperature #2” becomes “water_temperature”. To rename that column, you would therefore specify, for example: {“water_temperature”: “T_W”}.
prefix_renamed – Should prefixes be applied to renamed columns as well? When setting this to False, make sure that all columns in all loaded scenario files have distinct names; otherwise, data may be overwritten.
infer_datetime_from – Specify how datetime values should be converted. ‘dates’ will use pandas to automatically determine the format. ‘string’ uses the conversion string specified in the ‘time_conversion_str’ parameter. If a two-tuple of the format (row, col) is given, data from the specified field in the data files will be used to determine the date format.
time_conversion_str – Time conversion string. This must be specified if the infer_datetime_from parameter is set to ‘string’. The string should specify the datetime format in the python strptime format.
scaling_factors – Scaling factors for each imported column.
- Returns:
Imported and processed data as pandas.DataFrame.
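The column-name normalization described in the note above can be sketched in a few lines of plain Python. Note that `normalize_column` is a hypothetical helper written for illustration, not part of the eta_utility API, and the exact rule is inferred from the documented example:

```python
import re

def normalize_column(name: str) -> str:
    # Sketch of the documented normalization: everything from the first
    # special character onward is discarded, the rest is lowercased and
    # spaces are replaced with underscores.
    # (normalize_column is a hypothetical helper, not eta_utility API.)
    head = re.split(r"[^A-Za-z0-9_ ]", name)[0]
    return head.strip().lower().replace(" ", "_")

print(normalize_column("Water Temperature #2"))  # water_temperature
```

Use the normalized name (not the original header) as the key in rename_cols.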
Extensions for pandas.DataFrame
Simple helpers for reading timeseries data from a CSV file and for getting slices or resampled data. This module handles data using pandas.DataFrame objects.
- eta_utility.timeseries.dataframes.df_from_csv(path: Path, *, delimiter: str = ';', infer_datetime_from: str | Sequence[int] | tuple[int, int] = 'dates', time_conversion_str: str = '%Y-%m-%d %H:%M') → pd.DataFrame
Read data from a CSV file, process it and return it as a timeseries pandas.DataFrame.
Open and read the .csv file, perform error checks and ensure that valid float values are obtained. The first column is assumed to always be the date and time column, and multiple methods are provided to convert it. The first row is assumed to be the header row. Header values are converted to lowercase and spaces are replaced with underscores. If a header value contains special characters, everything from the first special character onward is discarded.
- Parameters:
path – Path to the .csv file.
delimiter – Delimiter used between csv fields.
infer_datetime_from –
Specify how date and time values should be inferred. This can be ‘dates’ or ‘string’ or a tuple/list with two values.
If ‘dates’ is specified, pandas will be used to automatically infer the datetime format from the file.
If ‘string’ is specified, the parameter ‘time_conversion_str’ must specify the string (in python strptime format) to convert datetime values.
If a tuple/list of two values is given, the time format specification (according to python strptime format) will be read from the specified field in the .csv file (‘row’, ‘column’).
time_conversion_str – Time conversion string according to the python (strptime) format.
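The behaviour for infer_datetime_from='string' can be approximated with plain pandas. This is a simplified sketch of the idea, not the library implementation (the real function performs additional error checking):

```python
import io
import pandas as pd

# In-memory stand-in for a .csv file; the first column holds the
# date and time values, the first row is the header row.
csv_data = "Time;Water Temperature\n2023-01-01 00:00;20.5\n2023-01-01 00:15;21.0\n"

df = pd.read_csv(io.StringIO(csv_data), delimiter=";")
time_col = df.columns[0]
# convert the first column using an explicit strptime format string
df[time_col] = pd.to_datetime(df[time_col], format="%Y-%m-%d %H:%M")
df = df.set_index(time_col)
# header normalization: lowercase, spaces replaced with underscores
df.columns = [c.lower().replace(" ", "_") for c in df.columns]

print(list(df.columns))  # ['water_temperature']
```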
- eta_utility.timeseries.dataframes.find_time_slice(time_begin: datetime, time_end: datetime | None = None, total_time: TimeStep | None = None, round_to_interval: TimeStep | None = None, random: bool | np.random.Generator = False) → tuple[datetime, datetime]
Return a (potentially random) slicing interval that can be used to slice a data frame.
- Parameters:
time_begin – Date and time of the beginning of the interval to slice from.
time_end – Date and time of the ending of the interval to slice from.
total_time – Specify the total time of the sliced interval. An integer will be interpreted as seconds. If this argument is None, the complete interval between beginning and end will be returned.
round_to_interval – Round times to a specified interval, this value is interpreted as seconds if given as an int. Default is no rounding.
random – If this value is true, or a random generator is supplied, it will be used to generate a random slice of length total_time in the interval between time_begin and time_end.
- Returns:
Tuple of slice_begin time and slice_end time. Both times are datetime objects.
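The slicing logic can be illustrated with a small sketch. `find_slice` below is a hypothetical, simplified stand-in for illustration only (no rounding, and the random case assumed to draw a uniform offset), not the library implementation:

```python
from datetime import datetime, timedelta
import random

def find_slice(time_begin, time_end, total_time=None, rng=None):
    # Hypothetical sketch: without total_time the complete interval is
    # returned; with an rng, a random sub-interval of length total_time
    # inside [time_begin, time_end] is chosen.
    if total_time is None:
        return time_begin, time_end
    if rng is not None:
        latest_start = (time_end - total_time - time_begin).total_seconds()
        time_begin += timedelta(seconds=rng.uniform(0, latest_start))
    return time_begin, time_begin + total_time

# deterministic slice: starts at time_begin
begin, end = find_slice(datetime(2023, 1, 1), datetime(2023, 1, 2), timedelta(hours=6))
print(begin, end)  # 2023-01-01 00:00:00 2023-01-01 06:00:00

# random slice: still lies inside the original interval
rng = random.Random(42)
b, e = find_slice(datetime(2023, 1, 1), datetime(2023, 1, 2), timedelta(hours=6), rng=rng)
```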
- eta_utility.timeseries.dataframes.df_time_slice(df: pd.DataFrame, time_begin: datetime, time_end: datetime | None = None, total_time: TimeStep | None = None, round_to_interval: TimeStep | None = None, random: bool | np.random.Generator = False) → pd.DataFrame
Return a slice of df, starting at time_begin and ending at time_end.
- Parameters:
df – Original data frame to be sliced.
time_begin – Date and time of the beginning of the interval to slice from.
time_end – Date and time of the ending of the interval to slice from.
total_time – Specify the total time of the sliced interval. An integer will be interpreted as seconds. If this argument is None, the complete interval between beginning and end will be returned.
round_to_interval – Round times to a specified interval, this value is interpreted as seconds if given as an int. Default is no rounding.
random – If this value is true, or a random generator is supplied, it will be used to generate a random slice of length total_time in the interval between time_begin and time_end.
- Returns:
Sliced data frame.
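In plain pandas terms, this amounts to label-based slicing on a DatetimeIndex; a minimal sketch of the idea:

```python
import pandas as pd

# a small hourly timeseries to slice from
index = pd.date_range("2023-01-01 00:00", periods=8, freq="h")
df = pd.DataFrame({"load": range(8)}, index=index)

# .loc slicing on a DatetimeIndex is inclusive on both ends
sliced = df.loc["2023-01-01 02:00":"2023-01-01 05:00"]
print(len(sliced))  # 4
```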
- eta_utility.timeseries.dataframes.df_resample(dataframe: pd.DataFrame, *periods_deltas: TimeStep, missing_data: str | None = None) → pd.DataFrame
Resample the time index of a data frame. This method can be used for resampling in multiple different periods with multiple different deltas between single time entries.
- Parameters:
dataframe – DataFrame for processing.
periods_deltas – If one argument is specified, this will resample the data to the specified interval in seconds. If more than one argument is specified, they will be interpreted as (periods, interval) pairs. The first argument specifies a number of periods that should be resampled, the second value specifies the interval that these periods should be resampled to. A third argument would determine the next number of periods that should be resampled to the interval specified by the fourth argument and so on.
missing_data – Specify a method for handling missing data values. All missing-data handling methods for pandas DataFrames are valid; see https://pandas.pydata.org/docs/reference/frame.html#missing-data-handling. Examples: ‘interpolate’, ‘ffill’. If not specified, missing data is not filled (default: ‘asfreq’).
- Returns:
Copy of the DataFrame.
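For a single interval, this is comparable to a plain pandas resample. A sketch assuming ‘ffill’ as the missing-data method (an illustration of the underlying pandas behaviour, not the library code):

```python
import pandas as pd

# 10-minute data to be resampled to 5-minute steps
index = pd.date_range("2023-01-01", periods=4, freq="10min")
df = pd.DataFrame({"power": [1.0, 2.0, 3.0, 4.0]}, index=index)

# upsampling creates new rows; 'ffill' carries the last value forward
resampled = df.resample("5min").ffill()
print(len(resampled))  # 7
```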
- eta_utility.timeseries.dataframes.df_interpolate(dataframe: pd.DataFrame, freq: TimeStep, limit_direction: Literal['both', 'forward', 'backward'] = 'both') → pd.DataFrame
Interpolate missing values in a DataFrame with a specified frequency. This function can also handle unevenly spaced timeseries data.
- Parameters:
dataframe – DataFrame for interpolation.
freq – Frequency of the resulting DataFrame.
limit_direction – Direction in which to limit the interpolation (default: ‘both’).
- Returns:
Interpolated DataFrame.
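Mapping an unevenly spaced series onto a regular grid can be sketched with plain pandas. This illustrates the idea (linear interpolation in time over a target grid), not the library implementation:

```python
import pandas as pd

# unevenly spaced measurements
index = pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:10", "2023-01-01 00:45"])
series = pd.Series([0.0, 1.0, 4.5], index=index)

# build the regular target grid, take the union with the original
# timestamps, interpolate linearly in time, then keep only grid points
target = pd.date_range("2023-01-01 00:00", "2023-01-01 00:45", freq="15min")
combined = series.reindex(series.index.union(target))
result = combined.interpolate(method="time", limit_direction="both").reindex(target)
print(result.round(2).tolist())  # [0.0, 1.5, 3.0, 4.5]
```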