eta_utility.timeseries.dataframes module

Simple helpers for reading timeseries data from a csv file and getting slices or resampled data. This module handles data using pandas dataframe objects.

eta_utility.timeseries.dataframes.df_from_csv(path: Path, *, delimiter: str = ';', infer_datetime_from: str | Sequence[int] | tuple[int, int] = 'dates', time_conversion_str: str = '%Y-%m-%d %H:%M') → pd.DataFrame

Read data from a csv file, process it and return it as a timeseries (pandas DataFrame) object.

Open and read the .csv file, perform error checks and ensure that valid float values are obtained. This assumes that the first column is always the date and time column and provides multiple methods to convert this column. It also assumes that the first row is the header row. The header row is converted to lower case and spaces are converted to _. If header values contain special characters, everything starting from the first special character is discarded.
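The header clean-up described above can be pictured with a small standard-library sketch. The function name normalize_header is hypothetical, and treating every character other than a-z, 0-9 and _ as "special" is an assumption made for illustration; the rule the library actually applies may differ.

```python
import re


def normalize_header(name: str) -> str:
    """Hypothetical helper illustrating the documented header clean-up."""
    # Lower-case the name and replace spaces with underscores.
    cleaned = name.strip().lower().replace(" ", "_")
    # Assumption: anything other than a-z, 0-9 and "_" counts as "special";
    # keep only the text before the first such character.
    return re.split(r"[^a-z0-9_]", cleaned, maxsplit=1)[0]


# Made-up header values for illustration.
headers = ["Supply Temp", "Power(kW)", "Flow-Rate"]
normalized = [normalize_header(h) for h in headers]
print(normalized)  # ['supply_temp', 'power', 'flow']
```

Under these assumptions, unit suffixes such as "(kW)" are dropped, so two columns that differ only after the first special character would end up with the same name.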

Parameters:
  • path – Path to the .csv file.

  • delimiter – Delimiter used between csv fields.

  • infer_datetime_from

    Specify how date and time values should be inferred. This can be ‘dates’ or ‘string’ or a tuple/list with two values.

    • If ‘dates’ is specified, pandas will be used to automatically infer the datetime format from the file.

    • If ‘string’ is specified, the parameter ‘time_conversion_str’ must specify the string (in python strptime format) to convert datetime values.

    • If a tuple/list of two values is given, the time format specification (according to python strptime format) will be read from the specified field in the .csv file (‘row’, ‘column’).

  • time_conversion_str – Time conversion string according to the python (strptime) format.
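To make the strptime format concrete, the following sketch parses a single value with the default time_conversion_str from the signature. It uses only the Python standard library and does not call df_from_csv; the sample value is made up.

```python
from datetime import datetime

# Default value of time_conversion_str, taken from the signature above.
time_conversion_str = "%Y-%m-%d %H:%M"

# A made-up cell value as it might appear in the first column of a csv file.
raw_value = "2023-01-15 13:45"

parsed = datetime.strptime(raw_value, time_conversion_str)
print(parsed.isoformat())  # 2023-01-15T13:45:00
```

A value that does not match the format (for example one that includes seconds) makes strptime raise a ValueError, so the format string has to match the file exactly.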

eta_utility.timeseries.dataframes.find_time_slice(time_begin: datetime, time_end: datetime | None = None, total_time: TimeStep | None = None, round_to_interval: TimeStep | None = None, random: bool | np.random.Generator = False) → tuple[datetime, datetime]

Return a (potentially random) slicing interval that can be used to slice a data frame.

Parameters:
  • time_begin – Date and time of the beginning of the interval to slice from.

  • time_end – Date and time of the ending of the interval to slice from.

  • total_time – Specify the total time of the sliced interval. An integer will be interpreted as seconds. If this argument is None, the complete interval between beginning and end will be returned.

  • round_to_interval – Round times to the specified interval. An integer is interpreted as seconds. By default, no rounding is applied.

  • random – If True, or if a random generator is supplied, a random slice of length total_time is drawn from the interval between time_begin and time_end.

Returns:

Tuple of slice_begin time and slice_end time. Both times are datetime objects.
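The parameter descriptions above can be sketched with the standard library alone. This is not the library's implementation: sketch_time_slice is a hypothetical name, a random.Random instance stands in for the np.random.Generator accepted by the real function, and rounding down (rather than to the nearest interval) is an assumption.

```python
import random
from datetime import datetime, timedelta


def sketch_time_slice(time_begin, time_end, total_time=None,
                      round_to_interval=None, rng=None):
    """Hypothetical re-creation of the documented slicing behaviour."""
    span = time_end - time_begin
    # An integer total_time is interpreted as seconds; None means "everything".
    length = span if total_time is None else timedelta(seconds=total_time)

    slice_begin = time_begin
    if rng is not None:
        # Draw a random start so that the slice still fits into the interval.
        slack = max((span - length).total_seconds(), 0.0)
        slice_begin += timedelta(seconds=rng.uniform(0.0, slack))

    if round_to_interval:
        # Assumption: round down to a multiple of the interval (in seconds).
        step = timedelta(seconds=round_to_interval)
        slice_begin -= (slice_begin - datetime.min) % step

    return slice_begin, slice_begin + length


begin = datetime(2023, 1, 1)
end = datetime(2023, 1, 8)

# Without total_time the complete interval is returned.
full = sketch_time_slice(begin, end)

# A random one-day slice whose start is aligned to 15 minutes.
day = sketch_time_slice(begin, end, total_time=86400,
                        round_to_interval=900, rng=random.Random(42))
```

Passing a seeded generator, as in the last call, makes the random slice reproducible, which is presumably why the real function accepts a generator in addition to a plain boolean.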

eta_utility.timeseries.dataframes.df_time_slice(df: pd.DataFrame, time_begin: datetime, time_end: datetime | None = None, total_time: TimeStep | None = None, round_to_interval: TimeStep | None = None, random: bool | np.random.Generator = False) → pd.DataFrame

Return a data frame which has been sliced starting at time_begin and ending at time_end, from df.

Parameters:
  • df – Original data frame to be sliced.

  • time_begin – Date and time of the beginning of the interval to slice from.

  • time_end – Date and time of the ending of the interval to slice from.

  • total_time – Specify the total time of the sliced interval. An integer will be interpreted as seconds. If this argument is None, the complete interval between beginning and end will be returned.

  • round_to_interval – Round times to the specified interval. An integer is interpreted as seconds. By default, no rounding is applied.

  • random – If True, or if a random generator is supplied, a random slice of length total_time is drawn from the interval between time_begin and time_end.

Returns:

Sliced data frame.
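Slicing a data frame between two datetime values can be illustrated with plain pandas. The example below does not call df_time_slice; it only shows label-based slicing on a DatetimeIndex with made-up data. Whether df_time_slice treats the end point as inclusive is not stated here, so the inclusive behaviour shown is that of pandas itself.

```python
from datetime import datetime

import pandas as pd

# Made-up hourly data for one day.
index = pd.date_range("2023-01-01", periods=24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": range(24)}, index=index)

slice_begin = datetime(2023, 1, 1, 6)
slice_end = datetime(2023, 1, 1, 12)

# Label-based slicing on a DatetimeIndex includes both end points in pandas.
sliced = df.loc[slice_begin:slice_end].copy()
print(len(sliced))  # 7
```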

eta_utility.timeseries.dataframes.df_resample(dataframe: pd.DataFrame, *periods_deltas: TimeStep, missing_data: str | None = None) → pd.DataFrame

Resample the time index of a data frame. This method can be used for resampling in multiple different periods with multiple different deltas between single time entries.

Parameters:
  • dataframe – DataFrame for processing.

  • periods_deltas – If one argument is specified, this will resample the data to the specified interval in seconds. If more than one argument is specified, they will be interpreted as (periods, interval) pairs. The first argument specifies a number of periods that should be resampled, the second value specifies the interval that these periods should be resampled to. A third argument would determine the next number of periods that should be resampled to the interval specified by the fourth argument and so on.

  • missing_data – Specify a method for handling missing data values. All missing data handling functions for pandas dataframes are valid. See also: https://pandas.pydata.org/docs/reference/frame.html#missing-data-handling. Some examples: ‘interpolate’, ‘ffill’. If this is not specified, asfreq is used, which leaves missing values unfilled.

Returns:

Resampled copy of the DataFrame.
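The effect of the different missing data methods can be shown with plain pandas for the single-interval case. This sketch does not call df_resample and does not cover the (periods, interval) pairs; it assumes that the method name is applied to a pandas resampler, which is an illustration rather than a statement about the implementation. The data is made up.

```python
import pandas as pd

# Made-up hourly data that is upsampled to 30 minutes (1800 seconds).
index = pd.date_range("2023-01-01", periods=3, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": [0.0, 10.0, 20.0]}, index=index)

interval = pd.Timedelta(seconds=1800)

# Apply each method by name to a fresh resampler.
results = {
    method: getattr(df.resample(interval), method)()
    for method in ("asfreq", "ffill", "interpolate")
}

for method, frame in results.items():
    print(method, frame["value"].tolist())
```

With asfreq the new half-hour rows stay empty (NaN), ffill repeats the previous value, and interpolate fills them linearly.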

eta_utility.timeseries.dataframes.df_interpolate(dataframe: pd.DataFrame, freq: TimeStep, limit_direction: Literal['both', 'forward', 'backward'] = 'both') → pd.DataFrame

Interpolate missing values in a DataFrame with a specified frequency. Unevenly spaced time series data is supported.

Parameters:
  • dataframe – DataFrame for interpolation.

  • freq – Frequency of the resulting DataFrame.

  • limit_direction – Direction in which to limit the interpolation. Defaults to “both”.

Returns:

Interpolated DataFrame.
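One way to bring unevenly spaced data onto a regular grid is sketched below with plain pandas. The function sketch_interpolate is hypothetical and only approximates the documented behaviour: it assumes time-weighted interpolation and a grid that starts at the first timestamp, neither of which is confirmed for df_interpolate.

```python
import pandas as pd


def sketch_interpolate(df, freq_seconds, limit_direction="both"):
    """Hypothetical sketch: interpolate uneven data onto a regular grid."""
    step = pd.Timedelta(seconds=freq_seconds)
    # Assumption: the regular grid starts at the first original timestamp.
    grid = pd.date_range(df.index[0], df.index[-1], freq=step)
    # Add the grid points to the original index, leaving NaN where no data exists.
    combined = df.reindex(df.index.union(grid))
    # Weight the interpolation by the actual time distance between samples.
    filled = combined.interpolate(method="time", limit_direction=limit_direction)
    # Keep only the regular grid points.
    return filled.loc[grid]


# Made-up, unevenly spaced samples.
index = pd.to_datetime(
    ["2023-01-01 00:00", "2023-01-01 00:10", "2023-01-01 01:00"]
)
uneven = pd.DataFrame({"value": [0.0, 10.0, 60.0]}, index=index)

regular = sketch_interpolate(uneven, 1800)
print(regular)
```

In this example the value at 00:30 is interpolated between the samples at 00:10 and 01:00, while the sample at 00:10 itself is not part of the result because it does not lie on the 30-minute grid.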