eta_utility.timeseries.scenarios module

eta_utility.timeseries.scenarios.scenario_from_csv(paths: Path | Sequence[Path], data_prefixes: Sequence[str] | None = None, *, start_time: datetime, end_time: datetime | None = None, total_time: TimeStep | None = None, random: np.random.Generator | bool | None = False, resample_time: TimeStep | None = None, interpolation_method: Sequence[str | None] | str | None = None, rename_cols: Mapping[str, str] | None = None, prefix_renamed: bool = True, infer_datetime_from: str | Sequence[Sequence[int]] | Sequence[str] = 'string', time_conversion_str: str | Sequence[str] = '%Y-%m-%d %H:%M', scaling_factors: Sequence[Mapping[str, SupportsFloat] | None] | Mapping[str, SupportsFloat] | None = None) → pd.DataFrame

Import one or more scenario data files from CSV and return them as a single pandas DataFrame. The import function supports column renaming and will slice and resample the data as specified.

Raises:

ValueError – If start and/or end times are outside the scope of the imported scenario files.

Note

The ValueError will only be raised when this is true for all files. If only one file is outside the range, an empty series will be returned for that file.

Parameters:
  • paths – Path(s) to one or more CSV data files. The paths should be fully qualified.

  • data_prefixes – If more than one file is imported, a list of data_prefixes must be supplied to avoid ambiguous column names between the files. There must be one prefix for every imported file; each prefix is prepended to all column names of the corresponding file.

  • start_time – Starting time for the scenario import.

  • end_time – Latest ending time for the scenario import (default: inferred from start_time and total_time).

  • total_time – Total duration of the imported scenario. If given as int this will be interpreted as seconds (default: inferred from start_time and end_time).

  • random – Set to True if a random starting point (within the interval determined by start_time and end_time) should be chosen. This will use the environment’s random generator. Alternatively, a numpy random Generator can be passed directly (default: False).

  • resample_time – Resample the scenario data to the specified interval. If this is specified, interpolation_method should be supplied as well to determine how the new data points are calculated. If given as an int, this will be interpreted as seconds (default: no resampling).
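The combination of resample_time and an interpolation method can be approximated with plain pandas. The following sketch is illustrative only (it is not the library's internal code): it upsamples hourly data to 30-minute intervals and fills the new rows by linear interpolation, one of the pandas methods a parameter like interpolation_method refers to.

```python
import pandas as pd

# Hourly example data, standing in for an imported scenario file.
index = pd.date_range("2023-01-01 00:00", periods=3, freq="h")
data = pd.DataFrame({"temperature": [10.0, 12.0, 11.0]}, index=index)

# Upsample to 30-minute intervals; the newly created rows are filled
# by linear interpolation between the surrounding known values.
resampled = data.resample("30min").interpolate(method="linear")

print(resampled["temperature"].tolist())  # [10.0, 11.0, 12.0, 11.5, 11.0]
```

Downsampling works analogously, except that pandas aggregation methods (e.g. mean) determine the new values instead of interpolation.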

  • interpolation_method – Method for interpolating missing data values. Pandas missing data handling methods are supported. If a list with one value per file is given, the specified method will be selected according to the order of paths.

  • rename_cols

    Rename columns of the imported data. Maps the columns as they appear in the data files to new names. Format: {old_name: new_name}.

    Note

    The column names are normalized before renaming: they are converted to lowercase, spaces are replaced with underscores, and everything from the first special character (such as “#”) onward is removed. For example, “Water Temperature #2” becomes “water_temperature”. To rename that column, you must therefore use the normalized name, for example: {“water_temperature”: “T_W”}.
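The normalization can be reproduced with a short helper. This is an illustrative reimplementation of the behavior described above, not the library's actual function:

```python
import re

def normalize_column_name(name: str) -> str:
    """Illustrative sketch: lowercase, replace spaces with underscores,
    and cut everything from the first special character onward."""
    name = name.strip().lower().replace(" ", "_")
    # Keep only the leading run of letters, digits and underscores.
    match = re.match(r"[a-z0-9_]*", name)
    if match:
        name = match.group(0)
    return name.rstrip("_")  # drop a trailing underscore left by the cut

print(normalize_column_name("Water Temperature #2"))  # water_temperature
```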

  • prefix_renamed – Should prefixes be applied to renamed columns as well? When setting this to False, make sure that all columns across all loaded scenario files have distinct names; otherwise, data may be overwritten.

  • infer_datetime_from – Specify how datetime values should be converted. ‘dates’ will use pandas to determine the format automatically. ‘string’ uses the conversion string specified in the time_conversion_str parameter. If a (row, col) pair is given, the value in the specified field of the data files will be used to determine the date format. When multiple files are imported, a sequence with one entry per file can be supplied.

  • time_conversion_str – Time conversion string. This must be specified if the infer_datetime_from parameter is set to ‘string’. The string should specify the datetime format in the Python strptime format.
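The default conversion string ‘%Y-%m-%d %H:%M’ matches timestamps such as 2023-01-01 12:30. This snippet shows how such a string is interpreted by Python’s strptime:

```python
from datetime import datetime

# The default time_conversion_str of scenario_from_csv.
fmt = "%Y-%m-%d %H:%M"

parsed = datetime.strptime("2023-01-01 12:30", fmt)
print(parsed)  # 2023-01-01 12:30:00
```

Any format supported by datetime.strptime can be used, e.g. "%d.%m.%Y %H:%M:%S" for timestamps like 01.01.2023 12:30:00.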

  • scaling_factors – Scaling factors for each imported column, given as a mapping of column names to factors ({column_name: factor}). If multiple files are imported, a sequence with one mapping (or None) per file can be supplied, in the order of paths.
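Applying scaling factors amounts to multiplying each named column by its factor. A plain-pandas sketch of the idea (not the library's code), using hypothetical column names:

```python
import pandas as pd

# Example data, standing in for an imported scenario file.
data = pd.DataFrame({"power": [1.0, 2.0], "temperature": [20.0, 21.0]})

# Mapping of column names to scaling factors, in the shape
# accepted by the scaling_factors parameter.
factors = {"power": 1000.0}  # e.g. convert kW to W

# Multiply each listed column by its factor; unlisted columns stay unchanged.
for column, factor in factors.items():
    data[column] = data[column] * factor

print(data["power"].tolist())  # [1000.0, 2000.0]
```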

Returns:

Imported and processed data as pandas.DataFrame.