eta_utility.eta_x.envs.base_env module
- class eta_utility.eta_x.envs.base_env.BaseEnv(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, state_modification_callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)[source]
Abstract environment definition, providing some basic functionality for concrete environments to use. The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA-X framework and should be used as the starting point for new environments.
The initialization of this superclass performs many of the tasks required to specify a concrete environment. Read the documentation carefully to understand how new environments can be developed, building on this starting point.
There are some attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required attributes are:
version: Version number of the environment.
description: Short description string of the environment.
action_space: The action space of the environment (see also gymnasium.spaces for options).
observation_space: The observation space of the environment (see also gymnasium.spaces for options).
The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.
step()
reset()
close()
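The required attributes and methods listed above could be sketched as follows. This is a standalone illustration that does not import eta_utility or gymnasium: the class name ThermostatEnv, the attribute values, the toy heating dynamics, and the plain (low, high) tuples standing in for gymnasium.spaces objects are all assumptions made for the example. A real environment would subclass BaseEnv instead.

```python
from typing import Any, Dict, List, Optional, Tuple


class ThermostatEnv:
    """Standalone sketch; a real environment would subclass BaseEnv."""

    # Required attributes (all values are illustrative assumptions).
    version = "v0.1"
    description = "Toy room heated by a single on/off heater."
    # Stand-ins for gymnasium.spaces objects: (low, high) bounds only.
    action_space = (0.0, 1.0)
    observation_space = (-50.0, 50.0)

    def __init__(self) -> None:
        self.temperature = 20.0

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[List[float], Dict[str, Any]]:
        # Return the initial observation and an (empty) info dictionary.
        self.temperature = 20.0
        return [self.temperature], {}

    def step(self, action: float) -> Tuple[List[float], float, bool, bool, Dict[str, Any]]:
        # Heat when the action is above 0.5, cool down otherwise.
        self.temperature += 0.5 if action > 0.5 else -0.5
        reward = -abs(self.temperature - 21.0)
        terminated = abs(self.temperature - 21.0) < 1e-9  # target reached
        truncated = False
        return [self.temperature], reward, terminated, truncated, {}

    def close(self) -> None:
        # Nothing to release in this toy example.
        pass
```

The sketch only shows the shape of the interface; the initialization arguments of BaseEnv (env_id, config_run, and so on) are left out on purpose.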
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback that should be called after each episode.
state_modification_callback – Callback that should be called after state setup, before logging the state.
scenario_time_begin – Beginning time of the scenario.
scenario_time_end – Ending time of the scenario.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
render_mode – Render mode of the environment, which helps visualise what the agent sees. Example modes are “human”, “rgb_array”, and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).
- config_run: ConfigOptRun
Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.
- path_results: pathlib.Path
Path for storing results.
- path_scenarios: pathlib.Path | None
Path for the scenario data.
- path_env: pathlib.Path
Path of the environment file.
- state_modification_callback: Callable | None
Callback can be used for modifying the state at each time step.
- scenario_duration: float
Duration of the scenario for each episode (for total time imported from csv).
- scenario_time_begin: datetime
Beginning time of the scenario.
- scenario_time_end: datetime
Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).
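When the scenario times are passed as strings, the format mentioned above can be parsed with the standard library. The following is a minimal sketch; the two date strings are made-up example values, and the time span is computed only for illustration (it is not claimed to be how scenario_duration is derived internally).

```python
from datetime import datetime

# Format named in the description of scenario_time_end.
TIME_FORMAT = "%Y-%m-%d %H:%M"

# Example values (assumptions for this sketch).
scenario_time_begin = datetime.strptime("2023-01-01 00:00", TIME_FORMAT)
scenario_time_end = datetime.strptime("2023-01-02 12:30", TIME_FORMAT)

# Time span between begin and end in seconds.
span_seconds = (scenario_time_end - scenario_time_begin).total_seconds()
```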
- timeseries: pd.DataFrame
The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.
- ts_current: pd.DataFrame
Data frame containing the currently valid range of time series data.
- state_config: StateConfig | None
Configuration to describe what the environment state looks like.
- additional_state: dict[str, float] | None
Additional state information to append to the state during stepping and reset.
- state_log_longtime: list[list[dict[str, float]]]
Log of the environment state over multiple episodes.
- data_log: list[dict[str, Any]]
Log of specific environment settings / other data, apart from state for the episode.
- data_log_longtime: list[list[dict[str, Any]]]
Log of specific environment settings / other data, apart from state, over multiple episodes.
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
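The divisor requirement can be checked with a few lines of Python. The helper below is hypothetical and not part of eta_utility; it also assumes that each simulation step covers sampling_time / sim_steps_per_sample seconds, which the text above does not state explicitly.

```python
def simulation_step_size(sampling_time: float, sim_steps_per_sample: int) -> float:
    """Hypothetical helper: validate the divisor rule and return the step size."""
    if sim_steps_per_sample < 1:
        raise ValueError("sim_steps_per_sample must be at least 1.")
    if sampling_time % sim_steps_per_sample != 0:
        raise ValueError(
            f"sim_steps_per_sample ({sim_steps_per_sample}) must be a divisor "
            f"of sampling_time ({sampling_time})."
        )
    # Assumed length of one simulation step in seconds.
    return sampling_time / sim_steps_per_sample
```

For example, a sampling_time of 60 with 4 simulation steps per sample passes the check, while 7 simulation steps per sample raises a ValueError.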
- import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame [source]
Load data from CSV into self.timeseries_data by using scenario_from_csv.
- Parameters:
scenario_paths –
One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:
Note
[{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]
path: Path to the scenario file (relative to scenario_path).
prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.
interpolation_method: A pandas interpolation method (e.g. ‘linear’ or ‘pad’). Required if the frequency of values must be increased compared to the data in the file.
scale_factors: Scaling factors for specific columns. This can be useful, for example, if a column contains data in kilowatts and should be imported in watts. In this case, the scaling factor for the column would be 1000.
rename_cols: Mapping of column names from the file to new names for the imported data.
infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.
time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).
prefix_renamed – Determine whether the prefix is also applied to renamed columns.
- Returns:
Data Frame of the imported and formatted scenario data.
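A scenario configuration following the structure above might look like the sketch below. The file name, prefix, and column names are invented for illustration, and the key check is a hypothetical convenience that is not part of eta_utility.

```python
# Keys taken from the structure documented for scenario_paths.
ALLOWED_KEYS = {
    "path",
    "prefix",
    "interpolation_method",
    "resample_method",
    "scale_factors",
    "rename_cols",
    "infer_datetime_cols",
    "time_conversion_str",
}

scenario_paths = [
    {
        "path": "weather.csv",                  # hypothetical file name
        "prefix": "weather",                    # hypothetical prefix
        "interpolation_method": "linear",
        "scale_factors": {"power_kw": 1000},    # kilowatts -> watts
        "rename_cols": {"power_kw": "power"},   # hypothetical column names
        "time_conversion_str": "%Y-%m-%d %H:%M",
    },
]

# Hypothetical sanity check: collect keys that are not in the documented structure.
unknown_keys = [key for config in scenario_paths for key in config if key not in ALLOWED_KEYS]

# Inside a concrete environment, the dictionaries would then be passed on, e.g.:
# self.timeseries = self.import_scenario(*scenario_paths, prefix_renamed=True)
```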
- get_scenario_state() dict[str, Any] [source]
Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.
- Returns:
Scenario data for current time step.
- abstract step(action: np.ndarray) StepResult [source]
Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.
Note
Do not forget to increment n_steps and n_steps_longtime.
- Parameters:
action – Actions taken by the agent.
- Returns:
The return value represents the state of the environment after the step was performed.
observations: A numpy array with new observation values as defined by the observation space, containing floating point or integer values.
reward: The value of the reward function. This is just one floating point value.
terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_i.
truncated: Boolean value specifying whether a truncation condition outside the scope of the environment's task is satisfied. Typically, this is a time limit, but it could also indicate an agent physically going out of bounds. It can be used to end the episode prematurely, before a terminal state is reached. If true, the user needs to call the reset function.
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
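The bookkeeping and the five-tuple described above can be illustrated with a small standalone sketch. CountingEnv, its target and n_episode_steps arguments, and the toy dynamics are assumptions for this example; plain lists stand in for numpy arrays so that the block runs without extra dependencies.

```python
from typing import Any, Dict, List, Tuple

StepResult = Tuple[List[float], float, bool, bool, Dict[str, Any]]


class CountingEnv:
    """Standalone sketch of step() bookkeeping; not the eta_utility implementation."""

    def __init__(self, n_episode_steps: int, target: float) -> None:
        self.n_episode_steps = n_episode_steps  # assumed time limit in steps
        self.target = target                    # assumed terminal level
        self.n_steps = 0            # steps taken in the current episode
        self.n_steps_longtime = 0   # steps taken over all episodes
        self.level = 0.0

    def step(self, action: float) -> StepResult:
        # Increment both counters on every step.
        self.n_steps += 1
        self.n_steps_longtime += 1

        self.level += action
        reward = -abs(self.target - self.level)
        terminated = self.level >= self.target            # terminal state reached
        truncated = self.n_steps >= self.n_episode_steps  # time limit reached
        info = {"n_steps": self.n_steps}
        return [self.level], reward, terminated, truncated, info


env = CountingEnv(n_episode_steps=3, target=5.0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(1.0)
    # Either flag ends the episode; reset would be called afterwards.
    done = terminated or truncated
```

In this run the time limit is hit before the target level, so the episode ends through truncated rather than terminated.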
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]] [source]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Note
Don’t forget to store and reset the episode_timer by calling self._reset_state() if you overwrite this function.
- Parameters:
seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)
options – Additional information to specify how the environment is reset (optional, depending on the specific environment). (default: None)
- Returns:
Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
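The seeding rules for reset() can be illustrated without gymnasium. The sketch below is an assumption-laden stand-in: SeededEnv is a hypothetical class, and random.Random from the standard library takes the place of np_random. In a real subclass, calling super().reset(seed=seed) is what handles the seeding.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


class SeededEnv:
    """Standalone sketch of the seeding rules; uses random.Random, not np_random."""

    def __init__(self) -> None:
        self._rng: Optional[random.Random] = None

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[List[float], Dict[str, Any]]:
        if seed is not None:
            # An integer seed resets the PRNG even if one already exists.
            self._rng = random.Random(seed)
        elif self._rng is None:
            # No PRNG yet and seed=None: seed from a system entropy source.
            self._rng = random.Random()
        # Otherwise the existing PRNG is kept and continues its sequence.
        observation = [self._rng.uniform(-1.0, 1.0)]
        return observation, {}


env = SeededEnv()
first, _ = env.reset(seed=42)
second, _ = env.reset()          # PRNG exists, seed=None: not reset
repeated, _ = env.reset(seed=42)  # same seed: same initial observation as `first`
```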
- abstract close() None [source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (i.e. simulation models) used by the environment.
- abstract render() None [source]
Render the environment.
The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
- classmethod get_info() tuple[str, str] [source]
Get info about environment.
- Returns:
Tuple of version and description.
- export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None [source]
Extension of csv_export to include time series in the data.
- Parameters:
names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
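To show what the sep and decimal parameters control, here is a small standalone sketch that writes a list of state dictionaries as delimited text. The helper states_to_csv is hypothetical and does not reproduce the implementation of export_state_log or csv_export; it only mirrors the two formatting options.

```python
import csv
import io
from typing import Dict, List


def states_to_csv(states: List[Dict[str, float]], *, sep: str = ";", decimal: str = ".") -> str:
    """Hypothetical helper: format logged states as delimited text."""
    if not states:
        return ""
    names = list(states[0].keys())
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerow(names)
    for state in states:
        # Swap the decimal point for the requested decimal sign.
        writer.writerow([str(state[name]).replace(".", decimal) for name in names])
    return buffer.getvalue()


text = states_to_csv([{"temperature": 20.5, "power": 1.0}], sep=";", decimal=",")
```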