Environments

eta_utility environments are based on the interfaces offered by stable_baselines3 which are in turn based on the Farama gymnasium environments. The eta_x environments are provided as abstract classes which must be subclassed to create useful implementations. For the specific use cases they are intended for, these base classes make the creation of new environments much easier.

Custom environments should follow the interface for custom environments discussed in the stable_baselines3 documentation. The following describes the functions available to simplify the implementation of specific functionality in custom environments. Take a look at the Usage examples for inspiration on what custom environments can look like.

The custom environments created with the utilities described here can be used directly with stable_baselines3 or gymnasium. However, using the eta_utility.eta_x::ETAx class is recommended (see Introduction). When using the ETAx class for your optimization runs, the parameters required for environment instantiation must be configured in the environment_specific section of the configuration. If interaction between environments is also configured, additional parameters can be set in the configuration file. To configure the interaction environment, use the section interaction_env_specific. If that section is not present, the parameters from the environment_specific section will be used for both environments.
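When using ETAx, a run requires only a few lines of Python. The following is a minimal sketch; the configuration file name ("experiment_1") and the series/run names are placeholders:

from eta_utility.eta_x import ETAx

# "experiment_1" refers to a hypothetical configuration file whose
# environment_specific section holds the parameters passed to the
# environment on instantiation.
experiment = ETAx(root_path=".", config_name="experiment_1")
experiment.play("series_name", "run_name")  # perform an optimization run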

Environment State Configuration

The most important concept to understand when working with the environment utilities provided by eta_utility is the handling and configuration of the environment state. The state is represented by eta_utility.eta_x.envs::StateVar objects, each of which corresponds to one variable of the environment. All StateVar objects of an environment are combined into the StateConfig object. From the StateConfig object we can determine most other aspects of the environment, such as the observation space and action space. The gymnasium documentation provides more information about Spaces.

Each state variable is represented by a StateVar object:

class eta_utility.eta_x.envs.StateVar(name: str, *, is_agent_action=False, is_agent_observation=False, add_to_state_log=True, ext_id: str | int | None = None, is_ext_input=False, is_ext_output=False, ext_scale_add=0, ext_scale_mult=1, interact_id: int | None = None, from_interact=False, interact_scale_add=0, interact_scale_mult=1, scenario_id: str | None = None, from_scenario=False, scenario_scale_add=0, scenario_scale_mult=1, low_value=nan, high_value=nan, abort_condition_min=None, abort_condition_max=None, index=0)[source]

A variable in the state of an environment.

For example, the variable “tank_temperature” might be part of the environment’s state. Let’s assume it represents the temperature inside the tank of a cleaning machine. This variable could be read from an external source. In this case it must have is_ext_output = True and the name of the external variable to read from must be specified: ext_id = "T_Tank". If this value should also be passed to the agent as an observation, set is_agent_observation = True. For observations and actions, you also need to set the low and high values, which determine the size of the observation and action spaces. In this case, something like low_value = 20 and high_value = 80 (if we are talking about water temperature measured in Celsius) might make sense.

If you want the environment to safely abort the optimization when certain values are exceeded, set the abort conditions to sensible values such as abort_condition_min = 0 and abort_condition_max = 100. This can be especially useful for example if you have simulation models which do not support certain values (for example, in this case they might not be able to handle water temperatures higher than 100 °C):

v1 = StateVar(
    "tank_temperature",
    ext_id = "T_Tank",
    is_ext_output = True,
    is_agent_observation = True,
    low_value = 20,
    high_value = 80,
    abort_condition_min = 0,
    abort_condition_max = 100,
)

As another example, you could set up an agent action named name = "set_heater" which the environment uses to set the state of the tank heater. In this case, the state variable should be configured with is_agent_action = True and you might want to pass this on to a simulation model or an actual machine by setting is_ext_input = True:

v2 = StateVar(
    "set_heater",
    ext_id = "u_tank",
    is_ext_input = True,
    is_agent_action = True,
)

Finally, let’s create a third variable which is read from a scenario file and multiplied by 1000 (for example, converting kilowatts to watts). Additionally, this variable is offset by a value of -10 to compensate for measurement errors:

v3 = StateVar(
    "outside_temperature",
    scenario_id = "T_ouside",
    scenario_scale_add = -10,
    scenario_scale_mult = 1000,
    is_agent_observation = True,
    low_value = 0,
    high_value = 40,
)
name: str

Name of the state variable (This must always be specified).

is_agent_action: bool

Should the agent specify actions for this variable? (default: False).

is_agent_observation: bool

Should the agent be allowed to observe the value of this variable? (default: False).

add_to_state_log: bool

Should the state log of this episode be added to state_log_longtime? (default: True).

ext_id: str | int | None

Name or identifier (order) of the variable in the external interaction model (e.g.: environment or FMU) (default: StateVar.name if (is_ext_input or is_ext_output) else None).

is_ext_input: bool

Should this variable be passed to the external model as an input? (default: False).

is_ext_output: bool

Should this variable be parsed from the external model output? (default: False).

ext_scale_add: float

Value to add to the output from an external model (default: 0).

ext_scale_mult: float

Value to multiply to the output from an external model (default: 1).

interact_id: int | None

Name or identifier (order) of the variable in an interaction environment (default: None).

from_interact: bool

Should this variable be read from the interaction environment? (default: False).

interact_scale_add: float

Value to add to the value read from an interaction (default: 0).

interact_scale_mult: float

Value to multiply to the value read from an interaction (default: 1).

scenario_id: str | None

Name of the scenario variable that this value should be read from (default: None).

from_scenario: bool

Should this variable be read from imported timeseries data? (default: False).

scenario_scale_add: float

Value to add to the value read from a scenario file (default: 0).

scenario_scale_mult: float

Value to multiply to the value read from a scenario file (default: 1).

low_value: float

Lowest possible value of the state variable (default: np.nan).

high_value: float

Highest possible value of the state variable (default: np.nan).

abort_condition_min: float | None

If the value of the variable dips below this, the episode should be aborted (default: None).

abort_condition_max: float | None

If the value of the variable rises above this, the episode should be aborted (default: None).

index: int

Index to look at when multiple values are returned (useful for mathematical optimization, where values for multiple time steps could be returned). In this case, the index values might differ between actions and observations.

All state variables are combined into the StateConfig object:

class eta_utility.eta_x.envs.StateConfig(*state_vars: StateVar)[source]

The configuration for the action and observation spaces. The values are used to control which variables are part of the action space and observation space. Additionally, the parameters can specify abort conditions and the handling of values from interaction environments or from simulation. Therefore, the StateConfig is very important for the functionality of ETA X.

Using the examples above, we could create the StateConfig object by passing our three state variables to the constructor:

state_config = StateConfig(v1, v2, v3)

If you are creating an environment, assign the StateConfig object to self.state_config. This will sometimes even be sufficient to create a fully functional environment.
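In an environment subclass, this could look like the following sketch (the class is hypothetical; v1, v2 and v3 are the state variables defined above):

from eta_utility.eta_x.envs import BaseEnv, StateConfig

class TankEnv(BaseEnv):
    version = "1.0"
    description = "Hypothetical cleaning machine tank environment."

    def __init__(self, env_id, config_run, **kwargs):
        super().__init__(env_id, config_run, **kwargs)
        # Combine the state variables and derive both spaces from them.
        self.state_config = StateConfig(v1, v2, v3)
        self.action_space, self.observation_space = self.state_config.continuous_spaces()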

vars: dict[str, StateVar]

Mapping of the variable names to their StateVar instances with all associated information.

actions: list[str]

List of variables that are agent actions.

observations: list[str]

List of variables that are agent observations.

add_to_state_log: set[str]

Set of variables that should be logged.

ext_inputs: list[str]

List of variables that should be provided to an external source (such as an FMU).

ext_outputs: list[str]

List of variables that can be received from an external source (such as an FMU).

map_ext_ids: dict[str, str | int]

Mapping of variable names to their external IDs.

rev_ext_ids: dict[str | int, str]

Reverse mapping of external IDs to their corresponding variable names.

ext_scale: dict[str, dict[str, float]]

Dictionary of scaling values for external input values (for example from simulations). Contains the fields ‘add’ and ‘multiply’.

interact_outputs: list[str]

List of variables that should be read from an interaction environment.

map_interact_ids: dict[str, int]

Mapping of internal environment names to interact IDs.

interact_scale: dict[str, dict[str, float]]

Dictionary of scaling values for interact values. Contains fields ‘add’ and ‘multiply’.

scenarios: list[str]

List of variables which are loaded from scenario files.

map_scenario_ids: dict[str, str]

Mapping of internal environment names to scenario IDs.

scenario_scale: dict[str, dict[str, float]]

Dictionary of scaling values for scenario values. Contains fields ‘add’ and ‘multiply’.

abort_conditions_min: list[str]

List of variables that have minimum values for an abort condition.

abort_conditions_max: list[str]

List of variables that have maximum values for an abort condition.

append_state(var: StateVar) None[source]

Append a state variable to the state configuration.

Parameters:

var – StateVar instance to append to the configuration.

store_file(file: Path) None[source]

Save the StateConfig to a comma-separated file.

Parameters:

file – Path to the file.

within_abort_conditions(state: Mapping[str, float]) bool[source]

Check whether the given state is within the abort conditions specified by the StateConfig instance.

Parameters:

state – The state array to check for conformance.

Returns:

Result of the check (False if the state does not conform to the required conditions).
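For example, within_abort_conditions can be used inside a custom step() implementation to end the episode when the state leaves the configured bounds (a minimal sketch):

# Sketch: terminate the episode once an abort condition is violated,
# e.g. tank_temperature outside the range [0, 100].
if not self.state_config.within_abort_conditions(self.state):
    terminated = True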

continuous_action_space() Box[source]

Generate an action space according to the format required by the OpenAI specification.

Returns:

Action space.

continuous_obs_space() Box[source]

Generate a continuous observation space according to the format required by the OpenAI specification.

Returns:

Observation Space.

continuous_observation_space() Box[source]
continuous_spaces() tuple[Box, Box][source]

Generate continuous action and observation spaces according to the OpenAI specification.

Returns:

Tuple of action space and observation space.

The state config object and its attributes (such as the observations) are used by the environments to determine which values to update during steps, which values to read from scenario files, and which values to exchange with the agent as observations and actions.

Base Environment

class eta_utility.eta_x.envs.BaseEnv(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, state_modification_callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)[source]

Bases: Env, ABC

Abstract environment definition, providing some basic functionality for concrete environments to use. The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA-X framework and should be used as the starting point for new environments.

The initialization of this superclass performs many of the tasks required to specify a concrete environment. Read the documentation carefully to understand how new environments can be developed, building on this starting point.

There are some attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required attributes are:

  • version: Version number of the environment.

  • description: Short description string of the environment.

  • action_space: The action space of the environment (see also gymnasium.spaces for options).

  • observation_space: The observation space of the environment (see also gymnasium.spaces for options).

The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.

  • step()

  • reset()

  • close()

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • verbose – Verbosity to use for logging.

  • callback – callback that should be called after each episode.

  • state_modification_callback – callback that should be called after state setup, before logging the state.

  • scenario_time_begin – Beginning time of the scenario.

  • scenario_time_end – Ending time of the scenario.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • render_mode – Render mode used to visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” (for text).

  • kwargs – Other keyword arguments (for subclasses).

abstract property version: str

Version of the environment

abstract property description: str

Long description of the environment

verbose: int

Verbosity level used for logging.

config_run: ConfigOptRun

Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.

path_results: pathlib.Path

Path for storing results.

path_scenarios: pathlib.Path | None

Path for the scenario data.

path_env: pathlib.Path

Path of the environment file.

callback: Callable | None

Callback can be used for logging and plotting.

state_modification_callback: Callable | None

Callback can be used for modifying the state at each time step.

env_id: int

ID of the environment (useful for vectorized environments).

run_name: str

Name of the current optimization run.

n_episodes: int

Number of completed episodes.

n_steps: int

Current step of the model (number of completed steps) in the current episode.

n_steps_longtime: int

Current step of the model (total over all episodes).

render_mode: str | None = None

Render mode for rendering the environment

episode_duration: float

Duration of one episode in seconds.

sampling_time: float

Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int

Number of time steps (of width sampling_time) in each episode.

scenario_duration: float

Duration of the scenario for each episode (for total time imported from csv).

scenario_time_begin: datetime

Beginning time of the scenario.

scenario_time_end: datetime

Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).

timeseries: pd.DataFrame

The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.

ts_current: pd.DataFrame

Data frame containing the currently valid range of time series data.

state_config: StateConfig | None

Configuration to describe what the environment state looks like.

episode_timer: float

Episode timer (stores the start time of the episode).

state: dict[str, float]

Current state of the environment.

additional_state: dict[str, float] | None

Additional state information to append to the state during stepping and reset

state_log: list[dict[str, float]]

Log of the environment state.

state_log_longtime: list[list[dict[str, float]]]

Log of the environment state over multiple episodes.

data: dict[str, Any]

Some specific current environment settings / other data, apart from state.

data_log: list[dict[str, Any]]

Log of specific environment settings / other data, apart from state for the episode.

data_log_longtime: list[list[dict[str, Any]]]

Log of specific environment settings / other data, apart from state, over multiple episodes.

sim_steps_per_sample: int

Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’ (e.g. with sampling_time = 30 and sim_steps_per_sample = 3, each simulation step covers 10 seconds).

import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame[source]

Load data from csv into self.timeseries by using scenario_from_csv.

Parameters:
  • scenario_paths

    One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:

    Note

    [{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]

    • path: Path to the scenario file (relative to scenario_path).

    • prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.

    • interpolation_method: A pandas interpolation method, required if the frequency of values must be increased in comparison to the files’ data. (e.g.: ‘linear’ or ‘pad’).

    • scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.

    • rename_cols: Mapping of column names from the file to new names for the imported data.

    • infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.

    • time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).

  • prefix_renamed – Determine whether the prefix is also applied to renamed columns.

Returns:

Data Frame of the imported and formatted scenario data.
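As an illustration, a single scenario file could be imported as follows; the file name, prefix and column names are placeholders:

# Hypothetical import of an electricity price scenario. The returned
# DataFrame is also stored in the environment's timeseries attribute.
scenario_data = self.import_scenario(
    {
        "path": "electricity_prices.csv",
        "prefix": "p",
        "interpolation_method": "pad",
        "scale_factors": {"price": 1000},        # e.g. convert kW to W
        "rename_cols": {"price": "elec_price"},
    }
)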

get_scenario_state() dict[str, Any][source]

Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.

Returns:

Scenario data for current time step.

abstract step(action: np.ndarray) StepResult[source]

Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.

Note

Do not forget to increment n_steps and n_steps_longtime.

Parameters:

action – Actions taken by the agent.

Returns:

The return value represents the state of the environment after the step was performed.

  • observations: A numpy array with new observation values as defined by the observation space, containing floating point or integer values.

  • reward: The value of the reward function. This is just one floating point value.

  • terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_x.

  • truncated: Boolean value specifying whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a timelimit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.

  • info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
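For orientation, here is a minimal sketch of how a concrete step() could combine the helper methods documented below; the reward term and the "tank_temperature" variable are hypothetical:

import numpy as np

def step(self, action: np.ndarray):
    self._actions_valid(action)  # ensure the actions are inside the action space
    self.n_steps += 1
    self.n_steps_longtime += 1

    self._create_new_state(self.additional_state)
    self._actions_to_state(action)                # store agent actions in self.state
    self.state.update(self.get_scenario_state())  # add scenario values for this step
    self.state_log.append(self.state)

    reward = -abs(self.state["tank_temperature"] - 60.0)  # hypothetical reward
    terminated = not self.state_config.within_abort_conditions(self.state)
    truncated = self._done()  # episode length reached
    return self._observations(), reward, terminated, truncated, {}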

_actions_valid(action: ndarray) None[source]

Check whether the actions are within the specified action space.

Parameters:

action – Actions taken by the agent.

Raise:

RuntimeError, when the actions are not inside of the action space.

_create_new_state(additional_state: dict[str, float] | None) None[source]

Take some initial values and create a new environment state object, stored in self.state.

Parameters:

additional_state – Values to initialize the state.

_actions_to_state(actions: ndarray) None[source]

Gather actions and store them in self.state.

Parameters:

actions – Actions taken by the agent.

_observations() ndarray[source]

Determine the observations list from environment state. This uses state_config to determine all observations.

Returns:

Observations for the agent as determined by state_config.

_done() bool[source]

Check if the episode is over or not using the number of steps (n_steps) and the total number of steps in an episode (n_episode_steps).

Returns:

Boolean showing whether the episode is done.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Note

Don’t forget to store and reset the episode_timer by calling self._reset_state() if you overwrite this function.

Parameters:
  • seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
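If you do overwrite reset(), a minimal sketch could look like this (BaseEnv.reset is already implemented, so an override is only needed for custom initialization):

def reset(self, *, seed=None, options=None):
    # Let the base class handle seeding and state setup, then add custom logic.
    observations, info = super().reset(seed=seed, options=options)
    # ... custom (re-)initialization could go here ...
    return observations, info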

_reduce_state_log() list[dict[str, float]][source]

Removes unwanted parameters from state_log before storing it in state_log_longtime.

Returns:

A list of dictionaries from which the parameters that should not be stored have been removed.

_is_protocol = False
_np_random: np.random.Generator | None = None
_reset_state() None[source]

Store episode statistics and reset episode counters.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

property np_random: numpy.random.Generator

Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.

Returns:

Instance of np.random.Generator.

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
abstract close() None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

abstract render() None[source]

Render the environment

The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

classmethod get_info() tuple[str, str][source]

Get info about environment.

Returns:

Tuple of version and description.

export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None[source]

Extension of csv_export that includes the timeseries data.

Parameters:
  • path – Path of the exported file.

  • names – Field names used when data is a matrix without column names.

  • sep – Separator to use between the fields.

  • decimal – Sign to use for decimal points.

Model Predictive Control (MPC) Environment

The BaseEnvMPC is a class for the optimization of mathematical MPC models.

class eta_utility.eta_x.envs.BaseEnvMPC(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any], prediction_scope: TimeStep | str | None = None, render_mode: str | None = None, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Base class for mathematical MPC models. This class can be used in conjunction with the MathSolver agent. You need to implement the _model method in a subclass and return a pyomo.AbstractModel from it.

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • verbose – Verbosity to use for logging.

  • callback – callback which should be called after each episode.

  • scenario_time_begin – Beginning time of the scenario.

  • scenario_time_end – Ending time of the scenario.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • model_parameters – Parameters for the mathematical model.

  • prediction_scope – Duration of the prediction (usually a subsample of the episode duration).

  • render_mode – Render mode used to visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” (for text).

  • kwargs – Other keyword arguments (for subclasses).

prediction_scope: float

Total duration of one prediction/optimization run when used with the MPC agent. This is automatically set to the value of episode_duration if it is not supplied separately.

n_prediction_steps: int

Number of steps in the prediction (prediction_scope/sampling_time).

scenario_duration: float

Duration of the scenario for each episode (for total time imported from csv).

model_parameters

Configuration for the MILP model parameters.

_concrete_model: pyo.ConcreteModel | None

Concrete pyomo model as initialized by _model.

time_var: str | None

Name of the “time” variable/set in the model (i.e. “T”). This is required if the pyomo sets must be re-indexed when updating the model between time steps. If this is None, it is assumed that no reindexing of the timeseries data is required during updates - this is the default.

nonindex_update_append_string: str | None

Updating indexed model parameters can be achieved either by updating only the first value of the actual parameter itself or by having a separate handover parameter that is used for specifying only the first value. The separate handover parameter can be denoted with an appended string. For example, if the actual parameter is x.ON then the handover parameter could be x.ON_first. To use x.ON_first for updates, set the nonindex_update_append_string to “_first”. If the attribute is set to None, the first value of the actual parameter (x.ON) would be updated instead.
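For example, assuming the pyomo model defines the indexed parameter x.ON and a scalar handover parameter x.ON_first:

# With this setting, updates between time steps are written to x.ON_first
# instead of to the first index of x.ON (both names are hypothetical).
self.nonindex_update_append_string = "_first"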

_use_model_time_increments: bool

Some models may not use the actual time increment (sampling_time). Instead, they would translate into model time increments (each sampling time increment equals a single model time step). This means that indices of the model components simply count 1,2,3,… instead of 0, sampling_time, 2*sampling_time, … Set this to true, if model time increments (1, 2, 3, …) are used. Otherwise, sampling_time will be used as the time increment. Note: This is only relevant for the first model time increment, later increments may differ.

property model: tuple[ConcreteModel, list]

The model property is a tuple of the concrete model and the order of the action space. This is used such that the MPC algorithm can re-sort the action output. This sorting cannot be conveyed differently through pyomo.

Returns:

Tuple of the concrete model and the order of the action space.

abstract _model() AbstractModel[source]

Create the abstract pyomo model. This is where the pyomo model description should be placed.

Returns:

Abstract pyomo model.
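A minimal sketch of a _model implementation might look like this; all model components (T, cost, u) are hypothetical:

import pyomo.environ as pyo

def _model(self) -> pyo.AbstractModel:
    model = pyo.AbstractModel()
    model.T = pyo.Set(ordered=True)                # time steps
    model.cost = pyo.Param(model.T, mutable=True)  # e.g. electricity price per step
    model.u = pyo.Var(model.T, domain=pyo.Binary)  # hypothetical on/off decision

    def objective(m):
        return sum(m.cost[t] * m.u[t] for t in m.T)

    model.objective = pyo.Objective(rule=objective, sense=pyo.minimize)
    return model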

step(action: np.ndarray) StepResult[source]

Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.

This also updates self.state and self.state_log to store current state information.

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.

Parameters:

action (np.ndarray) – Actions to perform in the environment.

Returns:

The return value represents the state of the environment after the step was performed.

  • observations: A numpy array with new observation values as defined by the observation space, containing floating point or integer values.

  • reward: The value of the reward function. This is just one floating point value.

  • terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_x.

  • truncated: Boolean value specifying whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a timelimit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.

  • info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.

update(observations: Sequence[Sequence[float | int]] | None = None) np.ndarray[source]

Update the optimization model with observations from another environment.

Parameters:

observations – Observations from another environment.

Returns:

Full array of current observations.

solve_failed(model: pyo.ConcreteModel, result: SolverResults) None[source]

This method will try to render the result in case the model could not be solved. It should automatically be called by the agent.

Parameters:
  • model – Current model.

  • result – Result of the last solution attempt.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[np.ndarray, dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Note

Don’t forget to store and reset the episode_timer.

Parameters:
  • seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().

close() None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the MPC environment is to do nothing.

pyo_component_params(component_name: None | str, ts: pd.DataFrame | pd.Series | dict[str, dict] | Sequence | None = None, index: pd.Index | Sequence | pyo.Set | None = None) PyoParams[source]

Retrieve parameters for the named component and convert them into the pyomo dict-format. If required, timeseries can be added to the parameters and reindexed. The pyo_convert_timeseries function is used for timeseries handling (see also pyo_convert_timeseries).

Parameters:
  • component_name – Name of the component.

  • ts – Timeseries for the component.

  • index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.

Returns:

Pyomo parameter dictionary.

static pyo_convert_timeseries(ts: pd.DataFrame | pd.Series | dict[str | None, dict[str, Any] | Any] | Sequence, index: pd.Index | Sequence | pyo.Set | None = None, component_name: str | None = None, *, _add_wrapping_none: bool = True) PyoParams[source]

Convert time series data into the pyomo format. Data will be reindexed if a new index is provided.

Parameters:
  • ts – Timeseries to convert.

  • index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.

  • component_name – Name of a specific component that the timeseries is used for. This limits which timeseries are returned.

  • _add_wrapping_none – Add a “None” indexed dictionary as the top level.

Returns:

Pyomo parameter dictionary.

_actions_to_state(actions: ndarray) None

Gather actions and store them in self.state.

Parameters:

actions – Actions taken by the agent.

_actions_valid(action: ndarray) None

Check whether the actions are within the specified action space.

Parameters:

action – Actions taken by the agent.

Raise:

RuntimeError, when the actions are not inside of the action space.

_create_new_state(additional_state: dict[str, float] | None) None

Take some initial values and create a new environment state object, stored in self.state.

Parameters:

additional_state – Values to initialize the state.

_done() bool

Check if the episode is over or not using the number of steps (n_steps) and the total number of steps in an episode (n_episode_steps).

Returns:

Boolean showing whether the episode is done.

_is_protocol = False
_np_random: np.random.Generator | None = None
_observations() ndarray

Determine the observations list from environment state. This uses state_config to determine all observations.

Returns:

Observations for the agent as determined by state_config.

_reduce_state_log() list[dict[str, float]]

Removes unwanted parameters from state_log before storing it in state_log_longtime.

Returns:

A list of dictionaries from which the parameters that should not be stored have been removed.

_reset_state() None

Store episode statistics and reset episode counters.

abstract property description: str

Long description of the environment

export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None

Extension of csv_export that includes the timeseries data.

Parameters:
  • path – Path of the exported file.

  • names – Field names used when data is a matrix without column names.

  • sep – Separator to use between the fields.

  • decimal – Sign to use for decimal points.

classmethod get_info() tuple[str, str]

Get info about environment.

Returns:

Tuple of version and description.

get_scenario_state() dict[str, Any]

Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.

Returns:

Scenario data for current time step.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame

Load data from csv into self.timeseries by using scenario_from_csv.

Parameters:
  • scenario_paths

    One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:

    Note

    [{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]

    • path: Path to the scenario file (relative to scenario_path).

    • prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.

    • interpolation_method: A pandas interpolation method, required if the frequency of values must be increased in comparison to the files’ data. (e.g.: ‘linear’ or ‘pad’).

    • scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.

    • rename_cols: Mapping of column names from the file to new names for the imported data.

    • infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.

    • time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).

  • prefix_renamed – Determine whether the prefix is also applied to renamed columns.

Returns:

Data Frame of the imported and formatted scenario data.

property np_random: numpy.random.Generator

Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.

Returns:

Instance of np.random.Generator.

pyo_update_params(updated_params: MutableMapping[str | None, Any], nonindex_param_append_string: str | None = None) None[source]

Updates model parameters and indexed parameters of a pyomo instance with values given in a dictionary. It assumes that the dictionary supplied in updated_params has the correct pyomo format.

Parameters:
  • updated_params – Dictionary with the updated values.

  • nonindex_param_append_string – String to be appended to values which are not indexed. This can be used if indexed parameters need to be set with values that do not have an index.

Returns:

None; the model instance is updated in place.

abstract render() None

Render the environment

The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

render_mode: str | None = None

Render mode for rendering the environment

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

abstract property version: str

Version of the environment

verbose: int

Verbosity level used for logging.

config_run: ConfigOptRun

Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.

path_results: pathlib.Path

Path for storing results.

path_scenarios: pathlib.Path | None

Path for the scenario data.

path_env: pathlib.Path

Path of the environment file.

callback: Callable | None

Callback can be used for logging and plotting.

state_modification_callback: Callable | None

Callback can be used for modifying the state at each time step.

env_id: int

ID of the environment (useful for vectorized environments).

run_name: str

Name of the current optimization run.

n_episodes: int

Number of completed episodes.

n_steps: int

Current step of the model (number of completed steps) in the current episode.

n_steps_longtime: int

Current step of the model (total over all episodes).

episode_duration: float

Duration of one episode in seconds.

sampling_time: float

Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int

Number of time steps (of width sampling_time) in each episode.

scenario_time_begin: datetime

Beginning time of the scenario.

scenario_time_end: datetime

Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).

timeseries: pd.DataFrame

The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.

ts_current: pd.DataFrame

Data frame containing the currently valid range of time series data.

state_config: StateConfig | None

Configuration to describe what the environment state looks like.

episode_timer: float

Episode timer (stores the start time of the episode).

state: dict[str, float]

Current state of the environment.

additional_state: dict[str, float] | None

Additional state information to append to the state during stepping and reset

state_log: list[dict[str, float]]

Log of the environment state.

state_log_longtime: list[list[dict[str, float]]]

Log of the environment state over multiple episodes.

data: dict[str, Any]

Some specific current environment settings / other data, apart from state.

data_log: list[dict[str, Any]]

Log of specific environment settings / other data, apart from state for the episode.

data_log_longtime: list[list[dict[str, Any]]]

Log of specific environment settings / other data, apart from state, over multiple episodes.

sim_steps_per_sample: int

Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
pyo_get_solution(names: set[str] | None = None) dict[str, float | int | dict[int, float | int]][source]

Convert the pyomo solution into a more usable format for plotting.

Parameters:

names – Names of the model parameters that are returned.

Returns:

Dictionary of {parameter name: value} pairs. Value may be a dictionary of {time: value} pairs which contains one value for each optimization time step.

pyo_get_component_value(component: pyo.Component, at: int = 1, allow_stale: bool = False) float | int | None[source]

Simulation (FMU) Environment

The BaseEnvSim supports the optimization of FMU simulation models. Make sure to set the fmu_name attribute when subclassing this environment. The FMU file will be loaded from the same directory as the environment itself.
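A subclass could look like the following minimal sketch; the class name and FMU file name are placeholders:

from eta_utility.eta_x.envs import BaseEnvSim

class TankSimEnv(BaseEnvSim):
    version = "1.0"
    description = "Hypothetical FMU simulation of a cleaning machine tank."
    fmu_name = "tank_model"  # loads tank_model.fmu from the environment's directory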

class eta_utility.eta_x.envs.BaseEnvSim(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any] | None = None, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Base class for FMU simulation model environments.

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • verbose – Verbosity to use for logging.

  • callback – callback which should be called after each episode.

  • scenario_time_begin – Beginning time of the scenario.

  • scenario_time_end – Ending time of the scenario.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • model_parameters – Parameters for the mathematical model.

  • sim_steps_per_sample – Number of simulation steps to perform during every sample.

  • render_mode – Render mode used to visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” (for text).

  • kwargs – Other keyword arguments (for subclasses).

abstract property fmu_name: str

Name of the FMU file

sim_steps_per_sample: int

Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.

path_fmu: pathlib.Path

The FMU is expected to be placed in the same folder as the environment

model_parameters: Mapping[str, int | float] | None

Configuration for the FMU model parameters that need to be set for the initialization of the model.

simulator: FMUSimulator

Instance of the FMU. This can be used to directly access the eta_utility.FMUSimulator interface.

_init_simulator(init_values: Mapping[str, int | float] | None = None) None[source]

Initialize the simulator object. Make sure to call _names_from_state before this or to otherwise initialize the names array.

This can also be used to reset the simulator after an episode is completed. It will reuse the same simulator object and reset it to the given initial values.

Parameters:

init_values – Dictionary of initial values for some FMU variables.

simulate(state: Mapping[str, float]) tuple[dict[str, float], bool, float][source]

Perform a simulator step and return data as specified by the is_ext_output parameter of the state_config.

Parameters:

state – State of the environment before the simulation.

Returns:

Output of the simulation, a boolean showing whether all simulation steps were successful, and the time elapsed during the simulation.

step(action: np.ndarray) StepResult[source]

Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.

This also updates self.state and self.state_log to store current state information.

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.

Parameters:

action – Actions to perform in the environment.

Returns:

The return value represents the state of the environment after the step was performed.

  • observations: A numpy array with new observation values as defined by the observation space, containing floating point or integer values.

  • reward: The value of the reward function. This is just one floating point value.

  • terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_x.

  • truncated: Boolean value specifying whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a timelimit, but it could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.

  • info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
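Since this step() implementation always returns a reward of 0, subclasses typically extend it. A minimal sketch with a hypothetical reward term:

import numpy as np

def step(self, action: np.ndarray):
    # Let the base class run the FMU and update the state, then compute
    # a custom reward (the target temperature of 60 °C is hypothetical).
    observations, _, terminated, truncated, info = super().step(action)
    reward = -abs(self.state["tank_temperature"] - 60.0)
    return observations, reward, terminated, truncated, info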

_update_state(action: ndarray) tuple[bool, float][source]

Take additional_state, execute simulation and get state information from scenario. This function updates self.state and increments the step counter.

Warning

You have to update self.state_log with the entire state before leaving step() to store the state information.

Parameters:

action – Actions to perform in the environment.

Returns:

Success of the simulation, time taken for simulation.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Note

Don’t forget to store and reset the episode_timer.

Parameters:
  • seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().

close() None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the Simulation environment is to close the FMU object.

_actions_to_state(actions: ndarray) None

Gather actions and store them in self.state.

Parameters:

actions – Actions taken by the agent.

_actions_valid(action: ndarray) None

Check whether the actions are within the specified action space.

Parameters:

action – Actions taken by the agent.

Raise:

RuntimeError, when the actions are not inside of the action space.

_create_new_state(additional_state: dict[str, float] | None) None

Take some initial values and create a new environment state object, stored in self.state.

Parameters:

additional_state – Values to initialize the state.

_done() bool

Check if the episode is over or not using the number of steps (n_steps) and the total number of steps in an episode (n_episode_steps).

Returns:

Boolean showing whether the episode is done.

_is_protocol = False
_np_random: np.random.Generator | None = None
_observations() ndarray

Determine the observations list from environment state. This uses state_config to determine all observations.

Returns:

Observations for the agent as determined by state_config.

_reduce_state_log() list[dict[str, float]]

Removes unwanted parameters from state_log before storing it in state_log_longtime.

Returns:

A list of dictionaries from which the parameters that should not be stored have been removed.

_reset_state() None

Store episode statistics and reset episode counters.

abstract property description: str

Long description of the environment

export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None

Extension of csv_export that includes the timeseries data.

Parameters:
  • path – Path of the exported file.

  • names – Field names used when data is a matrix without column names.

  • sep – Separator to use between the fields.

  • decimal – Sign to use for decimal points.

classmethod get_info() tuple[str, str]

Get info about environment.

Returns:

Tuple of version and description.

get_scenario_state() dict[str, Any]

Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.

Returns:

Scenario data for current time step.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame

Load data from csv into self.timeseries by using scenario_from_csv.

Parameters:
  • scenario_paths

    One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:

    Note

    [{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]

    • path: Path to the scenario file (relative to scenario_path).

    • prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.

    • interpolation_method: A pandas interpolation method, required if the frequency of values must be increased in comparison to the files’ data. (e.g.: ‘linear’ or ‘pad’).

    • scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.

    • rename_cols: Mapping of column names from the file to new names for the imported data.

    • infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.

    • time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).

  • prefix_renamed – Determine whether the prefix is also applied to renamed columns.

Returns:

Data Frame of the imported and formatted scenario data.

property np_random: numpy.random.Generator

Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.

Returns:

Instance of np.random.Generator.

abstract render() None

Render the environment

The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

render_mode: str | None = None

Mode used for rendering the environment.

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

abstract property version: str

Version of the environment

verbose: int

Verbosity level used for logging.

config_run: ConfigOptRun

Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.

path_results: pathlib.Path

Path for storing results.

path_scenarios: pathlib.Path | None

Path for the scenario data.

path_env: pathlib.Path

Path of the environment file.

callback: Callable | None

Callback can be used for logging and plotting.

state_modification_callback: Callable | None

Callback can be used for modifying the state at each time step.

env_id: int

ID of the environment (useful for vectorized environments).

run_name: str

Name of the current optimization run.

n_episodes: int

Number of completed episodes.

n_steps: int

Current step of the model (number of completed steps) in the current episode.

n_steps_longtime: int

Current step of the model (total over all episodes).

episode_duration: float

Duration of one episode in seconds.

sampling_time: float

Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int

Number of time steps (of width sampling_time) in each episode.

scenario_duration: float

Duration of the scenario for each episode (the total time imported from CSV).

scenario_time_begin: datetime

Beginning time of the scenario.

scenario_time_end: datetime

Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).

timeseries: pd.DataFrame

The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.

ts_current: pd.DataFrame

Data frame containing the currently valid range of time series data.

state_config: StateConfig | None

Configuration to describe what the environment state looks like.

episode_timer: float

Episode timer (stores the start time of the episode).

state: dict[str, float]

Current state of the environment.

additional_state: dict[str, float] | None

Additional state information to append to the state during stepping and reset

state_log: list[dict[str, float]]

Log of the environment state.

state_log_longtime: list[list[dict[str, float]]]

Log of the environment state over multiple episodes.

data: dict[str, Any]

Some specific current environment settings / other data, apart from state.

data_log: list[dict[str, Any]]

Log of specific environment settings / other data, apart from state for the episode.

data_log_longtime: list[list[dict[str, Any]]]

Log of specific environment settings / other data, apart from state, over multiple episodes.

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]

Live Connection Environment

BaseEnvLive is an environment which creates direct (live) connections to actual devices. It utilizes eta_utility.connectors.LiveConnect to achieve this. Please also read the corresponding documentation, because LiveConnect requires additional configuration.

class eta_utility.eta_x.envs.BaseEnvLive(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, max_errors: int = 10, render_mode: str | None = None, **kwargs: Any)[source]

Bases: BaseEnv, ABC

Base class for Live Connector environments. The class will prepare the initialization of the LiveConnect class and provide facilities to automatically read step results and reset the connection.

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • verbose – Verbosity to use for logging.

  • callback – Callback which should be called after each episode.

  • scenario_time_begin – Beginning time of the scenario.

  • scenario_time_end – Ending time of the scenario.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • max_errors – Maximum number of connection errors before interrupting the optimization process.

  • render_mode – Render mode used to visualise what the agent sees; example modes are “human”, “rgb_array” and “ansi” (for text).

  • kwargs – Other keyword arguments (for subclasses).
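
To illustrate how these parameters come together, here is a minimal, hypothetical sketch of a BaseEnvLive subclass. The state variables, external IDs and configuration name are assumptions, and it is assumed that StateConfig.continuous_spaces() returns the action and observation spaces; the calls to _names_from_state and _init_live_connector follow the descriptions below:

from typing import Any

from eta_utility.eta_x.envs import BaseEnvLive, StateConfig, StateVar


class MyLiveEnv(BaseEnvLive):
    """Hypothetical live environment; all names are illustrative."""

    version = "1.0"
    description = "Example environment with a live connection."
    config_name = "my_live_connect"  # name of the live_connect configuration

    def __init__(self, env_id, config_run, verbose=2, callback=None, **kwargs: Any):
        super().__init__(env_id, config_run, verbose, callback, **kwargs)

        # One action written to the device and one observation read back
        # (variable names and external IDs are assumptions for illustration).
        self.state_config = StateConfig(
            StateVar("u_pump", is_agent_action=True, is_ext_input=True,
                     ext_id="Pump.setpoint", low_value=0.0, high_value=1.0),
            StateVar("T_tank", is_agent_observation=True, is_ext_output=True,
                     ext_id="Tank.temperature", low_value=20.0, high_value=80.0),
        )
        # Derive the spaces from the state config (assumed helper method).
        self.action_space, self.observation_space = self.state_config.continuous_spaces()

        # Initialize the names array, then set up the live connection.
        self._names_from_state()
        self._init_live_connector()

    def render(self) -> None:
        # No rendering in this sketch.
        pass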

abstract property config_name: str

Name of the live_connect configuration

live_connector: LiveConnect

Instance of the Live Connector.

live_connect_config: Path | Sequence[Path] | dict[str, Any] | None

Path or Dict to initialize the live connector.

max_error_count: int

Maximum number of errors before connections in the live connector are aborted.

_init_live_connector(files: Path | Sequence[Path] | dict[str, Any] | None = None) None[source]

Initialize the live connector object. Make sure to call _names_from_state before this or to otherwise initialize the names array.

Parameters:

files – Path or Dict to initialize the connection directly from JSON configuration files or a config dictionary.

step(action: np.ndarray) StepResult[source]

Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.

This also updates self.state and self.state_log to store current state information.

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.

Parameters:

action – Actions to perform in the environment.

Returns:

The return value represents the state of the environment after the step was performed.

  • observations: A numpy array with new observation values as defined by the observation space (floating point or integer values).

  • reward: The value of the reward function; a single floating point value.

  • terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by ETAx.

  • truncated: Boolean value specifying whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a time limit, but it could also be used to indicate that an agent has physically gone out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.

  • info: Additional information about the state of the environment. The contents may be used for logging purposes in the future but typically do not currently serve a purpose.
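
Since this implementation always returns 0 reward, a subclass will typically wrap it. A minimal sketch of such an override inside a BaseEnvLive subclass (the state key "T_tank" and the 60 °C setpoint are hypothetical):

import numpy as np

def step(self, action: np.ndarray):
    # Let the base class write the actions to the devices and read back the
    # new state; its reward is always 0 and is therefore discarded here.
    observations, _, terminated, truncated, info = super().step(action)

    # Hypothetical reward: penalize deviation from a 60 °C tank temperature.
    reward = -abs(self.state["T_tank"] - 60.0)

    return observations, reward, terminated, truncated, info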

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Note

Don’t forget to store and reset the episode_timer.

Parameters:
  • seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
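
A minimal sketch of a conforming reset() override in a custom environment (the additional initialization step is hypothetical):

def reset(self, *, seed=None, options=None):
    # Call the base implementation first; it handles the seeding correctly.
    observations, info = super().reset(seed=seed, options=options)

    # Environment-specific re-initialization would go here.
    return observations, info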

close() None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

The default behavior for the live connector environment is to do nothing.

_actions_to_state(actions: ndarray) None

Gather actions and store them in self.state.

Parameters:

actions – Actions taken by the agent.

_actions_valid(action: ndarray) None

Check whether the actions are within the specified action space.

Parameters:

action – Actions taken by the agent.

Raises:

RuntimeError – If the actions are not within the action space.

_create_new_state(additional_state: dict[str, float] | None) None

Take some initial values and create a new environment state object, stored in self.state.

Parameters:

additional_state – Values to initialize the state.

_done() bool

Check whether the episode is over, using the number of completed steps (n_steps) and the total number of steps in an episode (n_episode_steps).

Returns:

Boolean indicating whether the episode is done.

_is_protocol = False
_np_random: np.random.Generator | None = None
_observations() ndarray

Determine the observations list from environment state. This uses state_config to determine all observations.

Returns:

Observations for the agent as determined by state_config.

_reduce_state_log() list[dict[str, float]]

Removes unwanted parameters from state_log before storing it in state_log_longtime.

Returns:

A list of dictionaries from which the parameters that should not be stored have been removed.

_reset_state() None

Store episode statistics and reset episode counters.

abstract property description: str

Long description of the environment

export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None

Extension of csv_export that also includes the time series data.

Parameters:
  • names – Field names, used when the data is a matrix without column names.

  • sep – Separator to use between the fields.

  • decimal – Character to use as the decimal point.

classmethod get_info() tuple[str, str]

Get info about environment.

Returns:

Tuple of version and description.

get_scenario_state() dict[str, Any]

Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.

Returns:

Scenario data for current time step.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame

Load data from CSV files into self.timeseries using scenario_from_csv.

Parameters:
  • scenario_paths

    One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:

    Note

    [{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]

    • path: Path to the scenario file (relative to scenario_path).

    • prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.

    • interpolation_method: A pandas interpolation method (e.g. ‘linear’ or ‘pad’), required if the frequency of values must be increased compared to the data in the file.

    • scale_factors: Scaling factors for specific columns. This can be useful, for example, if a column contains data in kilowatts that should be imported in watts. In this case, the scaling factor for the column would be 1000.

    • rename_cols: Mapping of column names from the file to new names for the imported data.

    • infer_datetime_cols: Index of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.

    • time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).

  • prefix_renamed – Determine whether the prefix is also applied to renamed columns.

Returns:

Data Frame of the imported and formatted scenario data.

property np_random: numpy.random.Generator

Returns the environment’s internal _np_random generator; if it is not set, it will be initialised with a random seed.

Returns:

An instance of np.random.Generator.

abstract render() None

Render the environment

The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

render_mode: str | None = None

Mode used for rendering the environment.

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

abstract property version: str

Version of the environment

verbose: int

Verbosity level used for logging.

config_run: ConfigOptRun

Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.

path_results: pathlib.Path

Path for storing results.

path_scenarios: pathlib.Path | None

Path for the scenario data.

path_env: pathlib.Path

Path of the environment file.

callback: Callable | None

Callback can be used for logging and plotting.

state_modification_callback: Callable | None

Callback can be used for modifying the state at each time step.

env_id: int

ID of the environment (useful for vectorized environments).

run_name: str

Name of the current optimization run.

n_episodes: int

Number of completed episodes.

n_steps: int

Current step of the model (number of completed steps) in the current episode.

n_steps_longtime: int

Current step of the model (total over all episodes).

episode_duration: float

Duration of one episode in seconds.

sampling_time: float

Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int

Number of time steps (of width sampling_time) in each episode.

scenario_duration: float

Duration of the scenario for each episode (the total time imported from CSV).

scenario_time_begin: datetime

Beginning time of the scenario.

scenario_time_end: datetime

Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).

timeseries: pd.DataFrame

The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.

ts_current: pd.DataFrame

Data frame containing the currently valid range of time series data.

state_config: StateConfig | None

Configuration to describe what the environment state looks like.

episode_timer: float

Episode timer (stores the start time of the episode).

state: dict[str, float]

Current state of the environment.

additional_state: dict[str, float] | None

Additional state information to append to the state during stepping and reset

state_log: list[dict[str, float]]

Log of the environment state.

state_log_longtime: list[list[dict[str, float]]]

Log of the environment state over multiple episodes.

data: dict[str, Any]

Some specific current environment settings / other data, apart from state.

data_log: list[dict[str, Any]]

Log of specific environment settings / other data, apart from state for the episode.

data_log_longtime: list[list[dict[str, Any]]]

Log of specific environment settings / other data, apart from state, over multiple episodes.

sim_steps_per_sample: int

Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]

Julia Environment

The JuliaEnv is an environment that connects to a Julia file. Make sure to set julia_env_file to the location of your Julia file. In contrast to the other environments, the JuliaEnv class, written in Python, must be specified for the environment_import parameter in the setup section of the configuration. The julia_env_file parameter is located in the settings section of the configuration file. See also Experiment configuration.
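
The relevant configuration entries might look as follows, shown here as a Python mapping that mirrors the configuration file; the file path is hypothetical and the section names follow the description above:

# Hypothetical configuration excerpt for a JuliaEnv-based run.
config = {
    "setup": {
        # Import the Python JuliaEnv class as the environment.
        "environment_import": "eta_utility.eta_x.envs.JuliaEnv",
    },
    "settings": {
        # Location of the julia file implementing the environment logic.
        "julia_env_file": "environment/my_env.jl",
    },
}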

class eta_utility.eta_x.envs.JuliaEnv(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, julia_env_file: pathlib.Path | str, render_mode: str | None = None, **kwargs: Any)[source]

Bases: BaseEnv

Abstract environment definition, providing some basic functionality for concrete environments to use. The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA-X framework and should be used as the starting point for new environments.

The initialization of this superclass performs many of the necessary tasks, required to specify a concrete environment. Read the documentation carefully to understand, how new environments can be developed, building on this starting point.

There are some attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required attributes are:

  • version: Version number of the environment.

  • description: Short description string of the environment.

  • action_space: The action space of the environment (see also gymnasium.spaces for options).

  • observation_space: The observation space of the environment (see also gymnasium.spaces for options).

The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.

  • step()

  • reset()

  • close()

Parameters:
  • env_id – Identification for the environment, useful when creating multiple environments.

  • config_run – Configuration of the optimization run.

  • verbose – Verbosity to use for logging.

  • callback – Callback which should be called after each episode.

  • scenario_time_begin – Beginning time of the scenario.

  • scenario_time_end – Ending time of the scenario.

  • episode_duration – Duration of the episode in seconds.

  • sampling_time – Duration of a single time sample / time step in seconds.

  • render_mode – Render mode used to visualise what the agent sees; example modes are “human”, “rgb_array” and “ansi” (for text).

  • kwargs – Other keyword arguments (for subclasses).

version = '1.0'
description = 'This environment uses a julia file to perform its functions.'
julia_env_path: pathlib.Path

Root path to the Julia file.

__jl: ModuleType

Imported Julia file as a module (written in Julia), used for further initialization of the environment.

_jlenv

Initialized Julia environment (written in Julia).

first_update(observations: ndarray) ndarray[source]

Perform the first update and set values in the simulation model to the observed values.

Parameters:

observations – Observations of another environment.

Returns:

Full array of observations.

update(observations: ndarray) ndarray[source]

Update the optimization model with observations from another environment.

Parameters:

observations – Observations from another environment

Returns:

Full array of current observations
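
A hedged sketch of how an interaction loop might use first_update and update; julia_env and other_env are hypothetical environment instances:

import numpy as np

# First exchange: initialize the julia model with the observed values.
obs, info = other_env.reset(seed=42)
full_obs = julia_env.first_update(np.asarray(obs))

# Subsequent exchanges: keep the julia model in sync with new observations.
for _ in range(10):
    action = other_env.action_space.sample()
    obs, reward, terminated, truncated, info = other_env.step(action)
    full_obs = julia_env.update(np.asarray(obs))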

step(action: np.ndarray) StepResult[source]

Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.

Note

Do not forget to increment n_steps and n_steps_longtime.

Parameters:

action – Actions taken by the agent.

Returns:

The return value represents the state of the environment after the step was performed.

  • observations: A numpy array with new observation values as defined by the observation space (floating point or integer values).

  • reward: The value of the reward function; a single floating point value.

  • terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by ETAx.

  • truncated: Boolean value specifying whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a time limit, but it could also be used to indicate that an agent has physically gone out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.

  • info: Additional information about the state of the environment. The contents may be used for logging purposes in the future but typically do not currently serve a purpose.

_reduce_state_log() list[dict[str, float]][source]

Removes unwanted parameters from state_log before storing it in state_log_longtime.

Returns:

A list of dictionaries from which the parameters that should not be stored have been removed.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[np.ndarray, dict[str, Any]][source]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Note

Don’t forget to store and reset the episode_timer.

Parameters:
  • seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)

  • options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().

close() None[source]

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

render(**kwargs: Any) None[source]

Render the environment

The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

_actions_to_state(actions: ndarray) None

Gather actions and store them in self.state.

Parameters:

actions – Actions taken by the agent.

_actions_valid(action: ndarray) None

Check whether the actions are within the specified action space.

Parameters:

action – Actions taken by the agent.

Raises:

RuntimeError – If the actions are not within the action space.

_create_new_state(additional_state: dict[str, float] | None) None

Take some initial values and create a new environment state object, stored in self.state.

Parameters:

additional_state – Values to initialize the state.

_done() bool

Check whether the episode is over, using the number of completed steps (n_steps) and the total number of steps in an episode (n_episode_steps).

Returns:

Boolean indicating whether the episode is done.

_is_protocol = False
_np_random: np.random.Generator | None = None
_observations() ndarray

Determine the observations list from environment state. This uses state_config to determine all observations.

Returns:

Observations for the agent as determined by state_config.

_reset_state() None

Store episode statistics and reset episode counters.

export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None

Extension of csv_export that also includes the time series data.

Parameters:
  • names – Field names, used when the data is a matrix without column names.

  • sep – Separator to use between the fields.

  • decimal – Character to use as the decimal point.

classmethod get_info() tuple[str, str]

Get info about environment.

Returns:

Tuple of version and description.

get_scenario_state() dict[str, Any]

Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.

Returns:

Scenario data for current time step.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame

Load data from CSV files into self.timeseries using scenario_from_csv.

Parameters:
  • scenario_paths

    One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:

    Note

    [{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]

    • path: Path to the scenario file (relative to scenario_path).

    • prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.

    • interpolation_method: A pandas interpolation method (e.g. ‘linear’ or ‘pad’), required if the frequency of values must be increased compared to the data in the file.

    • scale_factors: Scaling factors for specific columns. This can be useful, for example, if a column contains data in kilowatts that should be imported in watts. In this case, the scaling factor for the column would be 1000.

    • rename_cols: Mapping of column names from the file to new names for the imported data.

    • infer_datetime_cols: Index of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.

    • time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).

  • prefix_renamed – Determine whether the prefix is also applied to renamed columns.

Returns:

Data Frame of the imported and formatted scenario data.

property np_random: numpy.random.Generator

Returns the environment’s internal _np_random generator; if it is not set, it will be initialised with a random seed.

Returns:

An instance of np.random.Generator.

render_mode: str | None = None

Mode used for rendering the environment.

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

verbose: int

Verbosity level used for logging.

config_run: ConfigOptRun

Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.

path_results: pathlib.Path

Path for storing results.

path_scenarios: pathlib.Path | None

Path for the scenario data.

path_env: pathlib.Path

Path of the environment file.

callback: Callable | None

Callback can be used for logging and plotting.

state_modification_callback: Callable | None

Callback can be used for modifying the state at each time step.

env_id: int

ID of the environment (useful for vectorized environments).

run_name: str

Name of the current optimization run.

n_episodes: int

Number of completed episodes.

n_steps: int

Current step of the model (number of completed steps) in the current episode.

n_steps_longtime: int

Current step of the model (total over all episodes).

episode_duration: float

Duration of one episode in seconds.

sampling_time: float

Sampling time (interval between optimization time steps) in seconds.

n_episode_steps: int

Number of time steps (of width sampling_time) in each episode.

scenario_duration: float

Duration of the scenario for each episode (the total time imported from CSV).

scenario_time_begin: datetime

Beginning time of the scenario.

scenario_time_end: datetime

Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).

timeseries: pd.DataFrame

The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.

ts_current: pd.DataFrame

Data frame containing the currently valid range of time series data.

state_config: StateConfig | None

Configuration to describe what the environment state looks like.

episode_timer: float

Episode timer (stores the start time of the episode).

state: dict[str, float]

Current state of the environment.

additional_state: dict[str, float] | None

Additional state information to append to the state during stepping and reset

state_log: list[dict[str, float]]

Log of the environment state.

state_log_longtime: list[list[dict[str, float]]]

Log of the environment state over multiple episodes.

data: dict[str, Any]

Some specific current environment settings / other data, apart from state.

data_log: list[dict[str, Any]]

Log of specific environment settings / other data, apart from state for the episode.

data_log_longtime: list[list[dict[str, Any]]]

Log of specific environment settings / other data, apart from state, over multiple episodes.

sim_steps_per_sample: int

Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]