Environments
eta_utility environments are based on the interfaces offered by stable_baselines3 which are in turn based on the Farama gymnasium environments. The eta_x environments are provided as abstract classes which must be subclassed to create useful implementations. For the specific use cases they are intended for, these base classes make the creation of new environments much easier.
Custom environments should follow the interface for custom environments discussed in the stable_baselines3 documentation. The following describes the functions available to simplify the implementation of specific functionality in custom environments. You can look at the Usage examples for inspiration on what custom environments can look like.
The custom environments created with the utilities described here can be used directly with stable_baselines3 or
gymnasium. However, using the eta_utility.eta_x::ETAx
class is recommended (see Introduction).
When using the ETAx class for your optimization runs, the parameters required for environment instantiation must
be configured in the environment_specific section of the configuration. If interaction between environments is also
configured, additional parameters can be set in the configuration file. To configure the interaction environment, use
the section interaction_env_specific. If that section is not present, the parameters from the environment_specific
section will be used for both environments.
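For orientation, the layout of these sections could look like the following sketch (shown here as a Python dict; eta_x typically reads the configuration from a file, and the parameter names inside the sections depend on your environment and are illustrative only):

```python
# Sketch of the configuration sections discussed above. The parameter
# names inside the sections are examples; only the section names
# "environment_specific" and "interaction_env_specific" come from eta_x.
config = {
    "environment_specific": {
        "scenario_time_begin": "2020-01-01 00:00",
        "scenario_time_end": "2020-01-02 00:00",
        "episode_duration": 86400,  # seconds
        "sampling_time": 60,        # seconds
    },
    # Optional: parameters for the interaction environment. If this
    # section is missing, "environment_specific" is used for both.
    "interaction_env_specific": {
        "sampling_time": 60,
    },
}
```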
Environment State Configuration
The most important concept to understand when working with the environment utilities provided by eta_utility is the handling and configuration of the environment state. The state is represented by
eta_utility.eta_x.envs::StateVar
objects which each correspond to one variable of the environment. All
StateVar objects of an environment are combined into the StateConfig object. From the StateConfig object most other aspects of the environment can be determined, for example the observation space and action space. The
gymnasium documentation provides more information about Spaces.
Each state variable is represented by a StateVar object:
- class eta_utility.eta_x.envs.StateVar(name: str, *, is_agent_action=False, is_agent_observation=False, add_to_state_log=True, ext_id: str | int | None = None, is_ext_input=False, is_ext_output=False, ext_scale_add=0, ext_scale_mult=1, interact_id: int | None = None, from_interact=False, interact_scale_add=0, interact_scale_mult=1, scenario_id: str | None = None, from_scenario=False, scenario_scale_add=0, scenario_scale_mult=1, low_value=nan, high_value=nan, abort_condition_min=None, abort_condition_max=None, index=0)[source]
A variable in the state of an environment.
For example, the variable “tank_temperature” might be part of the environment’s state. Let’s assume it represents the temperature inside the tank of a cleaning machine. This variable could be read from an external source. In this case it must have is_ext_output = True, and the name of the external variable to read from must be specified: ext_id = "T_Tank". If this value should also be passed to the agent as an observation, set is_agent_observation = True. For observations and actions, you also need to set the low and high values, which determine the size of the observation and action spaces. In this case something like low_value = 20 and high_value = 80 (if we are talking about water temperature measured in degrees Celsius) might make sense.
If you want the environment to safely abort the optimization when certain values are exceeded, set the abort conditions to sensible values such as abort_condition_min = 0 and abort_condition_max = 100. This can be especially useful if you have simulation models which do not support certain values (for example, they might not be able to handle water temperatures higher than 100 °C):

v1 = StateVar(
    "tank_temperature",
    ext_id="T_Tank",
    is_ext_output=True,
    is_agent_observation=True,
    low_value=20,
    high_value=80,
    abort_condition_min=0,
    abort_condition_max=100,
)
As another example, you could set up an agent action with name = "set_heater" which the environment uses to set the state of the tank heater. In this case, the state variable should be configured with is_agent_action = True, and you might want to pass this on to a simulation model or an actual machine by setting is_ext_input = True:

v2 = StateVar(
    "set_heater",
    ext_id="u_tank",
    is_ext_input=True,
    is_agent_action=True,
)
Finally, let’s create a third variable which is read from a scenario file and converted from kilowatts to watts (multiplied by 1000). Additionally, this variable needs to be offset by a value of -10 due to measurement errors:
v3 = StateVar(
    "outside_temperature",
    scenario_id="T_ouside",
    scenario_scale_add=-10,
    scenario_scale_mult=1000,
    is_agent_observation=True,
    low_value=0,
    high_value=40,
)
- name: str
Name of the state variable (This must always be specified).
- is_agent_action: bool
Should the agent specify actions for this variable? (default: False).
- is_agent_observation: bool
Should the agent be allowed to observe the value of this variable? (default: False).
- add_to_state_log: bool
Should the state log of this episode be added to state_log_longtime? (default: True).
- ext_id: str | int | None
Name or identifier (order) of the variable in the external interaction model (e.g.: environment or FMU) (default: StateVar.name if (is_ext_input or is_ext_output) else None).
- is_ext_input: bool
Should this variable be passed to the external model as an input? (default: False).
- is_ext_output: bool
Should this variable be parsed from the external model output? (default: False).
- ext_scale_add: float
Value to add to the output from an external model (default: 0).
- ext_scale_mult: float
Value to multiply to the output from an external model (default: 1).
- interact_id: int | None
Name or identifier (order) of the variable in an interaction environment (default: None).
- from_interact: bool
Should this variable be read from the interaction environment? (default: False).
- interact_scale_add: float
Value to add to the value read from an interaction (default: 0).
- interact_scale_mult: float
Value to multiply to the value read from an interaction (default: 1).
- scenario_id: str | None
Name of the scenario variable, this value should be read from (default: None).
- from_scenario: bool
Should this variable be read from imported timeseries data? (default: False).
- scenario_scale_add: float
Value to add to the value read from a scenario file (default: 0).
- scenario_scale_mult: float
Value to multiply to the value read from a scenario file (default: 1).
- low_value: float
Lowest possible value of the state variable (default: np.nan).
- high_value: float
Highest possible value of the state variable (default: np.nan).
- abort_condition_min: float | None
If the value of the variable dips below this, the episode should be aborted (default: None).
- abort_condition_max: float | None
If the value of the variable rises above this, the episode should be aborted (default: None).
- index: int
Index of the value to use (useful for mathematical optimization, where multiple time steps could be returned). In this case, the index values might differ between actions and observations.
All state variables are combined into the StateConfig object:
- class eta_utility.eta_x.envs.StateConfig(*state_vars: StateVar)[source]
The configuration for the action and observation spaces. The values are used to control which variables are part of the action space and observation space. Additionally, the parameters can specify abort conditions and the handling of values from interaction environments or from simulation. Therefore, the StateConfig is very important for the functionality of ETA X.
Using the examples above, we could create the StateConfig object by passing our three state variables to the constructor:
state_config = StateConfig(v1, v2, v3)
If you are creating an environment, assign the StateConfig object to self.state_config. This will sometimes even be sufficient to create a fully functional environment.
- vars: dict[str, StateVar]
Mapping of the variable names to their StateVar instances with all associated information.
- ext_inputs: list[str]
List of variables that should be provided to an external source (such as an FMU).
- ext_outputs: list[str]
List of variables that can be received from an external source (such as an FMU).
- rev_ext_ids: dict[str | int, str]
Reverse mapping of external IDs to their corresponding variable names.
- ext_scale: dict[str, dict[str, float]]
Dictionary of scaling values for external input values (for example from simulations). Contains fields ‘add’ and ‘multiply’.
- interact_scale: dict[str, dict[str, float]]
Dictionary of scaling values for interact values. Contains fields ‘add’ and ‘multiply’.
- scenario_scale: dict[str, dict[str, float]]
Dictionary of scaling values for scenario values. Contains fields ‘add’ and ‘multiply’.
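All three groups of scaling parameters follow the same multiply-and-add pattern. As a plain-Python sketch of the arithmetic (assuming the multiplier is applied before the additive offset; check your eta_utility version if the exact order matters for your use case):

```python
def scale(raw: float, add: float = 0.0, mult: float = 1.0) -> float:
    # Assumed order of operations: multiply first, then add the offset.
    return raw * mult + add

# Example: a value delivered in kilowatts, imported as watts:
power_w = scale(1.5, mult=1000)  # 1500.0
# Example: a sensor value offset by -10 to correct a measurement error:
corrected = scale(20.0, add=-10)  # 10.0
```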
- append_state(var: StateVar) None [source]
Append a state variable to the state configuration.
- Parameters:
var – StateVar instance to append to the configuration.
- store_file(file: Path) None [source]
Save the StateConfig to a comma separated file.
- Parameters:
file – Path to the file.
- within_abort_conditions(state: Mapping[str, float]) bool [source]
Check whether the given state is within the abort conditions specified by the StateConfig instance.
- Parameters:
state – The state array to check for conformance.
- Returns:
Result of the check (False if the state does not conform to the required conditions).
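The logic of this check can be sketched in plain Python (a simplified stand-in, not the actual eta_utility implementation):

```python
def within_abort_conditions(state, conditions):
    """Simplified stand-in for StateConfig.within_abort_conditions.

    state:      mapping of variable name -> current value
    conditions: mapping of variable name -> (min or None, max or None)
    """
    for name, (low, high) in conditions.items():
        value = state[name]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True

conditions = {"tank_temperature": (0, 100)}
within_abort_conditions({"tank_temperature": 60}, conditions)   # True
within_abort_conditions({"tank_temperature": 120}, conditions)  # False
```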
- continuous_action_space() Box [source]
Generate an action space according to the format required by the gymnasium specification.
- Returns:
Action space.
The state config object and its attributes (such as the observations) are used by the environments to determine which values to update during steps, which values to read from scenario files and which values to pass to the agent as actions.
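To illustrate how the low and high values of the observation variables translate into a space, the following self-contained sketch uses plain dicts as stand-ins for StateVar objects (a gymnasium Box space would then be constructed from the collected bounds, similar to what continuous_action_space() does for actions):

```python
import numpy as np

# Plain-dict stand-ins for StateVar objects (not the eta_utility classes).
state_vars = [
    {"name": "tank_temperature", "is_agent_observation": True,
     "low_value": 20.0, "high_value": 80.0},
    {"name": "set_heater", "is_agent_action": True,
     "low_value": 0.0, "high_value": 1.0},
]

# Collect the bounds of all observation variables; these arrays would
# become the low/high arguments of a gymnasium Box observation space.
obs_vars = [v for v in state_vars if v.get("is_agent_observation")]
obs_low = np.array([v["low_value"] for v in obs_vars])
obs_high = np.array([v["high_value"] for v in obs_vars])
```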
Base Environment
- class eta_utility.eta_x.envs.BaseEnv(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, state_modification_callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)[source]
-
Abstract environment definition, providing some basic functionality for concrete environments to use. The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA-X framework and should be used as the starting point for new environments.
The initialization of this superclass performs many of the tasks required to specify a concrete environment. Read the documentation carefully to understand how new environments can be developed, building on this starting point.
There are some attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required attributes are:
version: Version number of the environment.
description: Short description string of the environment.
action_space: The action space of the environment (see also gymnasium.spaces for options).
observation_space: The observation space of the environment (see also gymnasium.spaces for options).
The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.
step()
reset()
close()
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback that should be called after each episode.
state_modification_callback – Callback that should be called after state setup, before logging the state.
scenario_time_begin – Beginning time of the scenario.
scenario_time_end – Ending time of the scenario.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
render_mode – Render mode used to visualise what the agent sees; example modes are “human”, “rgb_array” and “ansi” (text).
kwargs – Other keyword arguments (for subclasses).
- abstract property version: str
Version of the environment
- abstract property description: str
Long description of the environment
- verbose: int
Verbosity level used for logging.
- config_run: ConfigOptRun
Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.
- path_results: pathlib.Path
Path for storing results.
- path_scenarios: pathlib.Path | None
Path for the scenario data.
- path_env: pathlib.Path
Path of the environment file.
- callback: Callable | None
Callback can be used for logging and plotting.
- state_modification_callback: Callable | None
Callback can be used for modifying the state at each time step.
- env_id: int
ID of the environment (useful for vectorized environments).
- run_name: str
Name of the current optimization run.
- n_episodes: int
Number of completed episodes.
- n_steps: int
Current step of the model (number of completed steps) in the current episode.
- n_steps_longtime: int
Current step of the model (total over all episodes).
- episode_duration: float
Duration of one episode in seconds.
- sampling_time: float
Sampling time (interval between optimization time steps) in seconds.
- n_episode_steps: int
Number of time steps (of width sampling_time) in each episode.
- scenario_duration: float
Duration of the scenario for each episode (for total time imported from csv).
- scenario_time_begin: datetime
Beginning time of the scenario.
- scenario_time_end: datetime
Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).
- timeseries: pd.DataFrame
The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.
- ts_current: pd.DataFrame
Data frame containing the currently valid range of time series data.
- state_config: StateConfig | None
Configuration to describe what the environment state looks like.
- episode_timer: float
Episode timer (stores the start time of the episode).
- additional_state: dict[str, float] | None
Additional state information to append to the state during stepping and reset.
- state_log_longtime: list[list[dict[str, float]]]
Log of the environment state over multiple episodes.
- data_log: list[dict[str, Any]]
Log of specific environment settings / other data, apart from state for the episode.
- data_log_longtime: list[list[dict[str, Any]]]
Log of specific environment settings / other data, apart from state, over multiple episodes.
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
- import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame [source]
Load data from csv into self.timeseries by using scenario_from_csv.
- Parameters:
scenario_paths –
One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:
Note
[{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]
path: Path to the scenario file (relative to scenario_path).
prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.
interpolation_method: A pandas interpolation method, required if the frequency of values must be increased in comparison to the files’ data. (e.g.: ‘linear’ or ‘pad’).
scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.
rename_cols: Mapping of column names from the file to new names for the imported data.
infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.
time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).
prefix_renamed – Determine whether the prefix is also applied to renamed columns.
- Returns:
Data Frame of the imported and formatted scenario data.
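Put together, a scenario_paths argument could look like the following sketch (the file name and column names are hypothetical examples):

```python
# Hypothetical scenario import configuration; the file name and column
# names are illustrative only.
scenario_paths = [
    {
        "path": "energy_prices.csv",          # relative to scenario_path
        "prefix": "prices",                   # avoids column name clashes
        "interpolation_method": "linear",     # used when upsampling
        "scale_factors": {"price_el": 1000},  # e.g. import kW data as W
        "rename_cols": {"price_el": "electricity_price"},
        "infer_datetime_cols": 0,             # column 0 holds datetimes
    },
]
# Inside an environment, this would be used as:
#     self.import_scenario(*scenario_paths)
```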
- get_scenario_state() dict[str, Any] [source]
Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.
- Returns:
Scenario data for current time step.
- abstract step(action: np.ndarray) StepResult [source]
Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.
Note
Do not forget to increment n_steps and n_steps_longtime.
- Parameters:
action – Actions taken by the agent.
- Returns:
The return value represents the state of the environment after the step was performed.
observations: A numpy array with new observation values as defined by the observation space. Observations is a np.array() (numpy array) with floating point or integer values.
reward: The value of the reward function. This is just one floating point value.
terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_x.
truncated: Boolean, whether the truncation condition outside the scope is satisfied. Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
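The shape of such a step() implementation can be sketched with a self-contained toy class (this is not a BaseEnv subclass; names and dynamics are illustrative only):

```python
import numpy as np

class ToyEnv:
    """Toy stand-in illustrating the step() contract of BaseEnv."""

    def __init__(self, n_episode_steps: int = 10):
        self.n_episode_steps = n_episode_steps
        self.n_steps = 0
        self.n_steps_longtime = 0
        self.state = {"tank_temperature": 20.0}

    def step(self, action: np.ndarray):
        # Do not forget to increment n_steps and n_steps_longtime.
        self.n_steps += 1
        self.n_steps_longtime += 1

        # Apply the action to the environment state (toy dynamics).
        self.state["tank_temperature"] += float(action[0])

        observations = np.array([self.state["tank_temperature"]])
        reward = -abs(self.state["tank_temperature"] - 60.0)  # toy reward
        terminated = self.n_steps >= self.n_episode_steps
        truncated = False
        info = {}
        return observations, reward, terminated, truncated, info

env = ToyEnv()
obs, reward, terminated, truncated, info = env.step(np.array([5.0]))
```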
- _actions_valid(action: ndarray) None [source]
Check whether the actions are within the specified action space.
- Parameters:
action – Actions taken by the agent.
- Raises:
RuntimeError, if the actions are not inside the action space.
- _create_new_state(additional_state: dict[str, float] | None) None [source]
Take some initial values and create a new environment state object, stored in self.state.
- Parameters:
additional_state – Values to initialize the state.
- _actions_to_state(actions: ndarray) None [source]
Gather actions and store them in self.state.
- Parameters:
actions – Actions taken by the agent.
- _observations() ndarray [source]
Determine the observations list from environment state. This uses state_config to determine all observations.
- Returns:
Observations for the agent as determined by state_config.
- _done() bool [source]
Check whether the episode is over, using the number of completed steps (n_steps) and the total number of steps in an episode (n_episode_steps).
- Returns:
Boolean indicating whether the episode is done.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]] [source]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Note
Don’t forget to store and reset the episode_timer by calling self._reset_state() if you overwrite this function.
- Parameters:
seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
- _reduce_state_log() list[dict[str, float]] [source]
Removes unwanted parameters from state_log before storing it in state_log_longtime.
- Returns:
A list of dictionaries from which the parameters that should not be stored have been removed.
- _np_random: np.random.Generator | None = None
- property np_random: numpy.random.Generator
Returns the environment’s internal _np_random; if not set, it will be initialised with a random seed.
- Returns:
Instance of np.random.Generator.
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- abstract close() None [source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (i.e. simulation models) used by the environment.
- abstract render() None [source]
Render the environment.
The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
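Toy implementations of the two return-value conventions might look like this (illustrative stand-ins, not part of eta_utility):

```python
import numpy as np

def render_ansi(state):
    # "ansi" mode: return a terminal-style text representation.
    return "\n".join(f"{name}: {value:.1f}" for name, value in state.items())

def render_rgb_array(width=4, height=3):
    # "rgb_array" mode: return an (height, width, 3) uint8 RGB image.
    return np.zeros((height, width, 3), dtype=np.uint8)

print(render_ansi({"tank_temperature": 60.0}))  # tank_temperature: 60.0
```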
- classmethod get_info() tuple[str, str] [source]
Get info about environment.
- Returns:
Tuple of version and description.
- export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None [source]
Extension of csv_export which includes the timeseries data in the export.
- Parameters:
path – Path to the file.
names – Field names used when data is a matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
Model Predictive Control (MPC) Environment
The BaseEnvMPC is a class for the optimization of mathematical MPC models.
- class eta_utility.eta_x.envs.BaseEnvMPC(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any], prediction_scope: TimeStep | str | None = None, render_mode: str | None = None, **kwargs: Any)[source]
-
Base class for mathematical MPC models. This class can be used in conjunction with the MathSolver agent. You need to implement the _model method in a subclass and return a pyomo.AbstractModel from it.
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback that should be called after each episode.
scenario_time_begin – Beginning time of the scenario.
scenario_time_end – Ending time of the scenario.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
model_parameters – Parameters for the mathematical model.
prediction_scope – Duration of the prediction (usually a subsample of the episode duration).
render_mode – Render mode used to visualise what the agent sees; example modes are “human”, “rgb_array” and “ansi” (text).
kwargs – Other keyword arguments (for subclasses).
- prediction_scope: float
Total duration of one prediction/optimization run when used with the MPC agent. This is automatically set to the value of episode_duration if it is not supplied separately.
- n_prediction_steps: int
Number of steps in the prediction (prediction_scope/sampling_time).
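For example, with a one-day prediction scope and a one-minute sampling time, the relation works out as:

```python
prediction_scope = 86400  # seconds (one day)
sampling_time = 60        # seconds (one minute)
n_prediction_steps = prediction_scope // sampling_time  # 1440 steps
```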
- scenario_duration: float
Duration of the scenario for each episode (for total time imported from csv).
- model_parameters
Configuration for the MILP model parameters.
- _concrete_model: pyo.ConcreteModel | None
Concrete pyomo model as initialized by _model.
- time_var: str | None
Name of the “time” variable/set in the model (e.g. “T”). This is required if the pyomo sets must be re-indexed when updating the model between time steps. If this is None, it is assumed that no reindexing of the timeseries data is required during updates; this is the default.
- nonindex_update_append_string: str | None
Updating indexed model parameters can be achieved either by updating only the first value of the actual parameter itself or by having a separate handover parameter that is used for specifying only the first value. The separate handover parameter can be denoted with an appended string. For example, if the actual parameter is x.ON then the handover parameter could be x.ON_first. To use x.ON_first for updates, set the nonindex_update_append_string to “_first”. If the attribute is set to None, the first value of the actual parameter (x.ON) would be updated instead.
- _use_model_time_increments: bool
Some models may not use the actual time increment (sampling_time). Instead, they would translate into model time increments (each sampling time increment equals a single model time step). This means that indices of the model components simply count 1,2,3,… instead of 0, sampling_time, 2*sampling_time, … Set this to true, if model time increments (1, 2, 3, …) are used. Otherwise, sampling_time will be used as the time increment. Note: This is only relevant for the first model time increment, later increments may differ.
- property model: tuple[ConcreteModel, list]
The model property is a tuple of the concrete model and the order of the action space. This is used such that the MPC algorithm can re-sort the action output. This sorting cannot be conveyed differently through pyomo.
- Returns:
Tuple of the concrete model and the order of the action space.
- abstract _model() AbstractModel [source]
Create the abstract pyomo model. This is where the pyomo model description should be placed.
- Returns:
Abstract pyomo model.
- step(action: np.ndarray) StepResult [source]
Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.
This also updates self.state and self.state_log to store current state information.
Note
This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.
- Parameters:
action (np.ndarray) – Actions to perform in the environment.
- Returns:
The return value represents the state of the environment after the step was performed.
observations: A numpy array with new observation values as defined by the observation space. Observations is a np.array() (numpy array) with floating point or integer values.
reward: The value of the reward function. This is just one floating point value.
terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_i.
truncated: Boolean, whether the truncation condition outside the scope is satisfied. Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
- update(observations: Sequence[Sequence[float | int]] | None = None) np.ndarray [source]
Update the optimization model with observations from another environment.
- Parameters:
observations – Observations from another environment.
- Returns:
Full array of current observations.
- solve_failed(model: pyo.ConcreteModel, result: SolverResults) None [source]
This method will try to render the result in case the model could not be solved. It should automatically be called by the agent.
- Parameters:
model – Current model.
result – Result of the last solution attempt.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[np.ndarray, dict[str, Any]] [source]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Note
Don’t forget to store and reset the episode_timer.
- Parameters:
seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Tuple of observation and info. The observation of the initial state will be an element of
observation_space
(typically a numpy array) and is analogous to the observation returned bystep()
. Info is a dictionary containing auxiliary information complementingobservation
. It should be analogous to theinfo
returned bystep()
.
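The seeding contract described above can be sketched in plain Python. The class and observation below are stand-ins for illustration, not part of eta_utility:

```python
import numpy as np

class MyEnv:  # hypothetical stand-in illustrating the reset() seeding contract
    def __init__(self):
        self._np_random = None

    def reset(self, *, seed=None, options=None):
        # Create a new RNG when a seed is given or when no RNG exists yet;
        # calling reset(seed=None) on an already-seeded environment keeps the RNG.
        if seed is not None or self._np_random is None:
            self._np_random = np.random.default_rng(seed)
        observation = np.zeros(3)  # placeholder initial observation
        info = {}
        return observation, info
```

In a real subclass, the same effect is achieved by making super().reset(seed=seed) the first line of reset().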
- close() None [source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.
Default behavior for the MPC environment is to do nothing.
- pyo_component_params(component_name: None | str, ts: pd.DataFrame | pd.Series | dict[str, dict] | Sequence | None = None, index: pd.Index | Sequence | pyo.Set | None = None) PyoParams [source]
Retrieve parameters for the named component and convert them into the pyomo dict-format. If required, timeseries can be added to the parameters and reindexed. Timeseries handling is performed by the pyo_convert_timeseries function.
- Parameters:
component_name – Name of the component.
ts – Timeseries for the component.
index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.
- Returns:
Pyomo parameter dictionary.
- static pyo_convert_timeseries(ts: pd.DataFrame | pd.Series | dict[str | None, dict[str, Any] | Any] | Sequence, index: pd.Index | Sequence | pyo.Set | None = None, component_name: str | None = None, *, _add_wrapping_none: bool = True) PyoParams [source]
Convert time series data into the pyomo format. The data will be reindexed if a new index is provided.
- Parameters:
ts – Timeseries to convert.
index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.
component_name – Name of a specific component that the timeseries is used for. This limits which timeseries are returned.
_add_wrapping_none – Add a “None” indexed dictionary as the top level.
- Returns:
Pyomo parameter dictionary.
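As a sketch of the target structure (assuming the standard pyomo data-command dictionary format, with the wrapping None key that _add_wrapping_none refers to), the converted output might look like this; the component and parameter names are made up:

```python
# Hypothetical converted timeseries in the pyomo dict format. The top-level
# None key is the wrapping level added when _add_wrapping_none is True.
pyo_params = {
    None: {
        "chp.electricity_price": {0: 31.2, 1: 30.8, 2: 33.1},  # indexed by time step
        "chp.efficiency": {None: 0.38},  # scalar (non-indexed) parameter
    }
}
```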
- _actions_to_state(actions: ndarray) None
Gather actions and store them in self.state.
- Parameters:
actions – Actions taken by the agent.
- _actions_valid(action: ndarray) None
Check whether the actions are within the specified action space.
- Parameters:
action – Actions taken by the agent.
- Raises:
RuntimeError, when the actions are not within the action space.
- _create_new_state(additional_state: dict[str, float] | None) None
Take some initial values and create a new environment state object, stored in self.state.
- Parameters:
additional_state – Values to initialize the state.
- _done() bool
Check if the episode is over or not using the number of steps (n_steps) and the total number of steps in an episode (n_episode_steps).
- Returns:
Boolean showing whether the episode is done.
- _is_protocol = False
- _np_random: np.random.Generator | None = None
- _observations() ndarray
Determine the observations list from environment state. This uses state_config to determine all observations.
- Returns:
Observations for the agent as determined by state_config.
- _reduce_state_log() list[dict[str, float]]
Remove unwanted parameters from state_log before storing it in state_log_longtime.
- Returns:
A list of dictionaries from which the parameters that should not be stored have been removed.
- _reset_state() None
Store episode statistics and reset episode counters.
- abstract property description: str
Long description of the environment
- export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None
Extension of csv_export which includes the environment’s timeseries data.
- Parameters:
path – Output file path.
names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
- classmethod get_info() tuple[str, str]
Get info about environment.
- Returns:
Tuple of version and description.
- get_scenario_state() dict[str, Any]
Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.
- Returns:
Scenario data for current time step.
- import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame
Load data from CSV files into self.timeseries_data using scenario_from_csv.
- Parameters:
scenario_paths –
One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:
Note
[{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]
path: Path to the scenario file (relative to scenario_path).
prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.
interpolation_method: A pandas interpolation method, required if the frequency of values must be increased in comparison to the files’ data. (e.g.: ‘linear’ or ‘pad’).
scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.
rename_cols: Mapping of column names from the file to new names for the imported data.
infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.
time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).
prefix_renamed – Determine whether the prefix is also applied to renamed columns.
- Returns:
Data Frame of the imported and formatted scenario data.
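A configuration dictionary following the structure above might look as follows; the file name, column names and values are purely illustrative:

```python
# Hypothetical scenario configuration for import_scenario.
scenario = {
    "path": "electricity_prices.csv",       # relative to scenario_path
    "prefix": "price",                      # prepended to all column names
    "interpolation_method": "linear",       # used when upsampling the data
    "scale_factors": {"grid_price": 1000},  # convert kW values to W
    "rename_cols": {"grid_price": "electricity_price"},
    "infer_datetime_cols": 0,               # first column holds datetime data
}
# env.import_scenario(scenario) would load and format the data accordingly.
```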
- property np_random: numpy.random.Generator
Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.
- Returns:
Instance of np.random.Generator
- pyo_update_params(updated_params: MutableMapping[str | None, Any], nonindex_param_append_string: str | None = None) None [source]
Updates model parameters and indexed parameters of a pyomo instance with values given in a dictionary. It assumes that the dictionary supplied in updated_params has the correct pyomo format.
- Parameters:
updated_params – Dictionary with the updated values.
nonindex_param_append_string – String to be appended to values which are not indexed. This can be used if indexed parameters need to be set with values that do not have an index.
- Returns:
None; the model instance is updated in place.
- abstract render() None
Render the environment
The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
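A minimal render() override following the rgb_array convention could look like this sketch; the image contents are placeholders:

```python
import numpy as np

def render():
    # Sketch of an rgb_array-style render: return an x-by-y RGB image as a
    # numpy array. Here a uniform grey 64-by-64 frame stands in for real content.
    frame = np.full((64, 64, 3), 128, dtype=np.uint8)
    return frame
```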
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped
gymnasium.Env
instance
- abstract property version: str
Version of the environment
- verbose: int
Verbosity level used for logging.
- config_run: ConfigOptRun
Information about the optimization run and about relevant paths. For example, it defines path_results and path_scenarios.
- path_results: pathlib.Path
Path for storing results.
- path_scenarios: pathlib.Path | None
Path for the scenario data.
- path_env: pathlib.Path
Path of the environment file.
- callback: Callable | None
Callback can be used for logging and plotting.
- state_modification_callback: Callable | None
Callback can be used for modifying the state at each time step.
- env_id: int
ID of the environment (useful for vectorized environments).
- run_name: str
Name of the current optimization run.
- n_episodes: int
Number of completed episodes.
- n_steps: int
Current step of the model (number of completed steps) in the current episode.
- n_steps_longtime: int
Current step of the model (total over all episodes).
- episode_duration: float
Duration of one episode in seconds.
- sampling_time: float
Sampling time (interval between optimization time steps) in seconds.
- n_episode_steps: int
Number of time steps (of width sampling_time) in each episode.
- scenario_time_begin: datetime
Beginning time of the scenario.
- scenario_time_end: datetime
Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).
- timeseries: pd.DataFrame
The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.
- ts_current: pd.DataFrame
Data frame containing the currently valid range of time series data.
- state_config: StateConfig | None
Configuration to describe what the environment state looks like.
- episode_timer: float
Episode timer (stores the start time of the episode).
- additional_state: dict[str, float] | None
Additional state information to append to the state during stepping and reset
- state_log_longtime: list[list[dict[str, float]]]
Log of the environment state over multiple episodes.
- data_log: list[dict[str, Any]]
Log of specific environment settings / other data, apart from state for the episode.
- data_log_longtime: list[list[dict[str, Any]]]
Log of specific environment settings / other data, apart from state, over multiple episodes.
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- pyo_get_solution(names: set[str] | None = None) dict[str, float | int | dict[int, float | int]] [source]
Convert the pyomo solution into a more usable format for plotting.
- Parameters:
names – Names of the model parameters that are returned.
- Returns:
Dictionary of {parameter name: value} pairs. Value may be a dictionary of {time: value} pairs which contains one value for each optimization time step.
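Given the documented return format, such a solution dictionary could be consumed for plotting as in this sketch; the parameter names and values are made up:

```python
# Hypothetical solution in the documented {name: value-or-{time: value}} format.
solution = {
    "P_el": {0: 10.0, 1: 12.5, 2: 11.0},  # indexed parameter, one value per step
    "on_off": 1.0,                        # scalar parameter
}

# Split an indexed parameter into time and value sequences for plotting.
times = sorted(solution["P_el"])
values = [solution["P_el"][t] for t in times]
```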
Simulation (FMU) Environment
The BaseEnvSim supports the optimization of FMU simulation models. Make sure to set the fmu_name attribute when subclassing this environment. The FMU file will be loaded from the same directory as the environment itself.
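Subclassing might then look like the following sketch; a stand-in base class replaces eta_utility.eta_x.envs.BaseEnvSim so the pattern runs standalone, and the environment name and FMU name are hypothetical:

```python
class BaseEnvSim:  # stand-in for eta_utility.eta_x.envs.BaseEnvSim
    pass

class ChillerEnv(BaseEnvSim):  # hypothetical simulation environment
    version = "1.0"
    description = "Simulation of a hypothetical chiller FMU."
    fmu_name = "chiller"  # chiller.fmu is loaded from the environment's folder
```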
- class eta_utility.eta_x.envs.BaseEnvSim(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any] | None = None, sim_steps_per_sample: int | str = 1, render_mode: str | None = None, **kwargs: Any)[source]
Base class for FMU Simulation models environments.
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback which should be called after each episode.
scenario_time_begin – Beginning time of the scenario.
scenario_time_end – Ending time of the scenario.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
model_parameters – Parameters for the mathematical model.
sim_steps_per_sample – Number of simulation steps to perform during every sample.
render_mode – Renders the environment to help visualise what the agent sees. Example modes are “human”, “rgb_array” and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).
- abstract property fmu_name: str
Name of the FMU file
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
- path_fmu: pathlib.Path
The FMU file is expected to be placed in the same folder as the environment.
- model_parameters: Mapping[str, int | float] | None
Configuration for the FMU model parameters, that need to be set for initialization of the Model.
- simulator: FMUSimulator
Instance of the FMU. This can be used to directly access the eta_utility.FMUSimulator interface.
- _init_simulator(init_values: Mapping[str, int | float] | None = None) None [source]
Initialize the simulator object. Make sure to call _names_from_state before this or to otherwise initialize the names array.
This can also be used to reset the simulator after an episode is completed. It will reuse the same simulator object and reset it to the given initial values.
- Parameters:
init_values – Dictionary of initial values for some FMU variables.
- simulate(state: Mapping[str, float]) tuple[dict[str, float], bool, float] [source]
Perform a simulator step and return data as specified by the is_ext_observation parameter of the state_config.
- Parameters:
state – State of the environment before the simulation.
- Returns:
Output of the simulation, boolean showing whether all simulation steps were successful, time elapsed during simulation.
- step(action: np.ndarray) StepResult [source]
Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.
This also updates self.state and self.state_log to store current state information.
Note
This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.
- Parameters:
action – Actions to perform in the environment.
- Returns:
The return value represents the state of the environment after the step was performed.
observations: A numpy array with new observation values as defined by the observation space, containing floating point or integer values.
reward: The value of the reward function. This is a single floating point value.
terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_x.
truncated: Boolean indicating whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a time limit, but it could also be used to indicate that an agent has physically gone out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.
info: Provides some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
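Since step() always returns zero reward, a subclass typically wraps it to add a reward, as in this sketch. The base class here is a stand-in that mimics the documented five-tuple return, and the cost signal is hypothetical:

```python
import numpy as np

class SimEnvStub:  # stand-in mimicking the documented step() return
    def step(self, action):
        observations = np.asarray([1.5, 0.0])  # placeholder observation values
        return observations, 0.0, False, False, {}

class RewardedEnv(SimEnvStub):  # hypothetical subclass adding a reward
    def step(self, action):
        # Call the base step first, then compute the reward from the observations.
        obs, _, terminated, truncated, info = super().step(action)
        reward = -float(obs[0])  # e.g. negative of a hypothetical cost signal
        return obs, reward, terminated, truncated, info
```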
- _update_state(action: ndarray) tuple[bool, float] [source]
Take additional_state, execute simulation and get state information from scenario. This function updates self.state and increments the step counter.
Warning
You have to update self.state_log with the entire state before leaving the step to store the state information.
- Parameters:
action – Actions to perform in the environment.
- Returns:
Success of the simulation, time taken for simulation.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]] [source]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically. For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Note
Don’t forget to store and reset the episode_timer.
- Parameters:
seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
- close() None [source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.
Default behavior for the Simulation environment is to close the FMU object.
- _actions_to_state(actions: ndarray) None
Gather actions and store them in self.state.
- Parameters:
actions – Actions taken by the agent.
- _actions_valid(action: ndarray) None
Check whether the actions are within the specified action space.
- Parameters:
action – Actions taken by the agent.
- Raises:
RuntimeError, when the actions are not within the action space.
- _create_new_state(additional_state: dict[str, float] | None) None
Take some initial values and create a new environment state object, stored in self.state.
- Parameters:
additional_state – Values to initialize the state.
- _done() bool
Check if the episode is over or not using the number of steps (n_steps) and the total number of steps in an episode (n_episode_steps).
- Returns:
Boolean showing whether the episode is done.
- _is_protocol = False
- _np_random: np.random.Generator | None = None
- _observations() ndarray
Determine the observations list from environment state. This uses state_config to determine all observations.
- Returns:
Observations for the agent as determined by state_config.
- _reduce_state_log() list[dict[str, float]]
Remove unwanted parameters from state_log before storing it in state_log_longtime.
- Returns:
A list of dictionaries from which the parameters that should not be stored have been removed.
- _reset_state() None
Store episode statistics and reset episode counters.
- abstract property description: str
Long description of the environment
- export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None
Extension of csv_export which includes the environment’s timeseries data.
- Parameters:
path – Output file path.
names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
- classmethod get_info() tuple[str, str]
Get info about environment.
- Returns:
Tuple of version and description.
- get_scenario_state() dict[str, Any]
Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.
- Returns:
Scenario data for current time step.
- import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame
Load data from CSV files into self.timeseries_data using scenario_from_csv.
- Parameters:
scenario_paths –
One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:
Note
[{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]
path: Path to the scenario file (relative to scenario_path).
prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.
interpolation_method: A pandas interpolation method, required if the frequency of values must be increased in comparison to the files’ data. (e.g.: ‘linear’ or ‘pad’).
scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.
rename_cols: Mapping of column names from the file to new names for the imported data.
infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.
time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).
prefix_renamed – Determine whether the prefix is also applied to renamed columns.
- Returns:
Data Frame of the imported and formatted scenario data.
- property np_random: numpy.random.Generator
Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.
- Returns:
Instance of np.random.Generator
- abstract render() None
Render the environment
The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped
gymnasium.Env
instance
- abstract property version: str
Version of the environment
- verbose: int
Verbosity level used for logging.
- config_run: ConfigOptRun
Information about the optimization run and about relevant paths. For example, it defines path_results and path_scenarios.
- path_results: pathlib.Path
Path for storing results.
- path_scenarios: pathlib.Path | None
Path for the scenario data.
- path_env: pathlib.Path
Path of the environment file.
- callback: Callable | None
Callback can be used for logging and plotting.
- state_modification_callback: Callable | None
Callback can be used for modifying the state at each time step.
- env_id: int
ID of the environment (useful for vectorized environments).
- run_name: str
Name of the current optimization run.
- n_episodes: int
Number of completed episodes.
- n_steps: int
Current step of the model (number of completed steps) in the current episode.
- n_steps_longtime: int
Current step of the model (total over all episodes).
- episode_duration: float
Duration of one episode in seconds.
- sampling_time: float
Sampling time (interval between optimization time steps) in seconds.
- n_episode_steps: int
Number of time steps (of width sampling_time) in each episode.
- scenario_duration: float
Duration of the scenario for each episode (total time of data imported from CSV).
- scenario_time_begin: datetime
Beginning time of the scenario.
- scenario_time_end: datetime
Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).
- timeseries: pd.DataFrame
The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.
- ts_current: pd.DataFrame
Data frame containing the currently valid range of time series data.
- state_config: StateConfig | None
Configuration to describe what the environment state looks like.
- episode_timer: float
Episode timer (stores the start time of the episode).
- additional_state: dict[str, float] | None
Additional state information to append to the state during stepping and reset
- state_log_longtime: list[list[dict[str, float]]]
Log of the environment state over multiple episodes.
- data_log: list[dict[str, Any]]
Log of specific environment settings / other data, apart from state for the episode.
- data_log_longtime: list[list[dict[str, Any]]]
Log of specific environment settings / other data, apart from state, over multiple episodes.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
Live Connection Environment
The BaseEnvLive is an environment which creates direct (live) connections to actual devices. It utilizes eta_utility.connectors.LiveConnect to achieve this. Please also read the corresponding documentation, because LiveConnect needs additional configuration.
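Analogous to the simulation environment, subclassing can be sketched with a stand-in base class; the environment and configuration names are hypothetical:

```python
class BaseEnvLive:  # stand-in for eta_utility.eta_x.envs.BaseEnvLive
    pass

class HeatingSystemEnv(BaseEnvLive):  # hypothetical live environment
    version = "1.0"
    description = "Live connection to a hypothetical heating system."
    config_name = "heating_system"  # name of the live_connect configuration
```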
- class eta_utility.eta_x.envs.BaseEnvLive(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, max_errors: int = 10, render_mode: str | None = None, **kwargs: Any)[source]
Base class for Live Connector environments. The class will prepare the initialization of the LiveConnect class and provide facilities to automatically read step results and reset the connection.
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback which should be called after each episode.
scenario_time_begin – Beginning time of the scenario.
scenario_time_end – Ending time of the scenario.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
max_errors – Maximum number of connection errors before interrupting the optimization process.
render_mode – Renders the environment to help visualise what the agent sees. Example modes are “human”, “rgb_array” and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).
- abstract property config_name: str
Name of the live_connect configuration
- live_connector: LiveConnect
Instance of the Live Connector.
- live_connect_config: Path | Sequence[Path] | dict[str, Any] | None
Path or Dict to initialize the live connector.
- max_error_count: int
Maximum number of errors before connections in the live connector are aborted.
- _init_live_connector(files: Path | Sequence[Path] | dict[str, Any] | None = None) None [source]
Initialize the live connector object. Make sure to call _names_from_state before this or to otherwise initialize the names array.
- Parameters:
files – Path or Dict to initialize the connection directly from JSON configuration files or a config dictionary.
- step(action: np.ndarray) StepResult [source]
Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.
This also updates self.state and self.state_log to store current state information.
Note
This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.
- Parameters:
action – Actions to perform in the environment.
- Returns:
The return value represents the state of the environment after the step was performed.
observations: A numpy array with new observation values as defined by the observation space, containing floating point or integer values.
reward: The value of the reward function. This is a single floating point value.
terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_x.
truncated: Boolean indicating whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a time limit, but it could also be used to indicate that an agent has physically gone out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.
info: Provides some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[ObservationType, dict[str, Any]] [source]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically. For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Note
Don’t forget to store and reset the episode_timer.
- Parameters:
seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)
- Returns:
Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
- close() None [source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (i.e. simulation models) used by the environment.
Default behavior for the Live_Connector environment is to do nothing.
- _actions_to_state(actions: ndarray) None
Gather actions and store them in self.state.
- Parameters:
actions – Actions taken by the agent.
- _actions_valid(action: ndarray) None
Check whether the actions are within the specified action space.
- Parameters:
action – Actions taken by the agent.
- Raises:
RuntimeError, when the actions are not within the action space.
- _create_new_state(additional_state: dict[str, float] | None) None
Take some initial values and create a new environment state object, stored in self.state.
- Parameters:
additional_state – Values to initialize the state.
- _done() bool
Check if the episode is over or not using the number of steps (n_steps) and the total number of steps in an episode (n_episode_steps).
- Returns:
Boolean showing whether the episode is done.
- _is_protocol = False
- _np_random: np.random.Generator | None = None
- _observations() ndarray
Determine the observations list from environment state. This uses state_config to determine all observations.
- Returns:
Observations for the agent as determined by state_config.
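The idea of assembling the observation vector from the state can be sketched like this; in the real environment the list of observation names comes from state_config, and all names below are illustrative:

```python
import numpy as np

def build_observations(state: dict, observation_names: list) -> np.ndarray:
    # Pick the observed variables from the state, in the configured order.
    return np.array([state[name] for name in observation_names])

obs = build_observations(
    {"T_room": 21.5, "P_el": 480.0, "u_valve": 0.3},  # full environment state
    ["T_room", "P_el"],                               # configured observations
)
```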
- _reduce_state_log() list[dict[str, float]]
Remove unwanted parameters from state_log before storing it in state_log_longtime.
- Returns:
A list of dictionaries from which the parameters that should not be stored have been removed.
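The filtering can be sketched as a plain dictionary comprehension over the log entries (a simplified stand-in; the key names are illustrative):

```python
def reduce_state_log(state_log, keep):
    # Keep only the parameters listed in `keep` for longtime storage.
    return [{k: v for k, v in entry.items() if k in keep} for entry in state_log]

reduced = reduce_state_log(
    [{"T_room": 21.5, "debug_flag": 1.0}, {"T_room": 21.7, "debug_flag": 0.0}],
    keep={"T_room"},
)
```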
- _reset_state() None
Store episode statistics and reset episode counters.
- abstract property description: str
Long description of the environment
- export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None
Extension of csv_export that additionally includes the environment’s time series data.
- Parameters:
names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
- classmethod get_info() tuple[str, str]
Get info about environment.
- Returns:
Tuple of version and description.
- get_scenario_state() dict[str, Any]
Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.
- Returns:
Scenario data for current time step.
- import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame
Load data from CSV files into self.timeseries_data using scenario_from_csv.
- Parameters:
scenario_paths –
One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:
Note
[{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]
path: Path to the scenario file (relative to scenario_path).
prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.
interpolation_method: A pandas interpolation method (e.g. ‘linear’ or ‘pad’), required if the frequency of values must be increased compared to the file’s data.
scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.
rename_cols: Mapping of column names from the file to new names for the imported data.
infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.
time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).
prefix_renamed – Determine whether the prefix is also applied to renamed columns.
- Returns:
Data Frame of the imported and formatted scenario data.
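A hypothetical scenario configuration dictionary following the structure above might look as follows; the file name, prefix, and column names are purely illustrative:

```python
# Hypothetical scenario configuration (all concrete values are examples):
scenario = {
    "path": "electricity_prices.csv",     # relative to scenario_path
    "prefix": "elec",                     # prepended to all column names
    "interpolation_method": "linear",     # used if values must be upsampled
    "scale_factors": {"power_kW": 1000},  # import kilowatts as watts
    "rename_cols": {"power_kW": "power"},
    "infer_datetime_cols": 0,             # first column holds datetime data
}
# df = env.import_scenario(scenario)  # would return the formatted DataFrame
```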
- property np_random: numpy.random.Generator
Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.
- Returns:
An instance of np.random.Generator.
- abstract render() None
Render the environment
The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped
gymnasium.Env
instance
- abstract property version: str
Version of the environment
- verbose: int
Verbosity level used for logging.
- config_run: ConfigOptRun
Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.
- path_results: pathlib.Path
Path for storing results.
- path_scenarios: pathlib.Path | None
Path for the scenario data.
- path_env: pathlib.Path
Path of the environment file.
- callback: Callable | None
Callback can be used for logging and plotting.
- state_modification_callback: Callable | None
Callback can be used for modifying the state at each time step.
- env_id: int
ID of the environment (useful for vectorized environments).
- run_name: str
Name of the current optimization run.
- n_episodes: int
Number of completed episodes.
- n_steps: int
Current step of the model (number of completed steps) in the current episode.
- n_steps_longtime: int
Current step of the model (total over all episodes).
- episode_duration: float
Duration of one episode in seconds.
- sampling_time: float
Sampling time (interval between optimization time steps) in seconds.
- n_episode_steps: int
Number of time steps (of width sampling_time) in each episode.
- scenario_duration: float
Duration of the scenario for each episode (for total time imported from csv).
- scenario_time_begin: datetime
Beginning time of the scenario.
- scenario_time_end: datetime
Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).
- timeseries: pd.DataFrame
The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.
- ts_current: pd.DataFrame
Data frame containing the currently valid range of time series data.
- state_config: StateConfig | None
Configuration to describe what the environment state looks like.
- episode_timer: float
Episode timer (stores the start time of the episode).
- additional_state: dict[str, float] | None
Additional state information to append to the state during stepping and reset
- state_log_longtime: list[list[dict[str, float]]]
Log of the environment state over multiple episodes.
- data_log: list[dict[str, Any]]
Log of specific environment settings / other data, apart from state for the episode.
- data_log_longtime: list[list[dict[str, Any]]]
Log of specific environment settings / other data, apart from state, over multiple episodes.
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
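The relationship between sampling_time and sim_steps_per_sample can be illustrated with a small calculation (the concrete values are examples): the simulation step width is the sampling interval divided by the number of simulation steps per sample.

```python
sampling_time = 60.0      # seconds between optimization time steps
sim_steps_per_sample = 4  # must divide the sampling interval evenly
sim_step_width = sampling_time / sim_steps_per_sample  # width of one sim step
assert sampling_time % sim_steps_per_sample == 0       # divisor requirement
```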
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
Julia Environment
The JuliaEnv environment supports connecting to a Julia file. Make sure to set julia_env_file to the location of your Julia file. In contrast to the other environments, the JuliaEnv class, which is written in Python, must be imported in the setup file via the environment_import parameter. The julia_env_file parameter is located in the settings section of the configuration file. See also Experiment configuration.
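A configuration fragment along these lines might look like the following, shown here as a Python dictionary; the section layout and file name are illustrative and should be checked against the Experiment configuration documentation:

```python
# Hypothetical excerpt of an experiment configuration for JuliaEnv
# (section names and values are examples, not a verified schema):
config = {
    "setup": {
        "environment_import": "eta_utility.eta_x.envs.JuliaEnv",
    },
    "settings": {
        "julia_env_file": "my_environment.jl",  # location of the Julia file
    },
}
```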
- class eta_utility.eta_x.envs.JuliaEnv(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, julia_env_file: pathlib.Path | str, render_mode: str | None = None, **kwargs: Any)[source]
Bases:
BaseEnv
Abstract environment definition, providing some basic functionality for concrete environments to use. The class implements and adapts functions from gymnasium.Env. It provides additional functionality as required by the ETA-X framework and should be used as the starting point for new environments.
The initialization of this superclass performs many of the tasks required to specify a concrete environment. Read the documentation carefully to understand how new environments can be developed, building on this starting point.
There are some attributes that must be set and some methods that must be implemented to satisfy the interface. This is required to create concrete environments. The required attributes are:
version: Version number of the environment.
description: Short description string of the environment.
action_space: The action space of the environment (see also gymnasium.spaces for options).
observation_space: The observation space of the environment (see also gymnasium.spaces for options).
The gymnasium interface requires the following methods for the environment to work correctly within the framework. Consult the documentation of each method for more detail.
step()
reset()
close()
- Parameters:
env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback which should be called after each episode.
scenario_time_begin – Beginning time of the scenario.
scenario_time_end – Ending time of the scenario.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
render_mode – Rendering mode used to help visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).
- version = '1.0'
- description = 'This environment uses a julia file to perform its functions.'
- julia_env_path: pathlib.Path
Root path to the Julia file.
- __jl: ModuleType
The imported Julia file as a module (written in Julia), used for further initialization of the environment.
- _jlenv
The initialized Julia environment (written in Julia).
- first_update(observations: ndarray) ndarray [source]
Perform the first update and set values in simulation model to the observed values.
- Parameters:
observations – Observations of another environment.
- Returns:
Full array of observations.
- update(observations: ndarray) ndarray [source]
Update the optimization model with observations from another environment.
- Parameters:
observations – Observations from another environment.
- Returns:
Full array of current observations.
- step(action: np.ndarray) StepResult [source]
Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.
Note
Do not forget to increment n_steps and n_steps_longtime.
- Parameters:
action – Actions taken by the agent.
- Returns:
The return value represents the state of the environment after the step was performed.
observations: A numpy array with new observation values as defined by the observation space. Observations is a np.array() (numpy array) with floating point or integer values.
reward: The value of the reward function. This is just one floating point value.
terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_i.
truncated: Boolean, whether the truncation condition outside the scope is satisfied. Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
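The five-tuple described above can be sketched with illustrative values (this is only a sketch of the return shape, not a working environment):

```python
import numpy as np

# Sketch of a step() return value; all concrete values are examples:
observations = np.array([295.15, 0.42])  # element of the observation space
reward = -1.25                           # single floating point value
terminated = False                       # has a terminal state been reached?
truncated = False                        # e.g. episode time limit hit?
info = {}                                # auxiliary diagnostic information
step_result = (observations, reward, terminated, truncated, info)
```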
- _reduce_state_log() list[dict[str, float]] [source]
Remove unwanted parameters from state_log before storing it in state_log_longtime.
- Returns:
A list of dictionaries from which the parameters that should not be stored have been removed.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) tuple[np.ndarray, dict[str, Any]] [source]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Note
Don’t forget to store and reset the episode_timer.
- Parameters:
seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)
options – Additional information specifying how the environment is reset (optional, depending on the specific environment). (default: None)
- Returns:
Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
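A custom reset() following this pattern can be sketched as follows. Here `_Base` stands in for the real base class (which is not reproduced), and the observation values are illustrative:

```python
import time
import numpy as np

class _Base:
    """Stand-in for the base environment: reset() seeds the PRNG."""
    _np_random = None

    def reset(self, *, seed=None, options=None):
        if seed is not None or self._np_random is None:
            self._np_random = np.random.default_rng(seed)

class MyEnv(_Base):
    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)          # first line: correct seeding
        self.episode_timer = time.time()  # store/reset the episode timer
        observations = np.zeros(2)        # initial observation (illustrative)
        info = {}                         # auxiliary information
        return observations, info

obs, info = MyEnv().reset(seed=123)
```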
- close() None [source]
Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.
- render(**kwargs: Any) None [source]
Render the environment
The set of supported modes varies per environment. Some environments do not support rendering at all. By convention in Farama gymnasium, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
- _actions_to_state(actions: ndarray) None
Gather actions and store them in self.state.
- Parameters:
actions – Actions taken by the agent.
- _actions_valid(action: ndarray) None
Check whether the actions are within the specified action space.
- Parameters:
action – Actions taken by the agent.
- Raises:
RuntimeError – If the actions are not within the action space.
- _create_new_state(additional_state: dict[str, float] | None) None
Take some initial values and create a new environment state object, stored in self.state.
- Parameters:
additional_state – Values to initialize the state.
- _done() bool
Check whether the episode is over, using the number of completed steps (n_steps) and the total number of steps in an episode (n_episode_steps).
- Returns:
Boolean indicating whether the episode is done.
- _is_protocol = False
- _np_random: np.random.Generator | None = None
- _observations() ndarray
Determine the observations list from environment state. This uses state_config to determine all observations.
- Returns:
Observations for the agent as determined by state_config.
- _reset_state() None
Store episode statistics and reset episode counters.
- export_state_log(path: Path, names: Sequence[str] | None = None, *, sep: str = ';', decimal: str = '.') None
Extension of csv_export that additionally includes the environment’s time series data.
- Parameters:
names – Field names used when data is a Matrix without column names.
sep – Separator to use between the fields.
decimal – Sign to use for decimal points.
- classmethod get_info() tuple[str, str]
Get info about environment.
- Returns:
Tuple of version and description.
- get_scenario_state() dict[str, Any]
Get scenario data for the current time step of the environment, as specified in state_config. This assumes that scenario data in self.ts_current is available and scaled correctly.
- Returns:
Scenario data for current time step.
- import_scenario(*scenario_paths: Mapping[str, Any], prefix_renamed: bool = True) pd.DataFrame
Load data from CSV files into self.timeseries_data using scenario_from_csv.
- Parameters:
scenario_paths –
One or more scenario configuration dictionaries (or a list of dicts), which each contain a path for loading data from a scenario file. The dictionary should have the following structure, with <X> denoting the variable value:
Note
[{path: <X>, prefix: <X>, interpolation_method: <X>, resample_method: <X>, scale_factors: {col_name: <X>}, rename_cols: {col_name: <X>}, infer_datetime_cols: <X>, time_conversion_str: <X>}]
path: Path to the scenario file (relative to scenario_path).
prefix: Prefix for all columns in the file, useful if multiple imported files have the same column names.
interpolation_method: A pandas interpolation method (e.g. ‘linear’ or ‘pad’), required if the frequency of values must be increased compared to the file’s data.
scale_factors: Scaling factors for specific columns. This can be useful for example, if a column contains data in kilowatt and should be imported in watts. In this case, the scaling factor for the column would be 1000.
rename_cols: Mapping of column names from the file to new names for the imported data.
infer_datetime_cols: Number of the column which contains datetime data. If this value is not present, the time_conversion_str variable will be used to determine the datetime format.
time_conversion_str: Time conversion string, determining the datetime format used in the imported file (default: %Y-%m-%d %H:%M).
prefix_renamed – Determine whether the prefix is also applied to renamed columns.
- Returns:
Data Frame of the imported and formatted scenario data.
- property np_random: numpy.random.Generator
Returns the environment’s internal _np_random; if it is not set, it will be initialised with a random seed.
- Returns:
An instance of np.random.Generator.
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped
gymnasium.Env
instance
- verbose: int
Verbosity level used for logging.
- config_run: ConfigOptRun
Information about the optimization run and information about the paths. For example, it defines path_results and path_scenarios.
- path_results: pathlib.Path
Path for storing results.
- path_scenarios: pathlib.Path | None
Path for the scenario data.
- path_env: pathlib.Path
Path of the environment file.
- callback: Callable | None
Callback can be used for logging and plotting.
- state_modification_callback: Callable | None
Callback can be used for modifying the state at each time step.
- env_id: int
ID of the environment (useful for vectorized environments).
- run_name: str
Name of the current optimization run.
- n_episodes: int
Number of completed episodes.
- n_steps: int
Current step of the model (number of completed steps) in the current episode.
- n_steps_longtime: int
Current step of the model (total over all episodes).
- episode_duration: float
Duration of one episode in seconds.
- sampling_time: float
Sampling time (interval between optimization time steps) in seconds.
- n_episode_steps: int
Number of time steps (of width sampling_time) in each episode.
- scenario_duration: float
Duration of the scenario for each episode (for total time imported from csv).
- scenario_time_begin: datetime
Beginning time of the scenario.
- scenario_time_end: datetime
Ending time of the scenario (should be in the format %Y-%m-%d %H:%M).
- timeseries: pd.DataFrame
The time series DataFrame contains all time series scenario data. It can be filled by the import_scenario method.
- ts_current: pd.DataFrame
Data frame containing the currently valid range of time series data.
- state_config: StateConfig | None
Configuration to describe what the environment state looks like.
- episode_timer: float
Episode timer (stores the start time of the episode).
- additional_state: dict[str, float] | None
Additional state information to append to the state during stepping and reset
- state_log_longtime: list[list[dict[str, float]]]
Log of the environment state over multiple episodes.
- data_log: list[dict[str, Any]]
Log of specific environment settings / other data, apart from state for the episode.
- data_log_longtime: list[list[dict[str, Any]]]
Log of specific environment settings / other data, apart from state, over multiple episodes.
- sim_steps_per_sample: int
Number of simulation steps to be taken for each sample. This must be a divisor of ‘sampling_time’.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]