eta_utility.eta_x.envs.base_env_mpc module

class eta_utility.eta_x.envs.base_env_mpc.BaseEnvMPC(env_id: int, config_run: ConfigOptRun, verbose: int = 2, callback: Callable | None = None, *, scenario_time_begin: datetime | str, scenario_time_end: datetime | str, episode_duration: TimeStep | str, sampling_time: TimeStep | str, model_parameters: Mapping[str, Any], prediction_scope: TimeStep | str | None = None, render_mode: str | None = None, **kwargs: Any)

Bases: BaseEnv, ABC

Base class for mathematical MPC models. This class can be used in conjunction with the MathSolver agent. You need to implement the _model method in a subclass and return a pyomo.AbstractModel from it.

Parameters:

env_id – Identification for the environment, useful when creating multiple environments.
config_run – Configuration of the optimization run.
verbose – Verbosity to use for logging.
callback – Callback to be called after each episode.
scenario_time_begin – Beginning time of the scenario.
scenario_time_end – Ending time of the scenario.
episode_duration – Duration of the episode in seconds.
sampling_time – Duration of a single time sample / time step in seconds.
model_parameters – Parameters for the mathematical model.
prediction_scope – Duration of the prediction (usually a subsample of the episode duration).
render_mode – Rendering mode of the environment, which helps visualise what the agent sees. Example modes are “human”, “rgb_array”, and “ansi” (for text).
kwargs – Other keyword arguments (for subclasses).

prediction_scope: float: Total duration of one prediction/optimization run when used with the MPC agent. This is automatically set to the value of episode_duration if it is not supplied separately.

n_prediction_steps: int: Number of steps in the prediction (prediction_scope/sampling_time).
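The relationship between these attributes can be sketched in plain Python. This is only an illustration, not the library's code: the helper name `n_prediction_steps` as a function, the use of plain seconds as integers, and the error raised for a scope that is not a multiple of the sampling time are all assumptions made for this example.

```python
def n_prediction_steps(episode_duration, sampling_time, prediction_scope=None):
    """Return the number of prediction steps; all durations are in seconds.

    Hypothetical helper for illustration only.
    """
    # prediction_scope falls back to episode_duration if it is not supplied.
    scope = episode_duration if prediction_scope is None else prediction_scope
    # Assumption of this sketch: the scope must be a whole number of samples.
    if scope % sampling_time != 0:
        raise ValueError("prediction_scope should be a multiple of sampling_time")
    return scope // sampling_time


# One day with 15-minute samples and no separate prediction scope.
print(n_prediction_steps(episode_duration=86400, sampling_time=900))  # 96
# Same episode, but each optimization run only looks one hour ahead.
print(n_prediction_steps(86400, 900, prediction_scope=3600))  # 4
```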

scenario_duration: float: Duration of the scenario for each episode (for total time imported from csv).

model_parameters: Configuration for the MILP model parameters.

time_var: str | None: Name of the “time” variable/set in the model (e.g. “T”). This is needed if the pyomo sets must be re-indexed when updating the model between time steps. If this is None (the default), it is assumed that no reindexing of the timeseries data is required during updates.

nonindex_update_append_string: str | None: Updating indexed model parameters can be achieved either by updating only the first value of the actual parameter itself or by having a separate handover parameter that is used for specifying only the first value. The separate handover parameter can be denoted with an appended string. For example, if the actual parameter is x.ON then the handover parameter could be x.ON_first. To use x.ON_first for updates, set the nonindex_update_append_string to “_first”. If the attribute is set to None, the first value of the actual parameter (x.ON) would be updated instead.
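The two update strategies described above can be illustrated with a plain dictionary standing in for the model parameters. The function `apply_update` and the dictionary layout are hypothetical and chosen only for this sketch; the actual update logic in the library may differ.

```python
def apply_update(params, name, value, append_string=None):
    """Write a single non-indexed value into a dict of model parameters.

    Hypothetical stand-in: `params` maps parameter names to either a scalar
    or a {index: value} dictionary. Returns the name that was written to.
    """
    if append_string is None:
        # No handover parameter: overwrite the first value of the
        # indexed parameter itself.
        first_index = min(params[name])
        params[name][first_index] = value
        return name
    # Handover parameter: write to the separate scalar, e.g. "x.ON_first".
    target = name + append_string
    params[target] = value
    return target


params = {"x.ON": {1: 0, 2: 0, 3: 0}, "x.ON_first": 0}
print(apply_update(params, "x.ON", 1, append_string="_first"))  # x.ON_first
print(apply_update(params, "x.ON", 1))  # x.ON
print(params)
```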

property model: tuple[ConcreteModel, list]

The model property is a tuple of the concrete model and the order of the action space. This is used such that the MPC algorithm can re-sort the action output. This sorting cannot be conveyed differently through pyomo.

Returns: Tuple of the concrete model and the order of the action space.

step(action: np.ndarray) → StepResult

Perform one time step and return its results. This is called for every event or for every time step during the simulation/optimization run. It should utilize the actions as supplied by the agent to determine the new state of the environment. The method must return a five-tuple of observations, rewards, terminated, truncated, info.

This also updates self.state and self.state_log to store current state information.

Note

This function always returns 0 reward. Therefore, it must be extended if it is to be used with reinforcement learning agents. If you need to manipulate actions (discretization, policy shaping, …) do this before calling this function. If you need to manipulate observations and rewards, do this after calling this function.

Parameters:

action (np.ndarray) – Actions to perform in the environment.

Returns:

The return value represents the state of the environment after the step was performed.

observations: A numpy array with new observation values as defined by the observation space, containing floating point or integer values.
reward: The value of the reward function. This is just one floating point value.
terminated: Boolean value specifying whether an episode has been completed. If this is set to true, the reset function will automatically be called by the agent or by eta_i.
truncated: Boolean, whether the truncation condition outside the scope is satisfied. Typically, this is a time limit, but it could also be used to indicate an agent physically going out of bounds. It can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call the reset function.
info: Provide some additional info about the state of the environment. The contents of this may be used for logging purposes in the future but typically do not currently serve a purpose.
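The note about manipulating actions before and rewards after the call can be sketched as a subclass that wraps step. Both classes below are hypothetical stand-ins that use plain lists instead of numpy arrays; `ZeroRewardEnv` only mimics the five-tuple return value and the constant zero reward described above, and the clipping and reward formula are arbitrary choices for the example.

```python
class ZeroRewardEnv:
    """Hypothetical stand-in for an environment whose step() returns 0 reward."""

    def step(self, action):
        # Toy dynamics: the observation is simply twice the action.
        observations = [2.0 * a for a in action]
        return observations, 0.0, False, False, {}


class ShapedEnv(ZeroRewardEnv):
    """Adds action clipping and a reward on top of the base step()."""

    def step(self, action):
        # Manipulate actions BEFORE calling the base implementation.
        clipped = [min(max(a, 0.0), 1.0) for a in action]
        observations, _, terminated, truncated, info = super().step(clipped)
        # Manipulate observations and rewards AFTER the base implementation.
        reward = -sum(o * o for o in observations)
        return observations, reward, terminated, truncated, info


obs, reward, terminated, truncated, info = ShapedEnv().step([1.5, 0.25])
print(obs, reward)  # [2.0, 0.5] -4.25
```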

update(observations: Sequence[Sequence[float | int]] | None = None) → np.ndarray

Update the optimization model with observations from another environment.

Parameters: observations – Observations from another environment.
Returns: Full array of current observations.

solve_failed(model: pyo.ConcreteModel, result: SolverResults) → None

This method will try to render the result in case the model could not be solved. It should automatically be called by the agent.

Parameters:

model – Current model.
result – Result of the last solution attempt.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[np.ndarray, dict[str, Any]]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset. When using the environment in conjunction with stable_baselines3, the vectorized environment will take care of seeding your custom environment automatically.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Note

Don’t forget to store and reset the episode_timer.

Parameters:

seed – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. (default: None)
options – Additional information to specify how the environment is reset (optional, depending on the specific environment) (default: None)

Returns:

Tuple of observation and info. The observation of the initial state will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step(). Info is a dictionary containing auxiliary information complementing observation. It should be analogous to the info returned by step().
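The seeding rules described for the seed parameter can be sketched with the standard library's random module. `SeededEnv` is a hypothetical class for illustration; it uses `random.Random` in place of the np_random generator and returns a one-element list as the observation.

```python
import random


class SeededEnv:
    """Hypothetical environment that mimics the documented seeding rules."""

    def __init__(self):
        self._rng = None  # No PRNG exists before the first reset().

    def reset(self, *, seed=None, options=None):
        # Create or reset the PRNG if a seed is given, or if none exists yet.
        # random.Random(None) seeds itself from a system entropy source.
        if seed is not None or self._rng is None:
            self._rng = random.Random(seed)
        # With seed=None and an existing PRNG, the generator simply continues.
        observation = [self._rng.random()]
        return observation, {}


env = SeededEnv()
first, _ = env.reset(seed=42)
continued, _ = env.reset()        # PRNG is not reset, sequence continues
repeated, _ = env.reset(seed=42)  # PRNG is reset, same value as `first`
print(first == repeated, first == continued)  # True False
```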

close() → None

Close the environment. This should always be called when an entire run is finished. It should be used to close any resources (e.g. simulation models) used by the environment.

Default behavior for the MPC environment is to do nothing.

Retrieve parameters for the named component and convert the parameters into the pyomo dict-format. If required, timeseries can be added to the parameters and timeseries may be reindexed. The pyo_convert_timeseries function is used for timeseries handling. See also pyo_convert_timeseries.

Parameters:

component_name – Name of the component.
ts – Timeseries for the component.
index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.

Returns:

Pyomo parameter dictionary.

Convert time series data into a pyomo format. Data will be reindexed if a new index is provided.

Parameters:

ts – Timeseries to convert.
index – New index for timeseries data. If this is supplied, all timeseries will be copied and reindexed.
component_name – Name of a specific component that the timeseries is used for. This limits which timeseries are returned.
_add_wrapping_none – Add a “None” indexed dictionary as the top level.

Returns:

Pyomo parameter dictionary.
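As a rough illustration of the pyomo dict-format, the conversion could look like the sketch below. It is not the library's implementation: the function `convert_timeseries`, the plain-dict input, the "component.parameter" column naming and the 1-based default index are assumptions made for this example. The top-level None key is the namespace key that pyomo expects in data dictionaries passed to AbstractModel.create_instance.

```python
def convert_timeseries(ts, index=None, component_name=None, add_wrapping_none=True):
    """Convert {column: [values]} into a pyomo-style parameter dictionary.

    Hypothetical sketch; column names are assumed to look like
    "<component>.<parameter>".
    """
    result = {}
    for column, values in ts.items():
        name = column
        if component_name is not None:
            prefix = component_name + "."
            # Only keep timeseries that belong to the requested component.
            if not column.startswith(prefix):
                continue
            name = column[len(prefix):]
        # Reindex with the supplied index, otherwise count from 1.
        keys = index if index is not None else range(1, len(values) + 1)
        result[name] = dict(zip(keys, values))
    # Optionally add the "None" indexed dictionary as the top level.
    return {None: result} if add_wrapping_none else result


ts = {"chp.price": [0.2, 0.3], "boiler.price": [0.1, 0.1]}
print(convert_timeseries(ts, index=[10, 11], component_name="chp"))
# {None: {'price': {10: 0.2, 11: 0.3}}}
```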

pyo_update_params(updated_params: MutableMapping[str | None, Any], nonindex_param_append_string: str | None = None) → None

Updates model parameters and indexed parameters of a pyomo instance with values given in a dictionary. It assumes that the dictionary supplied in updated_params has the correct pyomo format.

Parameters:

updated_params – Dictionary with the updated values.
nonindex_param_append_string – String to be appended to values which are not indexed. This can be used if indexed parameters need to be set with values that do not have an index.

Returns:

Updated model instance.

pyo_get_solution(names: set[str] | None = None) → dict[str, float | int | dict[int, float | int]]

Convert the pyomo solution into a more usable format for plotting.

Parameters: names – Names of the model parameters that are returned.
Returns: Dictionary of {parameter name: value} pairs. Value may be a dictionary of {time: value} pairs which contains one value for each optimization time step.
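A dictionary of this shape mixes scalar values and per-time-step values, so plotting code usually has to separate the two. The helper `split_solution` and the sample data below are hypothetical and only illustrate one way to handle the documented return format.

```python
def split_solution(solution):
    """Split a solution dict into scalar values and time-ordered series.

    Hypothetical helper: `solution` maps names to either a number or a
    {time: value} dictionary.
    """
    scalars = {}
    series = {}
    for name, value in solution.items():
        if isinstance(value, dict):
            # Order the values by time step so they can be plotted directly.
            series[name] = [value[t] for t in sorted(value)]
        else:
            scalars[name] = value
    return scalars, series


solution = {"cost": 12.5, "chp.P": {2: 4.0, 1: 3.0, 3: 5.0}}
scalars, series = split_solution(solution)
print(scalars)  # {'cost': 12.5}
print(series)   # {'chp.P': [3.0, 4.0, 5.0]}
```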

pyo_get_component_value(component: pyo.Component, at: int = 1, allow_stale: bool = False) → float | int | None