eta_utility.eta_x.agents.math_solver module
- class eta_utility.eta_x.agents.math_solver.MathSolver(policy: type[BasePolicy], env: VecEnv, verbose: int = 1, *, solver_name: str = 'cplex', action_index: int = 0, _init_setup_model: bool = True, **kwargs: Any)[source]
Bases: BaseAlgorithm
Simple, Pyomo-based MPC agent.
The agent requires an environment whose ‘model’ attribute returns a pyomo.ConcreteModel and a sorted list that defines the order of the action space. This list is used to avoid ambiguity when returning a list of actions. Since the model specifies its own action and observation space, this agent does not use the action_space and observation_space specified by the environment.
- Parameters:
policy – Agent policy. Parameter is not used in this agent.
env – Environment to be optimized.
verbose – Logging verbosity.
solver_name – Name of the solver, for example cplex or glpk.
action_index – Index of the solution value to be used as action (if this is 0, the first value in a list of solution values will be used).
kwargs – Additional arguments as specified in stable_baselines3.common.base_class or as provided by the solver.
- model: pyo.ConcreteModel
Pyomo optimization model as specified by the environment.
- action_index
Index of the solution value to be used as action (if this is 0, the first value in a list of solution values will be used).
- solve() → ConcreteModel [source]
Solve the current pyomo model instance with given parameters. This could also be used separately to solve normal MILP problems. Since the entire problem instance is returned, result handling can be outsourced.
- Returns:
Solved pyomo model instance.
- predict(observation: np.ndarray | dict[str, np.ndarray], state: tuple[np.ndarray, ...] | None = None, episode_start: np.ndarray | None = None, deterministic: bool = False) → tuple[np.ndarray, tuple[np.ndarray, ...] | None] [source]
Solve the current pyomo model instance with given parameters and observations and return the optimal actions.
- Parameters:
observation – the input observation (not used here).
state – The last states (not used here).
episode_start – The last masks (not used here).
deterministic – Whether to return deterministic actions. This agent always returns deterministic actions.
- Returns:
Tuple of the model’s action and the next state (not used here).
- action_probability(observation: np.ndarray, state: np.ndarray | None = None, mask: np.ndarray | None = None, actions: np.ndarray | None = None, logp: bool = False) → None [source]
The MPC approach cannot predict probabilities of single actions.
- learn(total_timesteps: int, callback: MaybeCallback = None, log_interval: int = 100, tb_log_name: str = 'run', reset_num_timesteps: bool = True, progress_bar: bool = False) → MathSolver [source]
The MPC approach cannot learn a new model. Instead, specify the model attribute as a Pyomo ConcreteModel in order to use the prediction function of this agent.
- Parameters:
total_timesteps – The total number of samples (env steps) to train on
callback – callback(s) called at every step with state of the algorithm.
log_interval – The number of timesteps before logging.
tb_log_name – the name of the run for TensorBoard logging
reset_num_timesteps – whether or not to reset the current timestep number (used in logging)
progress_bar – Display a progress bar using tqdm and rich.
- Returns:
The trained model.
- classmethod load(path: str | pathlib.Path | io.BufferedIOBase, env: GymEnv | None = None, device: th.device | str = 'auto', custom_objects: dict[str, Any] | None = None, print_system_info: bool = False, force_reset: bool = True, **kwargs: Any) → MathSolver [source]
Load the model from a zip-file. Warning: load re-creates the model from scratch, it does not update it in-place!
- Parameters:
path – path to the file (or a file-like) where to load the agent from
env – the new environment to run the loaded model on (can be None if you only need prediction from a trained model); it has priority over any saved environment
device – Device on which the code should run.
custom_objects – Dictionary of objects to replace upon loading. If a variable is present in this dictionary as a key, it will not be deserialized and the corresponding item will be used instead.
print_system_info – Whether to print system info from the saved model and the current system info (useful to debug loading issues)
force_reset – Force call to reset() before training to avoid unexpected behavior.
kwargs – extra arguments to change the model when loading
- Returns:
new model instance with loaded parameters
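The role of the sorted action list and of action_index, as described for MathSolver above, can be illustrated with a small stand-alone sketch. It is not the eta_utility implementation: the helper select_actions, the variable names heater_power and chiller_power, and the shape of the solution dictionary (one list of values per action variable, one entry per step of the optimization horizon) are all assumptions made for illustration only.

```python
from typing import Mapping, Sequence


def select_actions(
    solution: Mapping[str, Sequence[float]],
    action_order: Sequence[str],
    action_index: int = 0,
) -> list[float]:
    """Pick one value per action variable, following a fixed action order.

    Hypothetical helper for illustration; not part of eta_utility.
    """
    return [solution[name][action_index] for name in action_order]


# Assumed solver output: one list of solution values per action variable.
solution = {
    "heater_power": [3.0, 2.5, 2.0],
    "chiller_power": [0.0, 0.5, 1.0],
}

# A sorted list of names fixes the order of the returned actions,
# so the caller never has to guess which value belongs to which variable.
action_order = sorted(solution)

# action_index=0 takes the first value of every solution list.
first_step = select_actions(solution, action_order, action_index=0)
second_step = select_actions(solution, action_order, action_index=1)

print(action_order)
print(first_step)
print(second_step)
```

With action_index left at 0, the first value of each solution list is used, which matches the usual receding-horizon pattern of applying only the first planned step before solving again.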