Control Algorithms
The agents implemented in eta_x.agents are subclasses of stable_baselines3.common.base_class.BaseAlgorithm. Calling them agents is a remnant from stable_baselines2 (the wording was changed to "algorithms" in stable_baselines3).
Usually there is no need to dive deeply into the agents provided by eta_x: you can use them by specifying their import path in your experiment configuration without worrying about how they work internally. It is good to know, however, that some agents do not implement all methods required by the BaseAlgorithm interface. Within the eta_x framework this is usually not a problem, because those methods are not called.
The currently available agents are listed below. Note that the parameters required for instantiation must be specified in the agent_specific section of the eta_x configuration file.
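For illustration, a configuration for the MathSolver agent described below could look roughly like this. Apart from agent_specific, which is named above, the surrounding structure and key names (setup, agent_import) are assumptions and may differ between eta_utility versions; consult the configuration documentation for the exact layout.

    {
        "setup": {
            "agent_import": "eta_utility.eta_x.agents.MathSolver"
        },
        "agent_specific": {
            "solver_name": "glpk",
            "action_index": 0
        }
    }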
Math Solver Agent
The MathSolver agent implements a model predictive controller. It can be used to execute mathematical models in conjunction with mathematical solvers such as cplex or glpk, relying on the pyomo library to achieve this.
You can provide additional arguments in kwargs to the agent. These are interpreted first as arguments for the base class and then for the solver: arguments which are passed to MathSolver but not recognized by BaseAlgorithm are passed on to the solver. This allows free configuration of all solver options.
- class eta_utility.eta_x.agents.MathSolver(policy: type[BasePolicy], env: VecEnv, verbose: int = 1, *, solver_name: str = 'cplex', action_index: int = 0, _init_setup_model: bool = True, **kwargs: Any)[source]
Simple, Pyomo based MPC agent.
The agent requires an environment that specifies the ‘model’ attribute, returning a pyomo.ConcreteModel and a sorted list as the order for the action space. This list is used to avoid ambiguity when returning a list of actions. Since the model specifies its own action and observation space, this agent does not use the action_space and observation_space specified by the environment.
- Parameters:
policy – Agent policy. Parameter is not used in this agent.
env – Environment to be optimized.
verbose – Logging verbosity.
solver_name – Name of the solver, could be cplex or glpk.
action_index – Index of the solution value to be used as action (if this is 0, the first value in a list of solution values will be used).
kwargs – Additional arguments as specified in stable_baselines3.common.base_class or as provided by the solver.
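As a sketch of how this fits together: the environment below exposes a small pyomo model, and extra keyword arguments (here mipgap) are forwarded to the solver. The environment class, the exact return form of the model attribute, the mipgap option and the vec_env placeholder are illustrative assumptions; only MathSolver, solver_name and action_index come from the documentation above.

    import pyomo.environ as pyo
    from eta_utility.eta_x.agents import MathSolver

    class ModelEnv:
        """Hypothetical environment exposing a pyomo model for the MPC agent."""

        @property
        def model(self):
            m = pyo.ConcreteModel()
            m.u = pyo.Var(range(3), bounds=(0, 1))  # control trajectory over three steps
            m.obj = pyo.Objective(expr=sum(m.u[i] for i in range(3)), sense=pyo.minimize)
            # Assumed return form: the model plus the ordered action names.
            return m, ["u"]

    # 'vec_env' stands for a vectorized environment wrapping ModelEnv; how it is created
    # depends on your eta_x setup. The policy parameter is not used by this agent.
    agent = MathSolver(policy=None, env=vec_env, solver_name="glpk", action_index=0, mipgap=0.01)
    # 'mipgap' is not recognized by BaseAlgorithm and is therefore passed on to the solver.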
Rule Based Agent (Base Class)
The rule based agent is a base class which facilitates the creation of simple rule based agents. To use it, you need to implement the eta_utility.eta_x.agents.RuleBased.control_rules method, which takes the array of observations from the environment and determines an array of actions based on them (see the example below the class description).
- class eta_utility.eta_x.agents.RuleBased(policy: type[BasePolicy], env: VecEnv, verbose: int = 4, _init_setup_model: bool = True, **kwargs: Any)[source]
The rule based agent base class provides the facilities to easily build a complete rule based agent. To achieve this, only the control_rules function must be implemented. It should take an observation from the environment as input and provide actions as an output.
- Parameters:
policy – Agent policy. Parameter is not used in this agent and can be set to NoPolicy.
env – Environment to be controlled.
verbose – Logging verbosity.
kwargs – Additional arguments as specified in stable_baselines3.common.base_class.
- abstract control_rules(observation: ndarray) → ndarray[source]
This function is abstract and should be used to implement control rules which determine actions from the received observations.
- Parameters:
observation – Observations as provided by a single, non-vectorized environment.
- Returns:
Action values, as determined by the control rules.
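To illustrate the intended usage, the following minimal subclass implements a simple two-point (hysteresis) rule. The class name, the assumed observation layout (a single temperature value) and the switching threshold are purely illustrative.

    import numpy as np
    from eta_utility.eta_x.agents import RuleBased

    class HysteresisAgent(RuleBased):
        """Hypothetical rule based agent: switch heating on below 20 °C, off otherwise."""

        def control_rules(self, observation: np.ndarray) -> np.ndarray:
            temperature = observation[0]  # assumed: first observation value is a temperature
            heating_on = 1.0 if temperature < 20.0 else 0.0
            return np.array([heating_on])

Such an agent can then be selected in the experiment configuration by its import path, like any of the agents provided by eta_x.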
Non-dominated Sorting Genetic Algorithm (NSGA-II)
The genetic optimizer implements a genetic algorithm based on NSGA-II, which was first developed and implemented by Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal and T. Meyarivan and published in their 2002 paper ‘A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II’.
The NSGA2 agent implements this algorithm. To use it, instantiate the eta_utility.eta_x.agents.Nsga2 class and call its learn function with an environment to solve your optimization problem. The algorithm will find near-optimal actions based on interactions with the environment (a usage sketch is given at the end of this section).
- class eta_utility.eta_x.agents.Nsga2(policy: type[BasePolicy], env: VecEnv, learning_rate: float | Schedule = 1.0, verbose: int = 2, *, population: int = 100, mutations: float = 0.05, crossovers: float = 0.1, n_generations: int = 100, max_cross_len: float = 1, max_retries: int = 100000, sense: str = 'minimize', predict_learn_steps: int = 5, seed: int = 42, tensorboard_log: str | None = None, _init_setup_model: bool = True, **kwargs: Any)[source]
The NSGA2 class implements the non-dominated sorting genetic algorithm 2 (NSGA-II).
The agent can work with discrete event systems as well as with continuous or mixed-integer problems, or with a mixture of the two.
The action space can specify both events and variables using spaces.Dict in the form:
action_space = spaces.Dict({'events': spaces.Discrete(15), 'variables': spaces.MultiDiscrete([15] * 3)})
This specifies 15 events and an additional 3 variables. The variables will be integers with an upper bound of 15. Other spaces (except Tuple and Dict) can be used for the variables; events accept only a Discrete space.
When events are specified, an ordered list of values is returned that should achieve a near-optimal reward. For variables, the values themselves are adjusted to achieve the highest reward. Upper and lower boundaries as well as types are inferred from the space.
Note
This agent does not use the observation space. Instead it relies only on the rewards returned by the environment. Returned rewards can be tuples if multi-objective optimization is required. Existing environments do not have to be adjusted, however: the agent also accepts standard rewards and will ignore any observation spaces.
Note
The number of environments must be equal to the population size for this agent, because it needs one environment to evaluate each solution. This allows solutions to be evaluated in parallel.
- Parameters:
policy – Agent policy. Parameter is not used in this agent.
env – Environment to be optimized.
learning_rate – Reduction factor for the crossover and mutation rates (default: 1).
verbose – Logging verbosity.
population – Maximum number of parallel solutions (>= 2).
mutations – Chance for mutations in existing solutions (between 0 and 1).
crossovers – Chance for crossovers between solutions (between 0 and 1).
n_generations – Number of generations to run the algorithm for.
max_cross_len – Maximum number of genes (as a proportion of total elements) to cross over between solutions (between 0 and 1) (default 1).
max_retries – Maximum number of tries to find new values before the algorithm fails and returns. Using the default should usually be fine (default: 100000).
sense – Determine whether the algorithm looks for minimal (“minimize”) or maximal (“maximize”) rewards (default: “minimize”).
tensorboard_log – The log location for TensorBoard (if None, no logging).
seed – Seed for the pseudo random generators.
_init_setup_model – Determine whether the model should be initialized during setup.
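To tie the pieces together, the following sketch instantiates the Nsga2 agent with the spaces.Dict action space described above and starts the optimization. The create_envs helper, the None policy placeholder and the interpretation of the learn() arguments are assumptions; how the vectorized environments are created depends on your eta_x setup.

    from gymnasium import spaces  # or 'gym', depending on your eta_utility version
    from eta_utility.eta_x.agents import Nsga2

    # Action space combining 15 ordered events with three integer variables (see above).
    action_space = spaces.Dict({
        "events": spaces.Discrete(15),
        "variables": spaces.MultiDiscrete([15] * 3),
    })

    population = 100

    # 'create_envs' is a hypothetical helper; the number of environments must equal the
    # population size (see the note above) so that solutions can be evaluated in parallel.
    vec_env = create_envs(n_envs=population, action_space=action_space)

    agent = Nsga2(
        policy=None,  # the policy parameter is not used by this agent
        env=vec_env,
        population=population,
        mutations=0.05,
        crossovers=0.1,
        n_generations=100,
        sense="minimize",
    )

    # Assumption: the steps passed to learn() cover the generations evaluated by the algorithm.
    agent.learn(total_timesteps=population * 100)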