eta_utility.eta_x.agents.nsga2 module

class eta_utility.eta_x.agents.nsga2.Nsga2(policy: type[BasePolicy], env: VecEnv, learning_rate: float | Schedule = 1.0, verbose: int = 2, *, population: int = 100, mutations: float = 0.05, crossovers: float = 0.1, n_generations: int = 100, max_cross_len: float = 1, max_retries: int = 100000, sense: str = 'minimize', predict_learn_steps: int = 5, seed: int = 42, tensorboard_log: str | None = None, _init_setup_model: bool = True, **kwargs: Any)[source]

Bases: BaseAlgorithm

The Nsga2 class implements the Non-dominated Sorting Genetic Algorithm II (NSGA-II).

The agent can work with discrete event systems and with continuous or mixed-integer problems; alternatively, a mixture of the two may be specified.

The action space can specify both events and variables using spaces.Dict in the form:

action_space = spaces.Dict({'events': spaces.Discrete(15),
                            'variables': spaces.MultiDiscrete([15]*3)})

This specifies 15 events and an additional three variables. The variables will be integers with an upper bound of 15 (i.e. values from 0 to 14). Other spaces (except Tuple and Dict) can be used for the variables; events accepts only the Discrete space.
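For instance, continuous variables could be declared with a Box space (an illustrative sketch; the bounds and dimensions are arbitrary):

from gymnasium import spaces

action_space = spaces.Dict({'events': spaces.Discrete(10),
                            'variables': spaces.Box(low=0.0, high=100.0, shape=(2,))})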

When events are specified, a list of ordered values is returned that should achieve a near-optimal reward. For variables, the values are adjusted to achieve the best reward. Upper and lower boundaries as well as types are inferred from the space.

Note

This agent does not use the observation space. Instead, it relies only on the rewards returned by the environment. Rewards can be tuples if multi-objective optimization is required. Existing environments do not have to be adjusted, however: the agent also accepts standard scalar rewards and ignores any observation spaces.
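As an illustration, a multi-objective environment might return a tuple reward like this (a minimal sketch assuming a gymnasium-style interface; the objectives and spaces are invented for the example):

import gymnasium
import numpy as np
from gymnasium import spaces

class TwoObjectiveEnv(gymnasium.Env):
    """Illustrative environment whose reward is a tuple with one entry per objective."""

    action_space = spaces.Dict({'events': spaces.Discrete(10),
                                'variables': spaces.MultiDiscrete([15] * 3)})
    observation_space = spaces.Box(low=0, high=1, shape=(1,))  # ignored by the agent

    def reset(self, *, seed=None, options=None):
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        cost = float(np.sum(action['variables']))   # hypothetical first objective
        makespan = float(np.max(action['events']))  # hypothetical second objective
        # Observations are ignored by the agent, so a dummy value suffices.
        return np.zeros(1, dtype=np.float32), (cost, makespan), True, False, {}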

Note

The number of environments must be equal to the population size for this agent, because it needs one environment to evaluate each solution. This allows solutions to be evaluated in parallel.
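For example, a setup might look like this (a sketch: TwoObjectiveEnv is the illustrative environment above, make_vec_env is the stable-baselines3 helper, and BasePolicy is only a placeholder since the policy parameter is unused by this agent):

from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.policies import BasePolicy
from eta_utility.eta_x.agents.nsga2 import Nsga2

population = 50
# One environment per solution, so the whole population is evaluated in parallel.
vec_env = make_vec_env(TwoObjectiveEnv, n_envs=population)
agent = Nsga2(policy=BasePolicy, env=vec_env, population=population)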

Parameters:
  • policy – Agent policy. This parameter is not used by this agent.

  • env – Environment to be optimized.

  • learning_rate – Reduction factor for the crossover and mutation rates (default: 1).

  • verbose – Logging verbosity.

  • population – Maximum number of parallel solutions (>= 2).

  • mutations – Chance for mutations in existing solutions (between 0 and 1).

  • crossovers – Chance for crossovers between solutions (between 0 and 1).

  • n_generations – Number of generations to run the algorithm for.

  • max_cross_len – Maximum number of genes (as a proportion of total elements) to cross over between solutions (between 0 and 1; default: 1).

  • max_retries – Maximum number of tries to find new values before the algorithm fails and returns. Using the default should usually be fine (default: 100000).

  • sense – Determine whether the algorithm looks for minimal (“minimize”) or maximal (“maximize”) rewards (default: “minimize”).

  • predict_learn_steps – Number of learning steps for the predict function (default: 5).

  • tensorboard_log – The log location for TensorBoard (if None, no logging).

  • seed – Seed for the pseudo random generators.

  • _init_setup_model – Determine whether the model should be initialized during setup.

population: int

Maximum number of parallel solutions (>= 2).

mutations: float

Chance for mutations in existing solutions (between 0 and 1).

crossovers: float

Chance for crossovers between solutions (between 0 and 1).

max_cross_len: float

Maximum number of genes (as a proportion of total elements) to cross over between solutions (between 0 and 1; default: 1).

max_retries: int

Maximum number of tries to find new values before the algorithm fails and returns. Using the default should usually be fine (default: 100000).

sense: str

Sense of the optimization (maximize or minimize).

n_generations: int

Maximum number of generations to run for.

event_params: int

Parameters defining how the events chromosome is generated. This is determined automatically from the action space.

variable_params: list[_VariableParameters]

Parameters defining how the variables chromosome is generated. This is determined automatically from the action space.

generation_parent: _jlwrapper

Parent generation of solutions.

generation_offspr: _jlwrapper

Offspring generation of solutions.

seen_solutions: int

List of solutions which have been seen before (avoids duplicate evaluation of equivalent solutions).

total_retries: int

Total number of retries needed during evolution to generate unique solutions.

current_minima: np.ndarray

List of current minimal values for all parts of the reward.

ep_actions_buffer: deque

Buffer for actions.

ep_reward_buffer: deque

Buffer for rewards.

training_infos_buffer: dict

Buffer for training infos.

predict_learn_steps: int

Number of learning steps for the predict function.

property last_evaluation_actions: np.ndarray | None
property last_evaluation_rewards: Any | None
property last_evaluation_fronts: list
learn(total_timesteps: int, callback: MaybeCallback = None, log_interval: int = 1, tb_log_name: str = 'run', reset_num_timesteps: bool = True, progress_bar: bool = False) Nsga2[source]

Return a trained model. The environment the agent trains on should return an info dictionary when a solution is invalid; that dictionary should contain a ‘valid’ key set to False in this case. If there are too many invalid solutions (more than half of the population), the agent will try to re-initialize these solutions until there is a sufficient number of valid solutions.
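A sketch of how an environment might flag infeasible solutions, building on the illustrative TwoObjectiveEnv above (the constraint itself is hypothetical):

import numpy as np

class ConstrainedEnv(TwoObjectiveEnv):
    """Illustrative subclass flagging infeasible solutions via the info dictionary."""

    def step(self, action):
        obs, reward, terminated, truncated, info = super().step(action)
        if float(np.sum(action['variables'])) > 30.0:  # hypothetical constraint
            info['valid'] = False  # tells the agent to re-initialize this solution
        return obs, reward, terminated, truncated, info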

Parameters:
  • total_timesteps – The total number of samples (env steps) to train on.

  • callback – Callback(s) called at every step with the state of the algorithm.

  • log_interval – The number of timesteps before logging.

  • tb_log_name – The name of the run for TensorBoard logging.

  • reset_num_timesteps – Whether to reset the current timestep number (used in logging).

  • progress_bar – Whether to display a progress bar; used by stable-baselines3 (currently unused).

Returns:

The trained model.
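Continuing the setup sketch above, training might be invoked as follows (the step budget is illustrative):

# Train on a total of 50,000 environment steps.
agent = agent.learn(total_timesteps=50_000, log_interval=10)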

set_random_seed(seed: int | None = None) None[source]

Set the seed of the pseudo-random generators (python, numpy, pytorch, gymnasium, julia).

Parameters:

seed – Seed for the pseudo random generators.

classmethod load(path: str | pathlib.Path | io.BufferedIOBase, env: GymEnv | None = None, device: th.device | str = 'auto', custom_objects: dict[str, Any] | None = None, print_system_info: bool = False, force_reset: bool = True, **kwargs: Any) Nsga2[source]

Load the model from a zip-file. Warning: load re-creates the model from scratch; it does not update it in-place! For an in-place load, use set_parameters instead.

Parameters:
  • path – Path to the file (or a file-like object) to load the agent from.

  • env – The new environment to run the loaded model on (can be None if you only need prediction from a trained model). This has priority over any saved environment.

  • device – Device on which the code should run.

  • custom_objects – Dictionary of objects to replace upon loading. If a variable is present in this dictionary as a key, it will not be deserialized and the corresponding item will be used instead.

  • print_system_info – Whether to print system info from the saved model and the current system info (useful to debug loading issues).

  • force_reset – Force call to reset() before training to avoid unexpected behavior.

  • kwargs – Extra arguments to change the model when loading.

Returns:

New model instance with loaded parameters.
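A save/load round trip might look like this (save is inherited from BaseAlgorithm; the path and the vec_env from the sketch above are illustrative):

agent.save('nsga2_model')                       # save is inherited from BaseAlgorithm
agent = Nsga2.load('nsga2_model', env=vec_env)  # re-creates the model from scratch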

predict(observation: np.ndarray | dict[str, np.ndarray], state: tuple[np.ndarray, ...] | None = None, episode_start: np.ndarray | None = None, deterministic: bool = False) tuple[np.ndarray, tuple[np.ndarray, ...] | None][source]

Return the actions from the best solution.

Parameters:
  • observation – Observation from the environment.

  • state – State from the environment. Not relevant here.

  • episode_start – Whether the episode has just started. Not relevant here.

  • deterministic – Whether to use deterministic actions. Not relevant here.

Returns:

The actions from the best solution.
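For example (the dummy observation reflects that this agent ignores observations; its shape is arbitrary):

import numpy as np

# The observation argument is required by the interface but ignored by this agent.
actions, _state = agent.predict(observation=np.zeros(1))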