eta_utility.eta_x.agents.nsga2 module
- class eta_utility.eta_x.agents.nsga2.Nsga2(policy: type[BasePolicy], env: VecEnv, learning_rate: float | Schedule = 1.0, verbose: int = 2, *, population: int = 100, mutations: float = 0.05, crossovers: float = 0.1, n_generations: int = 100, max_cross_len: float = 1, max_retries: int = 100000, sense: str = 'minimize', predict_learn_steps: int = 5, seed: int = 42, tensorboard_log: str | None = None, _init_setup_model: bool = True, **kwargs: Any)[source]
Bases:
BaseAlgorithm
The NSGA2 class implements the Non-dominated Sorting Genetic Algorithm II (NSGA-II).
The agent can work with discrete event systems as well as with continuous or mixed-integer problems; a mixture of the above may also be specified.
The action space can specify both events and variables using spaces.Dict in the form:
action_space= spaces.Dict({'events': spaces.Discrete(15), 'variables': spaces.MultiDiscrete([15]*3)})
This specifies 15 events and an additional 3 variables. The variables will be integers with an upper boundary of 15 (i.e. values 0 to 14). Other spaces (except Tuple and Dict) can also be used for the variables; events accepts only a Discrete space.
When events are specified, the agent returns an ordered list of values that should achieve a near-optimal reward. For variables, the values are adjusted to achieve the highest reward. Upper and lower boundaries as well as types are inferred from the space.
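For example, the action space from above could be constructed with gymnasium spaces as follows (a minimal sketch; the variable name is arbitrary):

from gymnasium import spaces

# 15 events to be ordered by the agent and three integer variables,
# each with an upper boundary of 15 (values 0 to 14).
action_space = spaces.Dict(
    {
        "events": spaces.Discrete(15),
        "variables": spaces.MultiDiscrete([15] * 3),
    }
)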
Note
This agent does not use the observation space. Instead, it relies only on the rewards returned by the environment. Returned rewards can be tuples if multi-objective optimization is required. Existing environments do not have to be adjusted, however: the agent also accepts standard scalar rewards and ignores any observation spaces.
Note
The number of environments must be equal to the population for this agent because it needs one environment for the evaluation of every solution. This allows for solutions to be evaluated in parallel.
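As an illustration of both notes, a custom environment might return a tuple reward and a validity flag roughly as sketched below. This is a hedged sketch and not part of eta_utility: the class name, the reward terms, and the assumption that the action arrives as a dictionary with 'events' and 'variables' keys are all illustrative.

import numpy as np
from gymnasium import Env, spaces

class MySchedulingEnv(Env):
    """Illustrative environment sketch; not part of eta_utility."""

    def __init__(self):
        self.action_space = spaces.Dict(
            {"events": spaces.Discrete(15), "variables": spaces.MultiDiscrete([15] * 3)}
        )
        # The observation space is not used by the Nsga2 agent.
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(1,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        observation = np.zeros(1, dtype=np.float32)  # ignored by the agent
        # Multi-objective reward: one entry per objective (two dummy objectives here).
        reward = (float(np.sum(action["variables"])), 0.0)
        terminated, truncated = True, False
        # Flag infeasible solutions so the agent can re-initialize them.
        info = {"valid": True}
        return observation, reward, terminated, truncated, info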
- Parameters:
policy – Agent policy. Parameter is not used in this agent.
env – Environment to be optimized.
learning_rate – Reduction factor for the crossover and mutation rates (default: 1).
verbose – Logging verbosity.
population – Maximum number of parallel solutions (>= 2).
mutations – Chance for mutations in existing solutions (between 0 and 1).
crossovers – Chance for crossovers between solutions (between 0 and 1).
n_generations – Number of generations to run the algorithm for.
max_cross_len – Maximum number of genes (as a proportion of total elements) to cross over between solutions (between 0 and 1) (default 1).
max_retries – Maximum number of tries to find new values before the algorithm fails and returns. Using the default should usually be fine (default: 100000).
sense – Determines whether the algorithm looks for minimal (“minimize”) or maximal (“maximize”) rewards (default: “minimize”).
tensorboard_log – the log location for tensorboard (if None, no logging).
seed – Seed for the pseudo random generators.
_init_setup_model – Determine whether the model should be initialized during setup.
- max_cross_len: float
Maximum number of genes (as a proportion of total elements) to cross over between solutions (between 0 and 1) (default 1).
- max_retries: int
Maximum number of tries to find new values before the algorithm fails and returns. Using the default should usually be fine (default: 100000).
- event_params: int
Parameters defining how the events chromosome is generated. This is determined automatically from the action space.
- variable_params: list[_VariableParameters]
Parameters defining how the variables chromosome is generated. This is determined automatically from the action space.
- generation_parent: _jlwrapper
Parent generation of solutions.
- generation_offspr: _jlwrapper
Offspring generation of solutions.
- seen_solutions: int
List of solutions which have been seen before (avoids duplicate evaluation of equivalent solutions).
- current_minima: np.ndarray
List of current minimal values for all parts of the reward.
- ep_actions_buffer: deque
Buffer for actions.
- ep_reward_buffer: deque
Buffer for rewards.
- learn(total_timesteps: int, callback: MaybeCallback = None, log_interval: int = 1, tb_log_name: str = 'run', reset_num_timesteps: bool = True, progress_bar: bool = False) Nsga2 [source]
Return a trained model. The environment the agent is training on should return an info dictionary when a solution is invalid; the dictionary should contain a ‘valid’ key set to False in that case. If there are too many invalid solutions (more than half of the population), the agent will try to re-initialize these solutions until there is a sufficient number of valid solutions.
- Parameters:
total_timesteps – The total number of samples (env steps) to train on.
callback – Callback(s) called at every step with the state of the algorithm.
log_interval – The number of timesteps before logging.
tb_log_name – The name of the run for TensorBoard logging.
reset_num_timesteps – Whether to reset the current timestep number (used in logging).
progress_bar – Whether to show a progress bar; used by stable_baselines (currently unused!).
- Returns:
The trained model.
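A training call might then look roughly as follows. This is a hedged sketch: it reuses the hypothetical MySchedulingEnv from the note above, assumes a DummyVecEnv wrapper with one environment per population member, and the placeholder passed for the unused policy argument as well as the choice of total_timesteps are assumptions.

from stable_baselines3.common.policies import BasePolicy
from stable_baselines3.common.vec_env import DummyVecEnv

from eta_utility.eta_x.agents.nsga2 import Nsga2

population = 100
# One vectorized environment per solution, so n_envs == population.
vec_env = DummyVecEnv([MySchedulingEnv for _ in range(population)])

agent = Nsga2(
    policy=BasePolicy,  # the policy is not used by this agent; this placeholder is an assumption
    env=vec_env,
    population=population,
    mutations=0.05,
    crossovers=0.1,
    n_generations=100,
    sense="minimize",
)
# The value of total_timesteps here is illustrative only.
agent = agent.learn(total_timesteps=population * 100)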
- set_random_seed(seed: int | None = None) None [source]
Set the seed of the pseudo-random generators (python, numpy, pytorch, gymnasium, julia).
- Parameters:
seed – Seed for the pseudo random generators.
- classmethod load(path: str | pathlib.Path | io.BufferedIOBase, env: GymEnv | None = None, device: th.device | str = 'auto', custom_objects: dict[str, Any] | None = None, print_system_info: bool = False, force_reset: bool = True, **kwargs: Any) Nsga2 [source]
Load the model from a zip-file.
Warning: load re-creates the model from scratch, it does not update it in-place! For an in-place load use set_parameters instead.
- Parameters:
path – Path to the file (or a file-like object) to load the agent from.
env – The new environment to run the loaded model on (can be None if you only need prediction from a trained model). This has priority over any saved environment.
device – Device on which the code should run.
custom_objects – Dictionary of objects to replace upon loading. If a variable is present in this dictionary as a key, it will not be deserialized and the corresponding item will be used instead.
print_system_info – Whether to print system info from the saved model and the current system info (useful to debug loading issues).
force_reset – Force call to reset() before training to avoid unexpected behavior.
kwargs – Extra arguments to change the model when loading.
- Returns:
New model instance with loaded parameters.
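Loading a saved agent could then be as simple as the following sketch (the file path and the environment are placeholders):

from eta_utility.eta_x.agents.nsga2 import Nsga2

# Re-create the agent from a saved zip file; the supplied environment
# takes priority over any environment stored with the model.
agent = Nsga2.load("nsga2_agent.zip", env=vec_env)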
- predict(observation: np.ndarray | dict[str, np.ndarray], state: tuple[np.ndarray, ...] | None = None, episode_start: np.ndarray | None = None, deterministic: bool = False) tuple[np.ndarray, tuple[np.ndarray, ...] | None] [source]
Return the actions from the best solution found.
- Parameters:
observation – Observation from the environment.
state – State from the environment. Not relevant here.
episode_start – Whether the episode has just started. Not relevant here.
deterministic – Whether to use deterministic actions. Not relevant here.
- Returns:
Actions from the best solution.
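A minimal usage sketch, assuming a trained agent as above (the empty observation only satisfies the interface, since the agent ignores it):

import numpy as np

# Retrieve the actions of the best solution found during training.
actions, _state = agent.predict(observation=np.empty(0))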