eta_utility.eta_x.agents.rule_based module

class eta_utility.eta_x.agents.rule_based.RuleBased(policy: type[BasePolicy], env: VecEnv, verbose: int = 4, _init_setup_model: bool = True, **kwargs: Any)[source]

Bases: BaseAlgorithm, ABC

The rule-based agent base class provides the facilities to easily build a complete rule-based agent. To achieve this, only the control_rules method must be implemented. It should take an observation from the environment as input and return actions as output.

Parameters:
  • policy – Agent policy. Parameter is not used in this agent and can be set to NoPolicy.

  • env – Environment to be controlled.

  • verbose – Logging verbosity.

  • kwargs – Additional arguments as specified in stable_baselines3.common.base_class.

state: np.ndarray | None

Last / initial state of the agent.
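A minimal sketch of a concrete rule-based agent. The HysteresisAgent class, the temperature thresholds and the assumption that the first observation entry is a temperature reading are illustrative only and not part of the library:

    import numpy as np

    from eta_utility.eta_x.agents.rule_based import RuleBased


    class HysteresisAgent(RuleBased):
        """Toy two-point controller: switch the actuator on below 20 °C and off above 22 °C."""

        def __init__(self, policy, env, verbose: int = 4, **kwargs):
            super().__init__(policy=policy, env=env, verbose=verbose, **kwargs)
            self._last_action = 0.0

        def control_rules(self, observation: np.ndarray) -> np.ndarray:
            # The observation comes from a single, non-vectorized environment;
            # the first entry is assumed to be a temperature reading.
            temperature = float(observation[0])
            if temperature < 20.0:
                self._last_action = 1.0
            elif temperature > 22.0:
                self._last_action = 0.0
            return np.array([self._last_action])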

abstract control_rules(observation: ndarray) → ndarray[source]

This function is abstract and should be used to implement control rules which determine actions from the received observations.

Parameters:

observation – Observations as provided by a single, non-vectorized environment.

Returns:

Action values, as determined by the control rules.

predict(observation: np.ndarray | dict[str, np.ndarray], state: tuple[np.ndarray, ...] | None = None, episode_start: np.ndarray | None = None, deterministic: bool = True) → tuple[np.ndarray, tuple[np.ndarray, ...] | None][source]

Perform controller operations and return actions. This method takes care of the vectorization of environments and calls the control_rules method, which should implement the control rules for a single environment.

Parameters:
  • observation – The input observation.

  • state – The last states (not used here).

  • episode_start – The last masks (not used here).

  • deterministic – Whether to return deterministic actions. This agent always returns deterministic actions.

Returns:

Tuple of the model’s action and the next state (state is typically None in this agent).
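A brief sketch of how predict can be used in an evaluation loop; the agent and env objects (a stable-baselines3 VecEnv) are assumed to have been created elsewhere:

    # "agent" is a concrete RuleBased subclass, "env" a stable-baselines3 VecEnv.
    obs = env.reset()
    for _ in range(100):
        # predict splits the batched observation, calls control_rules once per
        # sub-environment and stacks the resulting actions again.
        action, _state = agent.predict(obs, deterministic=True)
        obs, rewards, dones, infos = env.step(action)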

classmethod load(path: str | pathlib.Path | io.BufferedIOBase, env: GymEnv | None = None, device: th.device | str = 'auto', custom_objects: dict[str, Any] | None = None, print_system_info: bool = False, force_reset: bool = True, _init_setup_model: bool = False, **kwargs: Any) → RuleBased[source]

Load the model from a zip-file. Warning: load re-creates the model from scratch; it does not update it in-place!

Parameters:
  • path – Path to the file (or a file-like object) the agent should be loaded from.

  • env – The new environment to run the loaded model on (can be None if you only need prediction from a trained model). This has priority over any saved environment.

  • device – Device on which the code should run.

  • custom_objects – Dictionary of objects to replace upon loading. If a variable is present in this dictionary as a key, it will not be deserialized and the corresponding item will be used instead. Similar to custom_objects in keras.models.load_model. Useful when you have an object in the file that cannot be deserialized.

  • print_system_info – Whether to print system info from the saved model and the current system info (useful to debug loading issues).

  • force_reset – Force a call to reset() before training to avoid unexpected behavior. See https://github.com/DLR-RM/stable-baselines3/issues/597

  • kwargs – Extra arguments to change the model when loading.
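For illustration, loading a previously saved agent might look as follows; the file name rule_based_agent.zip and the HysteresisAgent subclass from the sketch above are hypothetical:

    # Re-create the agent from a zip-file; the passed env takes priority over
    # any environment that was stored together with the model.
    agent = HysteresisAgent.load("rule_based_agent.zip", env=env, print_system_info=True)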

get_parameter_list() → None[source]

Getting TensorFlow parameters is not implemented for the rule-based agent.

learn(total_timesteps: int, callback: MaybeCallback = None, log_interval: int = 100, tb_log_name: str = 'run', reset_num_timesteps: bool = True, progress_bar: bool = False) → RuleBased[source]

Return a trained model. Learning is not implemented for the rule-based agent.

Parameters:
  • total_timesteps – The total number of samples (env steps) to train on.

  • callback – Callback(s) called at every step with state of the algorithm.

  • log_interval – The number of timesteps before logging.

  • tb_log_name – The name of the run for TensorBoard logging.

  • reset_num_timesteps – Whether or not to reset the current timestep number (used in logging).

  • progress_bar – Display a progress bar using tqdm and rich.

Returns:

The trained model.

action_probability(observation: np.ndarray, state: np.ndarray | None = None, mask: np.ndarray | None = None, actions: np.ndarray | None = None, **kwargs: Any) → None[source]

Get the model’s action probability distribution from an observation. This is not implemented for this agent.