Extensions for stable_baselines3

eta_x implements some extensions for stable_baselines3, such as additional feature extractors, policies and schedules. More information about the first two can be found in the stable_baselines3 documentation for custom policy networks.

In short, stable_baselines3 divides the policy network into two main parts:

  • A feature extractor which can handle different types of inputs (for example, images).

  • A (fully-connected) network that maps the features to actions and values (controlled by the net_arch parameter).
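In stable_baselines3, the fully-connected part is configured through the policy_kwargs argument. A small sketch (the layer sizes here are arbitrary examples, not recommendations):

```python
# Illustrative stable_baselines3 policy configuration: net_arch controls
# the fully-connected network that maps features to actions and values.
policy_kwargs = dict(
    net_arch=[64, 64],  # two hidden layers of 64 units each
)

# The dictionary would then be passed to an algorithm, e.g.:
# model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs)
```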

Policies

Some of the agents defined in eta_x do not require the specification of a policy. For this special case you can use the NoPolicy class, which simply does nothing. NoPolicy inherits from stable_baselines3.common.policies.BasePolicy.

class eta_utility.eta_x.common.NoPolicy(*args, squash_output: bool = False, **kwargs)[source]

NoPolicy allows for the creation of agents which do not use neural networks. It does not implement any of the typical policy functions; it is a simple interface that can be used and otherwise ignored, so there is no need to worry about the implementation details of policies.

Schedules

Schedules evolve over time throughout the learning process of RL applications. eta_x implements a BaseSchedule class which enables the creation of new schedules by inheriting from it and implementing a custom value function which returns the output value based on an input between 0 and 1. See learning rate schedules in stable_baselines3.

The Schedule object is callable so you can pass it directly as a schedule function.
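The pattern can be sketched in plain Python (a minimal illustration of the idea, not the actual eta_utility implementation): a subclass implements value(), and the object itself is callable.

```python
# Minimal sketch of the schedule pattern described above. SquaredSchedule
# is a hypothetical example, not a class provided by eta_utility.
class SquaredSchedule:
    """Hypothetical schedule that decays quadratically with progress."""

    def __init__(self, initial_p: float):
        self.initial_p = initial_p

    def value(self, progress_remaining: float) -> float:
        # progress_remaining runs from 1 (start) to 0 (end).
        return self.initial_p * progress_remaining**2

    def __call__(self, progress_remaining: float) -> float:
        # Being callable, the object can be passed as a schedule function.
        return self.value(progress_remaining)


schedule = SquaredSchedule(0.1)
halfway = schedule(0.5)  # 0.1 * 0.5**2 = 0.025
```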

The linear schedule implements a linear evolution of the learning rate.

class eta_utility.eta_x.common.LinearSchedule(initial_p: float, final_p: float)[source]

Linear interpolation schedule adjusts the learning rate between initial_p and final_p. The value is calculated based on the remaining progress, which is between 1 (start) and 0 (end).

Parameters:
  • initial_p – Initial output value.

  • final_p – Final output value.

Usage:

schedule = LinearSchedule(0.9, 0.2)
schedule(0.5) == 0.55  # True
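The interpolation behind this result can be written out explicitly (a re-implementation for illustration, not the library code):

```python
# The linear interpolation LinearSchedule performs, spelled out.
def linear_value(initial_p: float, final_p: float, progress_remaining: float) -> float:
    # progress_remaining runs from 1 (start) to 0 (end), so the output
    # moves from initial_p to final_p over the course of training.
    return final_p + progress_remaining * (initial_p - final_p)

start = linear_value(0.9, 0.2, 1.0)    # 0.9
midpoint = linear_value(0.9, 0.2, 0.5) # 0.55
end = linear_value(0.9, 0.2, 0.0)      # 0.2
```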

Extractors

Extractors are based on stable_baselines3.common.torch_layers.BaseFeaturesExtractor. Use of a custom extractor is specified as a configuration option of the policy, in the agent_specific section of the configuration as part of the policy_kwargs dictionary. The required parameters are features_extractor_class, which must contain the Python class, and features_extractor_kwargs, a Mapping of the arguments passed to the feature extractor.
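A configuration fragment along these lines might look as follows (the class path is taken from this documentation; the extractor arguments are placeholders, not a recommended architecture):

```python
# Illustrative agent_specific configuration fragment for a custom extractor.
config = {
    "agent_specific": {
        "policy_kwargs": {
            "features_extractor_class": "eta_utility.eta_x.common.CustomExtractor",
            "features_extractor_kwargs": {
                # Arguments forwarded to the extractor; net_arch values
                # here are arbitrary examples.
                "net_arch": [
                    {"layer": "Linear", "out_features": 32},
                    {"activation_func": "ReLU"},
                ],
            },
        },
    },
}
```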

class eta_utility.eta_x.common.CustomExtractor(observation_space: gymnasium.Space, *, net_arch: Sequence[Mapping[str, Any]], device: th.device | str = 'auto')[source]

Advanced feature extractor which allows the definition of arbitrary network structures. Layers can be any of the layers defined in torch.nn. The net_arch parameter will be interpreted by the function eta_utility.eta_x.common.deserialize_net_arch().

Parameters:
  • observation_space – gymnasium space.

  • net_arch – The architecture of the Advanced Feature Extractor. See eta_utility.eta_x.common.deserialize_net_arch() for syntax.

  • device – Torch device for training.

The architecture of the feature extractor is controlled by the net_arch parameter. It is able to handle observations which consist of classic, time-independent data and multiple time series. See below for an explanation of how this parameter is interpreted using eta_utility.eta_x.common.deserialize_net_arch().

Warning

The user must ensure the correct order of observations since the network architecture is often highly dependent on the type of observations passed into the extractor.

Configuring custom neural network architectures

The following function can be used to configure custom neural network architectures in the configuration used by eta_x. This function is used by custom extractors to interpret the net_arch parameter.

eta_utility.eta_x.common.deserialize_net_arch(net_arch: Sequence[Mapping[str, Any]], in_features: int, device: th.device | str = 'auto') th.nn.Sequential[source]

deserialize_net_arch takes a list of dictionaries describing a sequential torch network and deserializes it by instantiating the corresponding classes.

An example for a possible net_arch would be:

[{"layer": "Linear", "out_features": 60},
 {"activation_func": "Tanh"},
 {"layer": "Linear", "out_features": 60},
 {"activation_func": "Tanh"}]

Each dictionary must contain exactly one of the keys ‘layer’, ‘activation_func’ or ‘process’. If the ‘layer’ key is present, a layer from the torch.nn module is instantiated; if the ‘activation_func’ key is present, the value is instantiated as an activation function from torch.nn; if the ‘process’ key is present, the value is interpreted as a data processor from eta_utility.eta_x.common.processors.

All other keys of each dictionary will be used as keyword parameters to the instantiation of the layer, activation function or processor.

Only the number of input features for the first layer must be specified (using the ‘in_features’ parameter). The function then automatically determines the number of input features for all other layers in the sequential network.

Parameters:
  • net_arch – List of dictionaries describing the network architecture.

  • in_features – Number of input features for the first layer.

  • device – Torch device to use for training the network.

Returns:

Sequential torch network.
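The in_features bookkeeping can be illustrated without torch: each ‘layer’ entry consumes the running feature count and, for Linear-style layers, replaces it with its out_features. A simplified sketch of that logic (an illustration, not the library implementation, which instantiates torch.nn classes directly):

```python
# Sketch of how input sizes flow through a serialized net_arch.
def propagate_in_features(net_arch, in_features):
    """Return layer descriptions with 'in_features' filled in.

    Only 'Linear'-style layers (those declaring 'out_features') are
    handled here; activation functions pass the size through unchanged.
    """
    resolved = []
    current = in_features
    for spec in net_arch:
        spec = dict(spec)  # do not mutate the caller's dictionaries
        if "layer" in spec and "out_features" in spec:
            spec["in_features"] = current
            current = spec["out_features"]
        resolved.append(spec)
    return resolved


arch = [
    {"layer": "Linear", "out_features": 60},
    {"activation_func": "Tanh"},
    {"layer": "Linear", "out_features": 60},
    {"activation_func": "Tanh"},
]
resolved = propagate_in_features(arch, in_features=10)
# resolved[0] gains in_features=10; resolved[2] gains in_features=60
```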

Data Processors

The network architectures used for the CustomExtractor can be extended with the data processors provided by the processors module.

class eta_utility.eta_x.common.Split1d(in_features: int, sizes: Sequence[None | int], net_arch: Sequence[th.nn.Module])[source]

Split1d defines a pytorch module which splits the 1D input tensor into multiple parts and passes each of the parts through a separate network. After the pass through the networks, the outputs from all networks are joined together. Thus, Split1d returns a 1D observation vector.

When configuring the network architecture, it is important to ensure that the output of all networks is 1D. Use torch.nn.Flatten to flatten the output of networks where the output is not one dimensional.

Use the parameters ‘sizes’ and ‘net_arch’ to determine how many of the input features should be passed through which network. Each value in ‘sizes’ must have a corresponding value in ‘net_arch’. For the following example, assume that ‘in_features’ is 15. If ‘sizes’ is [3, 10, None], a valid configuration for ‘net_arch’ could be [th.nn.Linear(out_features=10), th.nn.Conv1d(out_channels=2), th.nn.Linear(out_features=2)]. The last value of ‘sizes’ is automatically calculated to be 2 (15 - 3 - 10 = 2). With this, 3 values would be passed to the first Linear layer, 10 values would be passed to the Conv1d layer and the final 2 values would be passed to the third module in ‘net_arch’ (the Linear layer with 2 output features).

If you would like to use dictionaries to configure the net_arch, you can use the function eta_utility.eta_x.common.deserialize_net_arch() to create the torch network architecture.

Parameters:
  • in_features – Number of input features for the Module

  • sizes – List of sizes for splitting the input features. This list may contain the value None once. If it does, that entry is evaluated to cover all remaining input features.

  • net_arch – List of torch.nn Modules. Each value of this list corresponds to one value of the ‘sizes’ list.
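The size resolution described above can be sketched in a few lines (an illustration of the rule, not the Split1d implementation):

```python
# Sketch of Split1d's size handling: a single None entry is replaced by
# the number of remaining input features.
def resolve_split_sizes(in_features, sizes):
    known = sum(s for s in sizes if s is not None)
    return [in_features - known if s is None else s for s in sizes]


splits = resolve_split_sizes(15, [3, 10, None])  # [3, 10, 2]
```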

class eta_utility.eta_x.common.Fold1d(out_channels: int)[source]

Fold a 1D tensor to create a multi-dimensional tensor. The parameter ‘out_channels’ determines how many dimensions the output tensor will have.

Parameters:

out_channels – Number of dimensions of the output tensor.
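One plausible reading of this folding, shown on plain Python lists (an illustration of the reshaping idea only; the actual module operates on torch tensors):

```python
# Sketch of folding a flat sequence into out_channels rows.
def fold1d(values, out_channels):
    if len(values) % out_channels:
        raise ValueError("length must be divisible by out_channels")
    width = len(values) // out_channels
    return [values[i * width:(i + 1) * width] for i in range(out_channels)]


folded = fold1d([1, 2, 3, 4, 5, 6], out_channels=2)  # [[1, 2, 3], [4, 5, 6]]
```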