Reinforcement learning agents
This notebook gathers the functions that create different kinds of agents for foraging and target search in various scenarios, adapted for use in the reinforcement learning paradigm.
Helpers
Random sampling from array with probs
rand_choice_nb
rand_choice_nb (arr, prob)
- `arr`: a 1D numpy array of values to sample from.
- `prob`: a 1D numpy array of probabilities, one per entry of `arr`.
- Returns: a random sample drawn from `arr` according to `prob`.
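For reference, this kind of numba-compatible replacement for `np.random.choice` with probabilities is usually written along the following lines. This is only a sketch mirroring the documented `rand_choice_nb (arr, prob)` signature, not necessarily the exact implementation in this notebook:

```python
import numpy as np
from numba import njit

@njit
def rand_choice_nb_sketch(arr, prob):
    """Numba-compatible analogue of np.random.choice(arr, p=prob)."""
    # Draw a uniform random number and locate it in the cumulative distribution.
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]

# Example: pick action 1 with probability 0.7, action 0 with probability 0.3.
action = rand_choice_nb_sketch(np.array([0, 1]), np.array([0.3, 0.7]))
```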
Forager
Forager
Forager (*args, **kwargs)
*This class defines a Forager agent, able to perform actions and learn from rewards based on the projective simulation (PS) paradigm. This is an updated version of the one used in the original paper (https://doi.org/10.1088/1367-2630/ad19a8), incorporating the improvements to the H and G matrices proposed by Michele Caraglio in our paper (https://doi.org/10.1039/D3SM01680C).*
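For orientation, the standard PS learning rule with glow and damping (as described in the papers above) can be sketched as follows. All attribute and method names here are hypothetical placeholders; the actual `Forager` implementation (which also supports softmax policies, fixed policies and different `g_update` modes) differs in detail:

```python
import numpy as np

class PSAgentSketch:
    """Illustrative PS agent; attribute and method names are hypothetical."""

    def __init__(self, num_states, num_actions,
                 gamma_damping=1e-5, eta_glow_damping=0.1):
        self.h_matrix = np.ones((num_actions, num_states))   # h-values (H matrix)
        self.g_matrix = np.zeros((num_actions, num_states))  # glow values (G matrix)
        self.gamma = gamma_damping
        self.eta = eta_glow_damping

    def deliberate(self, state):
        # "standard" PS policy: action probabilities proportional to h-values
        # (a numba implementation would use rand_choice_nb instead of np.random.choice).
        prob = self.h_matrix[:, state] / self.h_matrix[:, state].sum()
        return np.random.choice(self.h_matrix.shape[0], p=prob)

    def learn(self, state, action, reward):
        # Damp the glow matrix, then mark the visited (state, action) pair.
        self.g_matrix *= (1.0 - self.eta)
        self.g_matrix[action, state] = 1.0
        # H-matrix update: relaxation towards 1 plus glow-weighted reward.
        self.h_matrix += -self.gamma * (self.h_matrix - 1.0) + reward * self.g_matrix
```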
Parallel training launchers
For ResetEnv
Search loop
train_loop_reset
train_loop_reset (episodes, time_ep, agent, env, h_mat_allT=False, when_save_h_mat=1, reset_after_reward=True)
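Schematically, the loop alternates action selection, environment updates and PS learning over `episodes` × `time_ep` steps. The sketch below uses hypothetical method names (`env.init_env`, `env.update_pos`, `agent.deliberate`, `agent.learn`); the real classes may expose a different interface:

```python
import numpy as np

def train_loop_reset_sketch(episodes, time_ep, agent, env,
                            h_mat_allT=False, when_save_h_mat=1,
                            reset_after_reward=True):
    """Schematic version of the search loop; env/agent methods are placeholders."""
    rewards = np.zeros(episodes)
    saved_h_mats = []
    for ep in range(episodes):
        state = env.init_env()                           # restart the walker
        for t in range(time_ep):
            action = agent.deliberate(state)             # sample action from policy
            next_state, reward = env.update_pos(action)  # move walker, check target
            agent.learn(state, action, reward)           # PS update of H and G
            rewards[ep] += reward
            if reward > 0 and reset_after_reward:
                next_state = env.init_env()              # restart after finding a target
            state = next_state
        if h_mat_allT and ep % when_save_h_mat == 0:
            saved_h_mats.append(agent.h_matrix.copy())   # track H over training
    return rewards, saved_h_mats
```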
Launchers
Note: the launchers are split into 1D and 2D versions because `numba` compilation would otherwise raise errors, since the two environments require different inputs.
1D
run_agents_reset_1D
run_agents_reset_1D (episodes, time_ep, N_agents, D=0.5, L=10.0, num_actions=2, size_state_space=array([100]), gamma_damping=1e-05, eta_glow_damping=0.1, g_update='s', initial_prob_distr=array([], shape=(2, 0), dtype=float64), policy_type='standard', beta_softmax=3, fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=1000, h_mat_allT=False, reset_after_reward=True, num_runs=None)
| | Type | Default | Details |
|---|---|---|---|
| episodes | | | |
| time_ep | | | |
| N_agents | | | |
| D | float | 0.5 | |
| L | float | 10.0 | Environment props |
| num_actions | int | 2 | Agent props |
| size_state_space | ndarray | [100] | |
| gamma_damping | float | 1e-05 | |
| eta_glow_damping | float | 0.1 | |
| g_update | str | s | |
| initial_prob_distr | ndarray | [] | |
| policy_type | str | standard | |
| beta_softmax | int | 3 | |
| fixed_policy | ndarray | [] | |
| max_no_H_update | int | 1000 | |
| h_mat_allT | bool | False | |
| reset_after_reward | bool | True | |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into num_runs consecutive runs, each over the number of cores given by N_agents. |
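A minimal usage sketch, using only the documented keyword arguments (the name and structure of the returned object are assumptions):

```python
import numpy as np

# Hypothetical call; `rewards` is only an assumed name for the returned object.
rewards = run_agents_reset_1D(
    episodes=1000, time_ep=100, N_agents=8,
    D=0.5, L=10.0,                          # environment: diffusivity, target distance
    num_actions=2, size_state_space=np.array([100]),
    gamma_damping=1e-5, eta_glow_damping=0.1,
    policy_type="standard",
    reset_after_reward=True,
)
```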
2D
run_agents_reset_2D
run_agents_reset_2D (episodes, time_ep, N_agents, dist_target=10.0, radius_target=1.0, D=0.5, num_actions=2, size_state_space=array([100]), gamma_damping=1e-05, eta_glow_damping=0.1, initial_prob_distr=array([], shape=(2, 0), dtype=float64), policy_type='standard', beta_softmax=3, fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=1000, h_mat_allT=False, when_save_h_mat=1, reset_after_reward=True, g_update='s', num_runs=None)
| | Type | Default | Details |
|---|---|---|---|
| episodes | | | |
| time_ep | | | |
| N_agents | | | |
| dist_target | float | 10.0 | |
| radius_target | float | 1.0 | |
| D | float | 0.5 | Environment props |
| num_actions | int | 2 | Agent props |
| size_state_space | ndarray | [100] | |
| gamma_damping | float | 1e-05 | |
| eta_glow_damping | float | 0.1 | |
| initial_prob_distr | ndarray | [] | |
| policy_type | str | standard | |
| beta_softmax | int | 3 | |
| fixed_policy | ndarray | [] | |
| max_no_H_update | int | 1000 | |
| h_mat_allT | bool | False | |
| when_save_h_mat | int | 1 | |
| reset_after_reward | bool | True | |
| g_update | str | s | |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into num_runs consecutive runs, each over the number of cores given by N_agents. |
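An analogous usage sketch for the 2D launcher (again, the returned object is only an assumption):

```python
import numpy as np

# Hypothetical call; `results_2d` is only an assumed name for the returned object.
results_2d = run_agents_reset_2D(
    episodes=1000, time_ep=100, N_agents=8,
    dist_target=10.0, radius_target=1.0, D=0.5,  # 2D environment geometry
    num_actions=2, size_state_space=np.array([100]),
    eta_glow_damping=0.1,
    h_mat_allT=True, when_save_h_mat=10,         # store the H matrix every 10 episodes
    num_runs=4,                                  # 4 consecutive runs of N_agents each
)
```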