Reinforcement learning agents

This notebook gathers the functions that create the different kinds of agents used for foraging and target search in various scenarios, adapted for use within the reinforcement learning paradigm.

Helpers

Random sampling from array with probs


rand_choice_nb

 rand_choice_nb (arr, prob)

:param arr: A 1D numpy array of values to sample from.
:param prob: A 1D numpy array of probabilities for the given samples.
:return: A random sample from the given array, drawn according to the given probabilities.
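
Numba's nopython mode does not support np.random.choice with a probability vector, so helpers like this are usually written with a cumulative-sum search. The snippet below is a sketch of that standard recipe, not necessarily the exported implementation:

```python
import numpy as np
from numba import njit

@njit
def rand_choice_nb(arr, prob):
    # Draw one uniform random number and locate it in the cumulative
    # distribution of `prob`; mimics np.random.choice(arr, p=prob) inside numba.
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]
```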

Forager


Forager

 Forager (*args, **kwargs)

This class defines a Forager agent, able to perform actions and learn from rewards based on the PS (projective simulation) paradigm.

This is an updated version of the one used in the original paper (https://doi.org/10.1088/1367-2630/ad19a8), taking into account the improvements to the H and G matrices proposed by Michele Caraglio in our paper (https://doi.org/10.1039/D3SM01680C).
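
For orientation, the PS update logic behind such an agent can be summarized as follows. This is a simplified, plain-Python sketch with hypothetical names (ForagerSketch, act, learn), not the numba-compiled Forager class itself:

```python
import numpy as np

class ForagerSketch:
    """Simplified projective-simulation (PS) agent, for illustration only."""

    def __init__(self, num_actions, num_states,
                 gamma_damping=1e-5, eta_glow_damping=0.1):
        self.h_matrix = np.ones((num_actions, num_states))   # learned state-action weights
        self.g_matrix = np.zeros((num_actions, num_states))  # glow (eligibility) traces
        self.gamma_damping = gamma_damping
        self.eta_glow_damping = eta_glow_damping

    def act(self, state):
        # 'standard' PS policy: action probabilities proportional to h-values
        h_column = self.h_matrix[:, state]
        action = np.random.choice(len(h_column), p=h_column / h_column.sum())
        # damp all glow traces, then mark the state-action pair just used
        self.g_matrix *= 1.0 - self.eta_glow_damping
        self.g_matrix[action, state] = 1.0
        return action

    def learn(self, reward):
        # PS update: relax h towards its initial value and add reward-weighted glow
        self.h_matrix += (self.gamma_damping * (1.0 - self.h_matrix)
                          + reward * self.g_matrix)
```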

Parallel training launchers

For ResetEnv

Search loop


train_loop_reset

 train_loop_reset (episodes, time_ep, agent, env, h_mat_allT=False,
                   when_save_h_mat=1, reset_after_reward=True)
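
In outline, the loop runs `episodes` training episodes of `time_ep` steps each: at every step the agent chooses an action, the environment advances and returns a reward, and the agent updates its H and G matrices. With `reset_after_reward=True` the walker is re-initialized after each capture, and `h_mat_allT` / `when_save_h_mat` control whether and how often intermediate H matrices are stored. The sketch below is a plain-Python outline with hypothetical method names (`act`, `learn`, `update_pos`, `init_env`), not the compiled function itself:

```python
import numpy as np

def train_loop_reset_sketch(episodes, time_ep, agent, env, reset_after_reward=True):
    # Hypothetical outline of the reset-environment training loop.
    rewards = np.zeros(episodes)
    for ep in range(episodes):
        env.init_env()                       # new walker (and target) positions
        for _ in range(time_ep):
            state = env.state()              # e.g. counter since the last reset/turn
            action = agent.act(state)
            reward = env.update_pos(action)  # move or reset, check for the target
            agent.learn(reward)
            rewards[ep] += reward
            if reward > 0 and reset_after_reward:
                env.init_env()               # restart the search after a capture
    return rewards
```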

Launchers

Note: we have to separate the launchers into 1D and 2D versions because of numba compilation, which would otherwise raise errors since the two environments take different inputs. A rough sketch of the shared launcher pattern is given below.
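
The toy example below only illustrates that pattern: each agent runs independently inside a numba prange loop, and the environment construction (1D or 2D) is baked into the compiled function. The real launchers build the library's reset environment and Forager inside the loop and call train_loop_reset; everything else here is a placeholder.

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def run_parallel_sketch(N_agents, episodes):
    # Toy stand-in for the launchers: one independent run per core.
    # Since numba compiles this function for one fixed environment
    # signature, 1D and 2D environments need separate launchers.
    rewards = np.zeros((N_agents, episodes))
    for ag in prange(N_agents):
        for ep in range(episodes):
            rewards[ag, ep] = np.random.random()  # placeholder for one episode's reward
        # real code: build the Forager and the reset environment here,
        #            then call train_loop_reset(episodes, time_ep, agent, env)
    return rewards
```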

1D


run_agents_reset_1D

 run_agents_reset_1D (episodes, time_ep, N_agents, D=0.5, L=10.0,
                      num_actions=2, size_state_space=array([100]),
                      gamma_damping=1e-05, eta_glow_damping=0.1,
                      g_update='s',
                      initial_prob_distr=array([], shape=(2, 0), dtype=float64),
                      policy_type='standard', beta_softmax=3,
                      fixed_policy=array([], shape=(2, 0), dtype=float64),
                      max_no_H_update=1000, h_mat_allT=False,
                      reset_after_reward=True, num_runs=None)
| Parameter | Type | Default | Details |
|---|---|---|---|
| episodes | | | |
| time_ep | | | |
| N_agents | | | |
| D | float | 0.5 | |
| L | float | 10.0 | Environment props |
| num_actions | int | 2 | Agent props |
| size_state_space | ndarray | [100] | |
| gamma_damping | float | 1e-05 | |
| eta_glow_damping | float | 0.1 | |
| g_update | str | s | |
| initial_prob_distr | | [] | |
| policy_type | str | standard | |
| beta_softmax | int | 3 | |
| fixed_policy | | [] | |
| max_no_H_update | int | 1000 | |
| h_mat_allT | bool | False | |
| reset_after_reward | bool | True | |
| num_runs | NoneType | None | When we want N_agents != the maximum number of cores, use this to make several runs over the selected number of cores, given by N_agents. |
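
As a usage illustration only (argument values chosen arbitrarily; the import path and the format of the returned results are assumptions, so check the package for the actual module):

```python
# Hypothetical usage based on the signature documented above.
# from rl_opts... import run_agents_reset_1D   # exact import path not shown here

results = run_agents_reset_1D(episodes=1000,   # training episodes per agent
                              time_ep=100,     # steps per episode
                              N_agents=4,      # one agent per core
                              D=0.5, L=10.0)   # environment properties
```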

2D


run_agents_reset_2D

 run_agents_reset_2D (episodes, time_ep, N_agents, dist_target=10.0,
                      radius_target=1.0, D=0.5, num_actions=2,
                      size_state_space=array([100]), gamma_damping=1e-05,
                      eta_glow_damping=0.1,
                      initial_prob_distr=array([], shape=(2, 0), dtype=float64),
                      policy_type='standard', beta_softmax=3,
                      fixed_policy=array([], shape=(2, 0), dtype=float64),
                      max_no_H_update=1000, h_mat_allT=False,
                      when_save_h_mat=1, reset_after_reward=True,
                      g_update='s', num_runs=None)
| Parameter | Type | Default | Details |
|---|---|---|---|
| episodes | | | |
| time_ep | | | |
| N_agents | | | |
| dist_target | float | 10.0 | |
| radius_target | float | 1.0 | |
| D | float | 0.5 | Environment props |
| num_actions | int | 2 | Agent props |
| size_state_space | ndarray | [100] | |
| gamma_damping | float | 1e-05 | |
| eta_glow_damping | float | 0.1 | |
| initial_prob_distr | | [] | |
| policy_type | str | standard | |
| beta_softmax | int | 3 | |
| fixed_policy | | [] | |
| max_no_H_update | int | 1000 | |
| h_mat_allT | bool | False | |
| when_save_h_mat | int | 1 | |
| reset_after_reward | bool | True | |
| g_update | str | s | |
| num_runs | NoneType | None | When we want N_agents != the maximum number of cores, use this to make several runs over the selected number of cores, given by N_agents. |
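
The num_runs argument matters when you want to train more agents than there are cores: it repeats the parallel launch num_runs times over N_agents cores. A hypothetical call (assuming the total number of trained agents is then N_agents × num_runs):

```python
# Hypothetical usage based on the signature documented above.
results = run_agents_reset_2D(episodes=1000,
                              time_ep=100,
                              N_agents=8,          # number of cores to occupy in parallel
                              num_runs=4,          # assumption: 8 x 4 = 32 agents in total
                              dist_target=10.0,
                              radius_target=1.0)
```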
