Classic version

This notebook gathers the functions that build the RL framework proposed in our work. Namely, it can be used to generate both the foraging environment and the agents moving in it.

Environment

Class that defines the foraging environment.

source

TargetEnv

 TargetEnv (Nt, L, r, lc, agent_step=1, boundary_condition='periodic',
            num_agents=1, high_den=5, destructive=False)

Class defining the foraging environment. It includes the methods needed to place several agents in the world.
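
As an illustration, here is a minimal sketch of setting up the environment with the signature above. The import path is an assumption (adjust it to wherever TargetEnv lives in your installation of rl_opts), and the comments on the meaning of Nt, L, r and lc reflect our reading of the accompanying paper rather than anything stated on this page.

    from rl_opts.rl_framework.legacy import TargetEnv  # hypothetical import path

    # Hypothetical parameter choices; the meanings given in the comments are assumptions.
    env = TargetEnv(Nt=100,                 # number of targets placed in the world
                    L=100,                  # linear size of the (square) world
                    r=0.5,                  # detection radius around each target
                    lc=1.0,                 # displacement applied after a target is found
                    agent_step=1,           # length of each agent step
                    boundary_condition='periodic',
                    num_agents=1,
                    destructive=False)      # non-destructive foraging: targets are not removed (assumed)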

Projective Simulation agent


source

PSAgent

 PSAgent (num_actions, num_percepts_list, gamma_damping=0.0,
          eta_glow_damping=0.0, policy_type='standard', beta_softmax=3,
          initial_prob_distr=None, fixed_policy=None)

Base class of a Reinforcement Learning agent based on Projective Simulation, with a two-layered network. This class has been adapted from https://github.com/qic-ibk/projectivesimulation

Parameters:

  • num_actions (int >=1): Number of actions.
  • num_percepts_list (list of integers >=1, not nested): Cardinality of each category/feature of percept space.
  • gamma_damping (float, default 0.0): Forgetting/damping of h-values at the end of each interaction.
  • eta_glow_damping (float, default 0.0): Controls the damping of glow; setting this to 1 effectively switches off glow.
  • policy_type (str, default 'standard'): Toggles the rule used to compute probabilities from h-values. See probability_distr.
  • beta_softmax (int, default 3): Probabilities are proportional to exp(beta * h_value); irrelevant if policy_type != 'softmax'.
  • initial_prob_distr (NoneType, default None): Used to change the initialization policy of the agent. Contains, per percept, a list with the initial h-values for each action.
  • fixed_policy (NoneType, default None): Used to fix a policy for the agent. Contains, per percept, a list with the probabilities for each action. Example: for percept 0, fixed_policy[0] = [p(a0), p(a1), p(a2)] = [0.2, 0.3, 0.5], where a0, a1 and a2 are the three possible actions.
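
For intuition, the two policy rules mentioned above can be written out explicitly. The snippet below is a standalone NumPy illustration, not the library's probability_distr method; it assumes the 'standard' rule normalizes the h-values by their sum, as in the original Projective Simulation implementation.

    import numpy as np

    def standard_policy(h_values):
        # 'standard' rule (assumed): probabilities proportional to the h-values themselves
        h = np.asarray(h_values, dtype=float)
        return h / h.sum()

    def softmax_policy(h_values, beta=3.0):
        # 'softmax' rule: probabilities proportional to exp(beta * h_value)
        h = np.asarray(h_values, dtype=float)
        w = np.exp(beta * (h - h.max()))  # subtract the maximum for numerical stability
        return w / w.sum()

    h = [1.0, 1.0, 2.0]               # h-values of three actions for a single percept
    print(standard_policy(h))         # approximately [0.25, 0.25, 0.5]
    print(softmax_policy(h, beta=3))  # sharper preference for the third action

Increasing beta_softmax makes the softmax distribution increasingly peaked around the action with the largest h-value.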

General forager agent


source

Forager

 Forager (state_space, num_actions, visual_cone=3.141592653589793,
          visual_radius=1.0, **kwargs)

This class extends the general PSAgent class and adapts it to the foraging scenario.

Parameters:

  • state_space (list): List where each entry is the state space of each perceptual feature, e.g. [state space of step counter, state space of density of successful neighbours].
  • num_actions (int): Number of actions.
  • visual_cone (float, default np.pi): Visual cone (angle, in radians) of the forager, useful in scenarios with ensembles of agents.
  • visual_radius (float, default 1.0): Radius of the visual region, useful in scenarios with ensembles of agents.
  • kwargs: Additional keyword arguments for the parent PSAgent class.
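
A minimal construction sketch follows. The import path is an assumption (adjust it to your installation), the state space and action choices are purely illustrative, and the extra keyword arguments are presumed to be forwarded to the parent PSAgent.

    import numpy as np
    from rl_opts.rl_framework.legacy import Forager  # hypothetical import path

    # Illustrative state space: a single perceptual feature given by a step counter
    # discretized into 100 states (placeholder values, not taken from the paper).
    state_space = [np.arange(100)]

    agent = Forager(state_space=state_space,
                    num_actions=2,          # e.g. continue straight vs. reorient (illustrative)
                    visual_cone=np.pi,      # only relevant for ensembles of agents
                    visual_radius=1.0,
                    policy_type='softmax',  # assumed to be passed on to PSAgent via **kwargs
                    beta_softmax=3)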