This notebook gathers the functions that create the RL framework proposed in our work. Namely, it can be used to generate both the foraging environments and the agents that move on them.
Class that defines the foraging environment
TargetEnv (Nt, L, r, lc, agent_step=1, boundary_condition='periodic', num_agents=1, high_den=5, destructive=False)
Class defining the foraging environment. It includes the methods needed to place several agents in the world.
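As a quick orientation, below is a minimal sketch of how the environment could be instantiated with the signature above. The meanings assigned to `Nt`, `L`, `r` and `lc` in the comments are assumptions made for illustration, since this section does not document them.

```python
# Minimal sketch; assumed parameter meanings are noted in the comments.
env = TargetEnv(
    Nt=100,                          # number of targets (assumed meaning)
    L=100.0,                         # linear size of the world (assumed meaning)
    r=0.5,                           # detection radius of a target (assumed meaning)
    lc=3.0,                          # characteristic length of the target distribution (assumed meaning)
    agent_step=1,
    boundary_condition='periodic',
    num_agents=1,
    high_den=5,
    destructive=False,
)
```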
PSAgent (num_actions, num_percepts_list, gamma_damping=0.0, eta_glow_damping=0.0, policy_type='standard', beta_softmax=3, initial_prob_distr=None, fixed_policy=None)
Base class of a Reinforcement Learning agent based on Projective Simulation, with a two-layered network. This class has been adapted from https://github.com/qic-ibk/projectivesimulation
| | Type | Default | Details |
|---|---|---|---|
| num_actions | int >= 1 | | Number of actions. |
| num_percepts_list | list of int >= 1, not nested | | Cardinality of each category/feature of percept space. |
| gamma_damping | float | 0.0 | Forgetting/damping of h-values at the end of each interaction. |
| eta_glow_damping | float | 0.0 | Controls the damping of glow; setting this to 1 effectively switches off glow. |
| policy_type | str | standard | Toggles the rule used to compute probabilities from h-values. See probability_distr. |
| beta_softmax | int | 3 | Probabilities are proportional to exp(beta * h_value). Irrelevant if policy_type != 'softmax'. |
| initial_prob_distr | NoneType | None | Optional custom initialization of the agent's policy: a list containing, per percept, a list with the initial h-values for each action. |
| fixed_policy | NoneType | None | Optional fixed policy for the agent: a list containing, per percept, a list with the probabilities for each action. Example: percept 0: fixed_policy[0] = [p(a0), p(a1), p(a2)] = [0.2, 0.3, 0.5], where a0, a1 and a2 are the three possible actions. |
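For reference, here is a minimal sketch of constructing a PSAgent following the parameter table above. The percept-space sizes are arbitrary, and the assumption that the total number of percepts equals the product of the entries of `num_percepts_list` is made only for illustration.

```python
import numpy as np

# Two perceptual features with 5 and 3 possible values each (illustrative sizes).
num_percepts_list = [5, 3]
num_actions = 3

agent = PSAgent(
    num_actions=num_actions,
    num_percepts_list=num_percepts_list,
    gamma_damping=0.01,              # slow forgetting of h-values
    eta_glow_damping=0.1,            # glow damping (1 would switch glow off)
    policy_type='softmax',
    beta_softmax=3,
)

# fixed_policy format from the table: one probability distribution over the
# actions per percept. The total number of percepts is assumed to be the
# product of the feature cardinalities (here 5 * 3 = 15).
fixed_policy = [[0.2, 0.3, 0.5] for _ in range(int(np.prod(num_percepts_list)))]
fixed_agent = PSAgent(num_actions=num_actions,
                      num_percepts_list=num_percepts_list,
                      fixed_policy=fixed_policy)
```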
Forager (state_space, num_actions, visual_cone=3.141592653589793, visual_radius=1.0, **kwargs)
This class extends the general PSAgent class and adapts it to the foraging scenario.
| | Type | Default | Details |
|---|---|---|---|
| state_space | list | | List where each entry is the state space of one perceptual feature, e.g. [state space of step counter, state space of density of successful neighbours]. |
| num_actions | int | | Number of actions. |
| visual_cone | float | 3.141592653589793 | Visual cone (angle, in radians) of the forager, useful in scenarios with ensembles of agents. The default is np.pi. |
| visual_radius | float | 1.0 | Radius of the visual region, useful in scenarios with ensembles of agents. |
| kwargs | | | |
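Finally, a minimal sketch of creating a Forager. The concrete state-space sizes, the number of actions, and the assumption that the remaining keyword arguments are forwarded to the parent PSAgent are all illustrative, not taken from this section.

```python
import numpy as np

# Illustrative state space: a step-counter feature with 20 values and a
# successful-neighbour-density feature with 5 levels (both sizes assumed).
state_space = [np.arange(20), np.arange(5)]

forager = Forager(
    state_space=state_space,
    num_actions=2,                   # e.g. continue straight vs. reorient (assumed)
    visual_cone=np.pi,
    visual_radius=1.0,
    # Remaining keyword arguments are assumed to be forwarded to PSAgent:
    gamma_damping=0.0,
    eta_glow_damping=0.1,
)
```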