Classic version

This notebook gathers the functions that build the RL framework proposed in our work. Namely, it can be used to generate both the foraging environment and the agents moving in it.

Environment

Class that defines the foraging environment.

source

TargetEnv

 TargetEnv (Nt, L, r, lc, agent_step=1, boundary_condition='periodic',
            num_agents=1, high_den=5, destructive=False)

Class defining the foraging environment. It includes the methods needed to place several agents in the world.
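
As an illustration, here is a minimal sketch of setting up the environment with the signature above. The import path is an assumption (adjust it to wherever TargetEnv lives in your installation of rl_opts), and the comments on the meaning of Nt, L, r and lc reflect our reading of the accompanying paper rather than anything stated on this page.

    from rl_opts.rl_framework.legacy import TargetEnv  # hypothetical import path

    # Hypothetical parameter choices; the meanings given in the comments are assumptions.
    env = TargetEnv(Nt=100,                 # number of targets placed in the world
                    L=100,                  # linear size of the (square) world
                    r=0.5,                  # detection radius around each target
                    lc=1.0,                 # displacement applied after a target is found
                    agent_step=1,           # length of each agent step
                    boundary_condition='periodic',
                    num_agents=1,
                    destructive=False)      # non-destructive foraging: targets are not removed (assumed)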

Projective Simulation agent


source

PSAgent

 PSAgent (num_actions, num_percepts_list, gamma_damping=0.0,
          eta_glow_damping=0.0, policy_type='standard', beta_softmax=3,
          initial_prob_distr=None, fixed_policy=None)

Base class of a Reinforcement Learning agent based on Projective Simulation, with a two-layered network. This class has been adapted from https://github.com/qic-ibk/projectivesimulation

Parameters:

  • num_actions (int >=1): Number of actions.
  • num_percepts_list (list of integers >=1, not nested): Cardinality of each category/feature of percept space.
  • gamma_damping (float, default 0.0): Forgetting/damping of h-values at the end of each interaction.
  • eta_glow_damping (float, default 0.0): Controls the damping of glow; setting this to 1 effectively switches off glow.
  • policy_type (str, default 'standard'): Toggles the rule used to compute probabilities from h-values. See probability_distr.
  • beta_softmax (int, default 3): Probabilities are proportional to exp(beta * h_value); irrelevant if policy_type != 'softmax'.
  • initial_prob_distr (NoneType, default None): Used to change the initialization policy of the agent. Contains, per percept, a list with the initial h-values for each action.
  • fixed_policy (NoneType, default None): Used to fix a policy for the agent. Contains, per percept, a list with the probabilities for each action. Example: for percept 0, fixed_policy[0] = [p(a0), p(a1), p(a2)] = [0.2, 0.3, 0.5], where a0, a1 and a2 are the three possible actions.
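
For intuition, the two policy rules mentioned above can be written out explicitly. The snippet below is a standalone NumPy illustration, not the library's probability_distr method; it assumes the 'standard' rule normalizes the h-values by their sum, as in the original Projective Simulation implementation.

    import numpy as np

    def standard_policy(h_values):
        # 'standard' rule (assumed): probabilities proportional to the h-values themselves
        h = np.asarray(h_values, dtype=float)
        return h / h.sum()

    def softmax_policy(h_values, beta=3.0):
        # 'softmax' rule: probabilities proportional to exp(beta * h_value)
        h = np.asarray(h_values, dtype=float)
        w = np.exp(beta * (h - h.max()))  # subtract the maximum for numerical stability
        return w / w.sum()

    h = [1.0, 1.0, 2.0]               # h-values of three actions for a single percept
    print(standard_policy(h))         # approximately [0.25, 0.25, 0.5]
    print(softmax_policy(h, beta=3))  # sharper preference for the third action

Increasing beta_softmax makes the softmax distribution increasingly peaked around the action with the largest h-value.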

General forager agent


source

Forager

 Forager (state_space, num_actions, visual_cone=3.141592653589793,
          visual_radius=1.0, **kwargs)

This class extends the general PSAgent class and adapts it to the foraging scenario.

Parameters:

  • state_space (list): List where each entry is the state space of each perceptual feature, e.g. [state space of step counter, state space of density of successful neighbours].
  • num_actions (int): Number of actions.
  • visual_cone (float, default np.pi): Visual cone (angle, in radians) of the forager, useful in scenarios with ensembles of agents.
  • visual_radius (float, default 1.0): Radius of the visual region, useful in scenarios with ensembles of agents.
  • kwargs: Additional keyword arguments for the parent PSAgent class.
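
A minimal construction sketch follows. The import path is an assumption (adjust it to your installation), the state space and action choices are purely illustrative, and the extra keyword arguments are presumed to be forwarded to the parent PSAgent.

    import numpy as np
    from rl_opts.rl_framework.legacy import Forager  # hypothetical import path

    # Illustrative state space: a single perceptual feature given by a step counter
    # discretized into 100 states (placeholder values, not taken from the paper).
    state_space = [np.arange(100)]

    agent = Forager(state_space=state_space,
                    num_actions=2,          # e.g. continue straight vs. reorient (illustrative)
                    visual_cone=np.pi,      # only relevant for ensembles of agents
                    visual_radius=1.0,
                    policy_type='softmax',  # assumed to be passed on to PSAgent via **kwargs
                    beta_softmax=3)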