Reinforcement learning agents

This notebook gathers the functions that create the different kinds of agents used for foraging and target search in various scenarios, adapted for use within the reinforcement learning paradigm.

Helpers

Random sampling from array with probs


rand_choice_nb

 rand_choice_nb (arr, prob)

:param arr: A 1D numpy array of values to sample from.
:param prob: A 1D numpy array of probabilities for the given samples.
:return: A random sample from the given array, drawn according to the given probabilities.
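
Numba's nopython mode does not support np.random.choice with a probability vector, so helpers like this are usually written with a cumulative-sum search. The snippet below is a sketch of that standard recipe, not necessarily the exported implementation:

```python
import numpy as np
from numba import njit

@njit
def rand_choice_nb(arr, prob):
    # Draw one uniform random number and locate it in the cumulative
    # distribution of `prob`; mimics np.random.choice(arr, p=prob) inside numba.
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]
```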

Forager


Forager

 Forager (*args, **kwargs)

This class defines a Forager agent, able to perform actions and learn from rewards based on the PS (projective simulation) paradigm.

This is an updated version of the one used in the original paper (https://doi.org/10.1088/1367-2630/ad19a8), taking into account the improvements to the H and G matrices proposed by Michele Caraglio in our paper (https://doi.org/10.1039/D3SM01680C).
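
For orientation, the PS update logic behind such an agent can be summarized as follows. This is a simplified, plain-Python sketch with hypothetical names (ForagerSketch, act, learn), not the numba-compiled Forager class itself:

```python
import numpy as np

class ForagerSketch:
    """Simplified projective-simulation (PS) agent, for illustration only."""

    def __init__(self, num_actions, num_states,
                 gamma_damping=1e-5, eta_glow_damping=0.1):
        self.h_matrix = np.ones((num_actions, num_states))   # learned state-action weights
        self.g_matrix = np.zeros((num_actions, num_states))  # glow (eligibility) traces
        self.gamma_damping = gamma_damping
        self.eta_glow_damping = eta_glow_damping

    def act(self, state):
        # 'standard' PS policy: action probabilities proportional to h-values
        h_column = self.h_matrix[:, state]
        action = np.random.choice(len(h_column), p=h_column / h_column.sum())
        # damp all glow traces, then mark the state-action pair just used
        self.g_matrix *= 1.0 - self.eta_glow_damping
        self.g_matrix[action, state] = 1.0
        return action

    def learn(self, reward):
        # PS update: relax h towards its initial value and add reward-weighted glow
        self.h_matrix += (self.gamma_damping * (1.0 - self.h_matrix)
                          + reward * self.g_matrix)
```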

Parallel training launchers

For ResetEnv

Search loop


train_loop_reset

 train_loop_reset (episodes, time_ep, agent, env, h_mat_allT=False,
                   when_save_h_mat=1, reset_after_reward=True)
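
In outline, the loop runs `episodes` training episodes of `time_ep` steps each: at every step the agent chooses an action, the environment advances and returns a reward, and the agent updates its H and G matrices. With `reset_after_reward=True` the walker is re-initialized after each capture, and `h_mat_allT` / `when_save_h_mat` control whether and how often intermediate H matrices are stored. The sketch below is a plain-Python outline with hypothetical method names (`act`, `learn`, `update_pos`, `init_env`), not the compiled function itself:

```python
import numpy as np

def train_loop_reset_sketch(episodes, time_ep, agent, env, reset_after_reward=True):
    # Hypothetical outline of the reset-environment training loop.
    rewards = np.zeros(episodes)
    for ep in range(episodes):
        env.init_env()                       # new walker (and target) positions
        for _ in range(time_ep):
            state = env.state()              # e.g. counter since the last reset/turn
            action = agent.act(state)
            reward = env.update_pos(action)  # move or reset, check for the target
            agent.learn(reward)
            rewards[ep] += reward
            if reward > 0 and reset_after_reward:
                env.init_env()               # restart the search after a capture
    return rewards
```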

Launchers

Note: we have to separate the launchers into 1D and 2D versions because of numba compilation, which would otherwise raise errors since the two environments take different inputs. A rough sketch of the shared launcher pattern is given below.
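
The toy example below only illustrates that pattern: each agent runs independently inside a numba prange loop, and the environment construction (1D or 2D) is baked into the compiled function. The real launchers build the library's reset environment and Forager inside the loop and call train_loop_reset; everything else here is a placeholder.

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def run_parallel_sketch(N_agents, episodes):
    # Toy stand-in for the launchers: one independent run per core.
    # Since numba compiles this function for one fixed environment
    # signature, 1D and 2D environments need separate launchers.
    rewards = np.zeros((N_agents, episodes))
    for ag in prange(N_agents):
        for ep in range(episodes):
            rewards[ag, ep] = np.random.random()  # placeholder for one episode's reward
        # real code: build the Forager and the reset environment here,
        #            then call train_loop_reset(episodes, time_ep, agent, env)
    return rewards
```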

1D


run_agents_reset_1D

 run_agents_reset_1D (episodes, time_ep, N_agents, D=0.5, L=10.0,
                      num_actions=2, size_state_space=array([100]),
                      gamma_damping=1e-05, eta_glow_damping=0.1,
                      g_update='s',
                      initial_prob_distr=array([], shape=(2, 0), dtype=float64),
                      policy_type='standard', beta_softmax=3,
                      fixed_policy=array([], shape=(2, 0), dtype=float64),
                      max_no_H_update=1000, h_mat_allT=False,
                      reset_after_reward=True, num_runs=None)
| Parameter | Type | Default | Details |
|---|---|---|---|
| episodes | | | |
| time_ep | | | |
| N_agents | | | |
| D | float | 0.5 | |
| L | float | 10.0 | Environment props |
| num_actions | int | 2 | Agent props |
| size_state_space | ndarray | [100] | |
| gamma_damping | float | 1e-05 | |
| eta_glow_damping | float | 0.1 | |
| g_update | str | s | |
| initial_prob_distr | | [] | |
| policy_type | str | standard | |
| beta_softmax | int | 3 | |
| fixed_policy | | [] | |
| max_no_H_update | int | 1000 | |
| h_mat_allT | bool | False | |
| reset_after_reward | bool | True | |
| num_runs | NoneType | None | When we want N_agents != the maximum number of cores, use this to make several runs over the selected number of cores, given by N_agents. |
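
As a usage illustration only (argument values chosen arbitrarily; the import path and the format of the returned results are assumptions, so check the package for the actual module):

```python
# Hypothetical usage based on the signature documented above.
# from rl_opts... import run_agents_reset_1D   # exact import path not shown here

results = run_agents_reset_1D(episodes=1000,   # training episodes per agent
                              time_ep=100,     # steps per episode
                              N_agents=4,      # one agent per core
                              D=0.5, L=10.0)   # environment properties
```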

2D


run_agents_reset_2D

 run_agents_reset_2D (episodes, time_ep, N_agents, dist_target=10.0,
                      radius_target=1.0, D=0.5, num_actions=2,
                      size_state_space=array([100]), gamma_damping=1e-05,
                      eta_glow_damping=0.1,
                      initial_prob_distr=array([], shape=(2, 0), dtype=float64),
                      policy_type='standard', beta_softmax=3,
                      fixed_policy=array([], shape=(2, 0), dtype=float64),
                      max_no_H_update=1000, h_mat_allT=False,
                      when_save_h_mat=1, reset_after_reward=True,
                      g_update='s', num_runs=None)
| Parameter | Type | Default | Details |
|---|---|---|---|
| episodes | | | |
| time_ep | | | |
| N_agents | | | |
| dist_target | float | 10.0 | |
| radius_target | float | 1.0 | |
| D | float | 0.5 | Environment props |
| num_actions | int | 2 | Agent props |
| size_state_space | ndarray | [100] | |
| gamma_damping | float | 1e-05 | |
| eta_glow_damping | float | 0.1 | |
| initial_prob_distr | | [] | |
| policy_type | str | standard | |
| beta_softmax | int | 3 | |
| fixed_policy | | [] | |
| max_no_H_update | int | 1000 | |
| h_mat_allT | bool | False | |
| when_save_h_mat | int | 1 | |
| reset_after_reward | bool | True | |
| g_update | str | s | |
| num_runs | NoneType | None | When we want N_agents != the maximum number of cores, use this to make several runs over the selected number of cores, given by N_agents. |
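
The num_runs argument matters when you want to train more agents than there are cores: it repeats the parallel launch num_runs times over N_agents cores. A hypothetical call (assuming the total number of trained agents is then N_agents × num_runs):

```python
# Hypothetical usage based on the signature documented above.
results = run_agents_reset_2D(episodes=1000,
                              time_ep=100,
                              N_agents=8,          # number of cores to occupy in parallel
                              num_runs=4,          # assumption: 8 x 4 = 32 agents in total
                              dist_target=10.0,
                              radius_target=1.0)
```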
