RL-OptS

Reinforcement learning environments



This notebook gathers the functions that create different kinds of environments for foraging and target search in various scenarios, adapted for use in the reinforcement learning paradigm.

Helpers

isBetween


source

isBetween_c_Vec_numba

 isBetween_c_Vec_numba (a, b, c, r)

Checks whether each target in c lies within radius r of the segment between points a and b, i.e. whether the move from a to b crossed that target.

        Type                      Details
a       tensor, shape = (1,2)     Previous position.
b       tensor, shape = (1,2)     Current position.
c       tensor, shape = (Nt,2)    Positions of all targets.
r       int / float               Target radius.
Returns array of boolean values   True at the indices of found targets.
Timing the numba implementation (after a first call to compile it):

compiling = isBetween_c_Vec_numba(np.array([0.1,1]), np.array([1,3]), np.random.rand(100,2), 0.00001)
4.65 μs ± 25.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

For comparison, the original (non-numba) implementation:

from rl_opts.utils import isBetween_c_Vec as oldbetween
40.4 μs ± 177 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Pareto sampling


source

pareto_sample

 pareto_sample (alpha, xm, size=1)
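No docstring is given here; presumably the function draws size samples from a Pareto distribution with exponent alpha and scale xm. A minimal inverse-transform sketch (illustrative only, not the library's numba implementation):

```python
import numpy as np

def pareto_sample_sketch(alpha, xm, size=1):
    # Inverse-transform sampling: if U ~ Uniform(0, 1), then
    # xm * U**(-1/alpha) follows a Pareto(alpha, xm) distribution.
    u = np.random.rand(size)
    return xm * u ** (-1.0 / alpha)
```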

Random sampling from array with probs


source

rand_choice_nb

 rand_choice_nb (arr, prob)

:param arr: A 1D numpy array of values to sample from.
:param prob: A 1D numpy array of probabilities for the given samples.
:return: A random sample from the given array with a given probability.
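Weighted sampling is typically rebuilt for numba with a cumulative sum and searchsorted, since numba's nopython mode does not support the p argument of np.random.choice. A sketch of that standard trick (the library's actual implementation may differ):

```python
import numpy as np

def rand_choice_sketch(arr, prob):
    # Draw u ~ Uniform(0, 1) and locate it in the cumulative
    # distribution; the insertion index selects the sample.
    u = np.random.rand()
    return arr[np.searchsorted(np.cumsum(prob), u, side="right")]
```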

TargetEnv


source

TargetEnv

 TargetEnv (*args, **kwargs)

Class defining a foraging environment with multiple targets and two actions: continuing in the same direction, or turning by a random angle.

ResetEnv

Search loop with a fixed policy in an arbitrary environment


source

reset_search_loop

 reset_search_loop (T, reset_policy, env)

Loop that runs the reset environment with a given reset policy.

             Details
T            Number of steps
reset_policy Reset policy
env          Environment
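The loop structure can be sketched as follows. The environment interface used here (a single step(reset) -> found method) and the interpretation of reset_policy[n] as the reset probability n steps after the last reset are assumptions for illustration, not the library API:

```python
import numpy as np

def reset_search_loop_sketch(T, reset_policy, env):
    """Run T steps of a search with a fixed resetting policy and
    collect the duration of each completed search."""
    search_times = []
    since_reset = 0   # steps since the last reset; indexes the policy
    since_found = 0   # duration of the current search
    for _ in range(T):
        p = reset_policy[min(since_reset, len(reset_policy) - 1)]
        do_reset = np.random.rand() < p
        found = env.step(do_reset)          # hypothetical env interface
        since_reset = 0 if (do_reset or found) else since_reset + 1
        since_found += 1
        if found:
            search_times.append(since_found)
            since_found = 0
    return search_times
```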

1D


source

ResetEnv_1D

 ResetEnv_1D (*args, **kwargs)

One-dimensional search environment with resetting.

Parallel search loops for Reset 1D


parallel_Reset1D_exp

 parallel_Reset1D_exp (T, rates, L, D)

Runs the Reset 1D loop in parallel for different exponential resetting rates.


parallel_Reset1D_sharp

 parallel_Reset1D_sharp (T, resets, L, D)

Runs the Reset 1D loop in parallel for different sharp resetting times.
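A single run of the sharp-resetting 1D search might look like the sketch below: a diffusive walker starts at the origin, is returned there every reset_time steps, and searches for a target at distance L. Only the parameter names mirror parallel_Reset1D_sharp; the internal dynamics (Gaussian steps of standard deviation sqrt(2*D*dt)) are an assumption:

```python
import numpy as np

def reset1d_sharp_sketch(T, reset_time, L, D, dt=1.0, rng=None):
    """One sharp-resetting 1D search run; returns the list of
    completed search durations within T steps."""
    rng = np.random.default_rng(rng)
    x, since_reset, t_last = 0.0, 0, 0
    search_times = []
    for t in range(1, T + 1):
        if since_reset == reset_time:      # sharp reset: back to origin
            x, since_reset = 0.0, 0
        x += rng.normal(0.0, np.sqrt(2.0 * D * dt))
        since_reset += 1
        if x >= L:                         # target reached
            search_times.append(t - t_last)
            t_last = t
            x, since_reset = 0.0, 0
    return search_times
```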

2D


source

ResetEnv_2D

 ResetEnv_2D (*args, **kwargs)

Two-dimensional search environment with resetting.

Parallel search loops for Reset 2D


parallel_Reset2D_policies

 parallel_Reset2D_policies (T, reset_policies, dist_target, radius_target, D)

parallel_Reset2D_exp

 parallel_Reset2D_exp (T, rates, dist_target, radius_target, D)

parallel_Reset2D_sharp

 parallel_Reset2D_sharp (T, resets, dist_target, radius_target, D)
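For the exponential case, a single 2D run might be sketched as below: a diffusive searcher resets to the origin at a Poissonian rate and looks for a circular target of the given radius at the given distance. The parallel_Reset2D_* functions presumably run many such loops over an array of rates; the dynamics here are an assumption for illustration:

```python
import numpy as np

def reset2d_exp_sketch(T, rate, dist_target, radius_target, D, dt=1.0, rng=None):
    """One exponential-resetting 2D search run; returns the list of
    completed search durations within T steps."""
    rng = np.random.default_rng(rng)
    pos = np.zeros(2)
    target = np.array([dist_target, 0.0])   # target centre on the x-axis
    search_times, t_last = [], 0
    for t in range(1, T + 1):
        if rng.random() < rate * dt:         # exponential (Poissonian) reset
            pos[:] = 0.0
        pos += rng.normal(0.0, np.sqrt(2.0 * D * dt), size=2)
        if np.linalg.norm(pos - target) <= radius_target:
            search_times.append(t - t_last)
            t_last = t
            pos[:] = 0.0
    return search_times
```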

TurnResetEnv

Only the 2D case is considered.


TurnResetEnv_2D

 TurnResetEnv_2D (*args, **kwargs)

Class defining a foraging environment with a single target and three possible actions:

  • Continue in the same direction
  • Turn by a random angle
  • Reset to the origin

The agent makes steps of constant length given by agent_step.

Search loop with fixed policy


search_loop_turn_reset_sharp

 search_loop_turn_reset_sharp (T, reset, turn, env)

Runs a search loop of T steps. A single counter tracks the time since the last reset, as follows:

  • It starts at 0.
  • Each continue or turn action increments it by 1.
  • A reset action, or reaching the target, sets it back to 0.
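The counter bookkeeping described above can be made concrete with a small sketch, where actions is a sequence of hypothetical event labels:

```python
def turn_reset_counter_sketch(actions):
    """Trace the counter over a sequence of events: it increments on
    'continue' or 'turn' and zeroes on 'reset' or 'found' (target reached).
    Event names are illustrative, not the library's encoding."""
    counter, history = 0, []
    for a in actions:
        counter = 0 if a in ("reset", "found") else counter + 1
        history.append(counter)
    return history
```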