RL-OptS

Reinforcement learning environments



This notebook gathers the functions that create different kinds of environments for foraging and target search in various scenarios, adapted for use in the reinforcement learning paradigm.

Helpers

isBetween


source

isBetween_c_Vec_numba

 isBetween_c_Vec_numba (a, b, c, r)

Checks whether each target in c lies within radius r of the segment between points a and b, i.e. whether the move from a to b crossed that target.

        Type                      Details
a       tensor, shape = (1,2)     Previous position.
b       tensor, shape = (1,2)     Current position.
c       tensor, shape = (Nt,2)    Positions of all targets.
r       int / float               Target radius.
Returns array of boolean values   True at the indices of found targets.
Timing the numba implementation (after a first call to compile it):

compiling = isBetween_c_Vec_numba(np.array([0.1,1]), np.array([1,3]), np.random.rand(100,2), 0.00001)
4.65 μs ± 25.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

For comparison, the original (non-numba) implementation:

from rl_opts.utils import isBetween_c_Vec as oldbetween
40.4 μs ± 177 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Pareto sampling


source

pareto_sample

 pareto_sample (alpha, xm, size=1)
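No docstring is given here; presumably the function draws size samples from a Pareto distribution with exponent alpha and scale xm. A minimal inverse-transform sketch (illustrative only, not the library's numba implementation):

```python
import numpy as np

def pareto_sample_sketch(alpha, xm, size=1):
    # Inverse-transform sampling: if U ~ Uniform(0, 1), then
    # xm * U**(-1/alpha) follows a Pareto(alpha, xm) distribution.
    u = np.random.rand(size)
    return xm * u ** (-1.0 / alpha)
```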

Random sampling from array with probs


source

rand_choice_nb

 rand_choice_nb (arr, prob)

:param arr: A 1D numpy array of values to sample from.
:param prob: A 1D numpy array of probabilities for the given samples.
:return: A random sample from the given array with a given probability.
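Weighted sampling is typically rebuilt for numba with a cumulative sum and searchsorted, since numba's nopython mode does not support the p argument of np.random.choice. A sketch of that standard trick (the library's actual implementation may differ):

```python
import numpy as np

def rand_choice_sketch(arr, prob):
    # Draw u ~ Uniform(0, 1) and locate it in the cumulative
    # distribution; the insertion index selects the sample.
    u = np.random.rand()
    return arr[np.searchsorted(np.cumsum(prob), u, side="right")]
```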

TargetEnv


source

TargetEnv

 TargetEnv (*args, **kwargs)

Class defining a foraging environment with multiple targets and two actions: continuing in the same direction, or turning by a random angle.

ResetEnv

Search loop with a fixed policy in an arbitrary environment


source

reset_search_loop

 reset_search_loop (T, reset_policy, env)

Loop that runs the reset environment with a given reset policy.

             Details
T            Number of steps
reset_policy Reset policy
env          Environment
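The loop structure can be sketched as follows. The environment interface used here (a single step(reset) -> found method) and the interpretation of reset_policy[n] as the reset probability n steps after the last reset are assumptions for illustration, not the library API:

```python
import numpy as np

def reset_search_loop_sketch(T, reset_policy, env):
    """Run T steps of a search with a fixed resetting policy and
    collect the duration of each completed search."""
    search_times = []
    since_reset = 0   # steps since the last reset; indexes the policy
    since_found = 0   # duration of the current search
    for _ in range(T):
        p = reset_policy[min(since_reset, len(reset_policy) - 1)]
        do_reset = np.random.rand() < p
        found = env.step(do_reset)          # hypothetical env interface
        since_reset = 0 if (do_reset or found) else since_reset + 1
        since_found += 1
        if found:
            search_times.append(since_found)
            since_found = 0
    return search_times
```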

1D


source

ResetEnv_1D

 ResetEnv_1D (*args, **kwargs)

One-dimensional search environment with resetting.

Parallel search loops for Reset 1D


parallel_Reset1D_exp

 parallel_Reset1D_exp (T, rates, L, D)

Runs the Reset 1D loop in parallel for different exponential resetting rates.


parallel_Reset1D_sharp

 parallel_Reset1D_sharp (T, resets, L, D)

Runs the Reset 1D loop in parallel for different sharp resetting times.
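A single run of the sharp-resetting 1D search might look like the sketch below: a diffusive walker starts at the origin, is returned there every reset_time steps, and searches for a target at distance L. Only the parameter names mirror parallel_Reset1D_sharp; the internal dynamics (Gaussian steps of standard deviation sqrt(2*D*dt)) are an assumption:

```python
import numpy as np

def reset1d_sharp_sketch(T, reset_time, L, D, dt=1.0, rng=None):
    """One sharp-resetting 1D search run; returns the list of
    completed search durations within T steps."""
    rng = np.random.default_rng(rng)
    x, since_reset, t_last = 0.0, 0, 0
    search_times = []
    for t in range(1, T + 1):
        if since_reset == reset_time:      # sharp reset: back to origin
            x, since_reset = 0.0, 0
        x += rng.normal(0.0, np.sqrt(2.0 * D * dt))
        since_reset += 1
        if x >= L:                         # target reached
            search_times.append(t - t_last)
            t_last = t
            x, since_reset = 0.0, 0
    return search_times
```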

2D


source

ResetEnv_2D

 ResetEnv_2D (*args, **kwargs)

Two-dimensional search environment with resetting.

Parallel search loops for Reset 2D


parallel_Reset2D_policies

 parallel_Reset2D_policies (T, reset_policies, dist_target, radius_target, D)

parallel_Reset2D_exp

 parallel_Reset2D_exp (T, rates, dist_target, radius_target, D)

parallel_Reset2D_sharp

 parallel_Reset2D_sharp (T, resets, dist_target, radius_target, D)
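For the exponential case, a single 2D run might be sketched as below: a diffusive searcher resets to the origin at a Poissonian rate and looks for a circular target of the given radius at the given distance. The parallel_Reset2D_* functions presumably run many such loops over an array of rates; the dynamics here are an assumption for illustration:

```python
import numpy as np

def reset2d_exp_sketch(T, rate, dist_target, radius_target, D, dt=1.0, rng=None):
    """One exponential-resetting 2D search run; returns the list of
    completed search durations within T steps."""
    rng = np.random.default_rng(rng)
    pos = np.zeros(2)
    target = np.array([dist_target, 0.0])   # target centre on the x-axis
    search_times, t_last = [], 0
    for t in range(1, T + 1):
        if rng.random() < rate * dt:         # exponential (Poissonian) reset
            pos[:] = 0.0
        pos += rng.normal(0.0, np.sqrt(2.0 * D * dt), size=2)
        if np.linalg.norm(pos - target) <= radius_target:
            search_times.append(t - t_last)
            t_last = t
            pos[:] = 0.0
    return search_times
```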

TurnResetEnv

Only the 2D case is considered.


TurnResetEnv_2D

 TurnResetEnv_2D (*args, **kwargs)

Class defining a foraging environment with a single target and three possible actions:

  • Continue in the same direction
  • Turn by a random angle
  • Reset to the origin

The agent makes steps of constant length given by agent_step.

Search loop with fixed policy


search_loop_turn_reset_sharp

 search_loop_turn_reset_sharp (T, reset, turn, env)

Runs a search loop of T steps. A single counter tracks the time since the last reset, as follows:

  • It starts at 0.
  • Each continue or turn action increments it by 1.
  • A reset action, or reaching the target, sets it back to 0.
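The counter bookkeeping described above can be made concrete with a small sketch, where actions is a sequence of hypothetical event labels:

```python
def turn_reset_counter_sketch(actions):
    """Trace the counter over a sequence of events: it increments on
    'continue' or 'turn' and zeroes on 'reset' or 'found' (target reached).
    Event names are illustrative, not the library's encoding."""
    counter, history = 0, []
    for a in actions:
        counter = 0 if a in ("reset", "found") else counter + 1
        history.append(counter)
    return history
```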