RL-OptS

Learning and benchmarking

This notebook gathers the functions needed to train agents to forage optimally, along with tools to compute their foraging efficiency and to compare it against benchmark foraging strategies.

Learning


source

learning

 learning (config, results_path, run)

Training of the RL agent

Argument Type Details
config dict Dictionary with all the parameters
results_path str Path to save the results
run int Agent identifier
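
A minimal usage sketch (an assumption, not part of the library's documentation): it supposes that learning is exported from rl_opts.learn_and_bench, like the other functions on this page, and that config is filled with the training parameters the library expects.

from rl_opts.learn_and_bench import learning

# Placeholder: fill with the training parameters expected by the library.
# The same dictionary is later needed by agent_efficiency (see below).
config = {}

# Train three independent agents, identified by `run`, with the same
# configuration; results are saved under results_path.
for run in range(3):
    learning(config, results_path='results/', run=run)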

Generate walk from a policy


source

walk_from_policy

 walk_from_policy (policy, time_ep, n, L, Nt, r, lc, destructive=False,
                   with_bound=False, bound=100)

Walk of foragers following a given policy. Performance is evaluated as the number of targets found within a fixed number of steps time_ep.

Argument Type Default Details
policy list Starting from counter=1, probability of continuing for each counter value.
time_ep int Number of steps (decisions).
n int Number of agents that walk in parallel (all with the same policy; they do not interact). This is the "number of walks" in the paper.
L int World size.
Nt int Number of targets.
r float Target radius.
lc float Cutoff length. The agent is displaced a distance lc from the target when it finds it.
destructive bool False True if targets are destructive. The default is False.
with_bound bool False True if the policy is cut at the maximum counter value bound. The default is False.
bound int 100 Bound of the policy (maximum value for the counter). The default is 100.
Returns list, len(rewards)=n Number of targets found by each agent in time_ep steps of d=1.
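
A minimal sketch of evaluating a hand-crafted policy with walk_from_policy (assuming the function is exported from rl_opts.learn_and_bench like the rest of this page; the policy values and environment parameters below are purely illustrative):

from rl_opts.learn_and_bench import walk_from_policy

# Illustrative policy: at each counter value (counting from counter=1) the
# forager continues with probability 0.9; 100 entries is more than enough
# for a walk of time_ep=20 decisions.
policy = [0.9] * 100

# 10 non-interacting walkers, 20 steps each, in a world of size L=100 with
# Nt=100 non-destructive targets of radius r=0.5 and cutoff length lc=3.0.
rewards = walk_from_policy(policy, time_ep=20, n=10, L=100, Nt=100,
                           r=0.5, lc=3.0, destructive=False)

# Average number of targets found per walker in time_ep steps.
print(sum(rewards) / len(rewards))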

Efficiency computation


source

agent_efficiency

 agent_efficiency (results_path, config, run, num_walks, episode_interval)

Computes the agent’s average search efficiency over a number of walks where the agent follows a fixed policy. This is repeated with the policies at different stages of the training to analyze the evolution of its performance.

Argument Type Details
results_path str Path to the results folder, from which the agent's policies are extracted
config dict Dictionary with all the parameters. It needs to be the same configuration as the one used to train the agent.
run int ID of the agent
num_walks int Number of (independent) walks
episode_interval int Every episode_interval training episodes, the agent's policy is taken and its performance is analyzed.
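
A hedged sketch of how agent_efficiency could be called once training has finished; it assumes the same config and results_path that were passed to learning for this agent, and the number of walks and episode interval are illustrative:

from rl_opts.learn_and_bench import agent_efficiency

# `config` must be the same dictionary used to train agent `run`; it is only
# a placeholder here, as is the results folder.
config = {}
results_path = 'results/'

# Evaluate the policy stored every 1000 training episodes, averaging the
# search performance over 100 independent walks per policy.
agent_efficiency(results_path, config, run=0, num_walks=100,
                 episode_interval=1000)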

Benchmarks

Code to get the search efficiency of the benchmark models. We consider Lévy and bi-exponential distributions and obtain the model parameters that achieve the highest search efficiency. We use the Ray Tune library to optimize the efficiency within given parameter ranges.


source

average_search_efficiency

 average_search_efficiency (config)

Get the average search efficiency, considering the benchmark model defined in config.

Argument Type Details
config dict Dictionary with the configuration of the benchmark model.

Example

Set up the configuration, run and type of search

from rl_opts.learn_and_bench import average_search_efficiency
from ray import tune
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.search import ConcurrencyLimiter
import numpy as np
#### Minimal example ####
run = '0'                                        # identifier of this run
search_type = 'Bayesian'                         # 'Bayesian' or 'Grid' (see below)
config = {'d_int': tune.uniform(0.00001, 20.0),  # bi-exponential model parameter, optimized over this range
          'd_ext': 100.0,                        # bi-exponential model parameter, kept fixed
          'p': tune.uniform(0.0, 1.0),           # probability parameter of the bi-exponential model, optimized
          'beta': None,                          # Lévy model parameter, not used by the 'double_exp' model
          'model': 'double_exp',                 # benchmark model: bi-exponential distribution
          'time_ep': 20,                         # number of steps per walk
          'n': 10,                               # number of parallel, non-interacting walkers
          'lc': 3.0,                             # cutoff length (displacement after finding a target)
          'Nt': 100,                             # number of targets
          'L': 100,                              # world size
          'r': 0.5,                              # target radius
          'destructive': False,                  # targets are not destroyed when found
          'results_path': None,                  # path to save the results
          'num_raytune_samples': 10              # number of configurations sampled by Tune
         }

Initialize Tune

if search_type == 'Bayesian':  # Bayesian optimization

    bayesopt = BayesOptSearch(metric="mean_eff", mode="max")
    bayesopt = ConcurrencyLimiter(bayesopt, max_concurrent=3)
    tuner = tune.Tuner(average_search_efficiency,
                       tune_config=tune.TuneConfig(search_alg=bayesopt,
                                                   num_samples=config['num_raytune_samples']),
                       param_space=config)

elif search_type == 'Grid':  # Grid search

    tuner = tune.Tuner(average_search_efficiency,
                       tune_config=tune.TuneConfig(num_samples=1),
                       param_space=config)

Run the algorithm

result_grid = tuner.fit()
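
The fit returns a Ray Tune ResultGrid. As a follow-up, one could retrieve the best parameters found, assuming the efficiency is reported under the metric name "mean_eff" used above:

# Best configuration according to the reported mean search efficiency.
best_result = result_grid.get_best_result(metric="mean_eff", mode="max")
print(best_result.config)               # model parameters of the best trial
print(best_result.metrics["mean_eff"])  # its average search efficiency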