from rl_opts.learn_and_bench import average_search_efficiency
from ray import tune
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.search import ConcurrencyLimiter
import numpy as np
Learning and benchmarking
This notebook gathers the functions needed to train agents to forage optimally, as well as tools to compute their foraging efficiency and compare it with benchmark foraging strategies.
Learning
learning
learning (config, results_path, run)
Training of the RL agent
| | Type | Details |
|---|---|---|
| config | dict | Dictionary with all the parameters |
| results_path | str | Path to save the results |
| run | int | Agent identifier |
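A minimal sketch of a call to learning. The contents of config are not spelled out in this table, so the dictionary below is purely illustrative (its keys are placeholders, not the library's documented API), and the import path is assumed from this notebook's module.

from rl_opts.learn_and_bench import learning  # assumed import path

# Hypothetical configuration: the keys below are placeholders for the
# environment and learning parameters expected by the training loop.
config = {'time_ep': 20, 'lc': 3.0, 'Nt': 100, 'L': 100, 'r': 0.5,
          'destructive': False}

learning(config, results_path='results/', run=0)  # train agent 0 and save under results/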
Generate walk from a policy
walk_from_policy
walk_from_policy (policy, time_ep, n, L, Nt, r, lc, destructive=False, with_bound=False, bound=100)
Walk of foragers given a policy. Performance is evaluated as the number of targets found in a fixed time time_ep.
| | Type | Default | Details |
|---|---|---|---|
| policy | list | | Starting from counter=1, probability of continuing for each counter value. |
| time_ep | int | | Number of steps (decisions). |
| n | int | | Number of agents that walk in parallel (all with the same policy; they do not interact). This is the “number of walks” in the paper. |
| L | int | | World size. |
| Nt | int | | Number of targets. |
| r | float | | Target radius. |
| lc | float | | Cutoff length. The agent is displaced a distance lc from the target when it finds it. |
| destructive | bool | False | True if targets are destructive. The default is False. |
| with_bound | bool | False | True if the policy is cut. The default is False. |
| bound | int | 100 | Bound of the policy (maximum value for the counter). The default is 100. |
| Returns | list, len(rewards)=n | | Number of targets found by each agent in time_ep steps of d=1. |
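As an illustration of the signature above, the sketch below evaluates a constant policy that continues with probability 0.9 at every counter value; the parameter values are arbitrary and the import path is assumed.

import numpy as np
from rl_opts.learn_and_bench import walk_from_policy  # assumed import path

policy = [0.9] * 100  # probability of continuing for counter values 1..100

# Ten independent walks of 20 decisions each, in a world of size 100
# with 100 non-destructive targets of radius 0.5.
rewards = walk_from_policy(policy, time_ep=20, n=10, L=100, Nt=100, r=0.5, lc=3.0)
print(np.mean(rewards))  # average number of targets found per walk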
Efficiency computation
agent_efficiency
agent_efficiency (results_path, config, run, num_walks, episode_interval)
Computes the agent’s average search efficiency over a number of walks where the agent follows a fixed policy. This is repeated with the policies at different stages of the training to analyze the evolution of its performance.
| | Type | Details |
|---|---|---|
| results_path | str | Path to the results folder, from which to extract the agent’s policies |
| config | dict | Dictionary with all the parameters. It needs to be the same configuration file as the one used to train the agent. |
| run | int | Id of the agent |
| num_walks | int | Number of (independent) walks |
| episode_interval | int | Every ‘episode_interval’ training episodes, the policy of the agent is taken and its performance is analyzed. |
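A hedged sketch of how this evaluation might be launched for a trained agent, reusing the same configuration dictionary that was used for training (the path, run id and values are placeholders, and the import path is assumed):

from rl_opts.learn_and_bench import agent_efficiency  # assumed import path

# Evaluate the policies saved for agent 0 every 1000 training episodes,
# averaging the efficiency over 10 independent walks per policy.
agent_efficiency(results_path='results/', config=config, run=0,
                 num_walks=10, episode_interval=1000)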
Benchmarks
Code to get the search efficiency of the benchmark models. We consider Lévy and bi-exponential distributions and obtain the model parameters that achieve the highest search efficiency. We use the Ray Tune library to optimize the efficiency within given parameter ranges.
average_search_efficiency
average_search_efficiency (config)
Get the average search efficiency, considering the benchmark model defined in config.
| | Type | Details |
|---|---|---|
| config | dict | Dictionary with the configuration of the benchmark model. |
Example
Set up the configuration, run and type of search
#### Minimal example #####
run = '0'                 # run identifier
search_type = 'Bayesian'  # 'Bayesian' or 'Grid'

config = {'d_int': tune.uniform(0.00001, 20.0),  # optimized over the given range
          'd_ext': 100.0,
          'p': tune.uniform(0.0, 1.0),           # optimized over the given range
          'beta': None,
          'model': 'double_exp',                 # benchmark model: bi-exponential
          'time_ep': 20,                         # number of steps per walk
          'n': 10,                               # number of independent walks
          'lc': 3.0,                             # cutoff length
          'Nt': 100,                             # number of targets
          'L': 100,                              # world size
          'r': 0.5,                              # target radius
          'destructive': False,                  # non-destructive targets
          'results_path': None,
          'num_raytune_samples': 10              # number of Tune samples
          }
Initialize Tune
if search_type == 'Bayesian':  # Bayesian optimization
    bayesopt = BayesOptSearch(metric="mean_eff", mode="max")
    bayesopt = ConcurrencyLimiter(bayesopt, max_concurrent=3)
    tuner = tune.Tuner(average_search_efficiency,
                       tune_config=tune.TuneConfig(search_alg=bayesopt,
                                                   num_samples=config['num_raytune_samples']),
                       param_space=config)

elif search_type == 'Grid':  # Grid search
    tuner = tune.Tuner(average_search_efficiency,
                       tune_config=tune.TuneConfig(num_samples=1),
                       param_space=config)
Run the algorithm
result_grid = tuner.fit()
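After tuning, the best trial can be inspected from the returned ResultGrid; this sketch assumes that average_search_efficiency reports the metric mean_eff used above.

# Best benchmark parameters according to the reported mean efficiency.
best = result_grid.get_best_result(metric="mean_eff", mode="max")
print(best.config)               # parameter values of the best trial
print(best.metrics["mean_eff"])  # corresponding average search efficiency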