from rl_opts.learn_and_bench import average_search_efficiency
from ray import tune
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.search import ConcurrencyLimiter
import numpy as np
Learning and benchmarking
This notebook gathers the functions needed to train agents to forage optimally, as well as tools to compute their foraging efficiency and compare it with benchmark foraging strategies.
Learning
learning
learning (config, results_path, run)
Training of the RL agent
| | Type | Details |
|---|---|---|
| config | dict | Dictionary with all the parameters |
| results_path | str | Path to save the results |
| run | int | Agent identifier |
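A minimal sketch of a call to learning. The contents of config are not spelled out in this table, so the dictionary below is purely illustrative (its keys are placeholders, not the library's documented API), and the import path is assumed from this notebook's module.

from rl_opts.learn_and_bench import learning  # assumed import path

# Hypothetical configuration: the keys below are placeholders for the
# environment and learning parameters expected by the training loop.
config = {'time_ep': 20, 'lc': 3.0, 'Nt': 100, 'L': 100, 'r': 0.5,
          'destructive': False}

learning(config, results_path='results/', run=0)  # train agent 0 and save under results/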
Generate walk from a policy
walk_from_policy
walk_from_policy (policy, time_ep, n, L, Nt, r, lc, destructive=False, with_bound=False, bound=100)
Walk of foragers given a policy. Performance is evaluated as the number of targets found in a fixed time time_ep.
| | Type | Default | Details |
|---|---|---|---|
| policy | list | | Starting from counter=1, probability of continuing for each counter value. |
| time_ep | int | | Number of steps (decisions). |
| n | int | | Number of agents that walk in parallel (all with the same policy; they do not interact). This is the “number of walks” in the paper. |
| L | int | | World size. |
| Nt | int | | Number of targets. |
| r | float | | Target radius. |
| lc | float | | Cutoff length. The agent is displaced a distance lc from the target when it finds it. |
| destructive | bool | False | True if targets are destructive. The default is False. |
| with_bound | bool | False | True if the policy is cut. The default is False. |
| bound | int | 100 | Bound of the policy (maximum value for the counter). The default is 100. |
| Returns | list, len(rewards)=n | | Number of targets found by each agent in time_ep steps of d=1. |
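As an illustration of the signature above, the sketch below evaluates a constant policy that continues with probability 0.9 at every counter value; the parameter values are arbitrary and the import path is assumed.

import numpy as np
from rl_opts.learn_and_bench import walk_from_policy  # assumed import path

policy = [0.9] * 100  # probability of continuing for counter values 1..100

# Ten independent walks of 20 decisions each, in a world of size 100
# with 100 non-destructive targets of radius 0.5.
rewards = walk_from_policy(policy, time_ep=20, n=10, L=100, Nt=100, r=0.5, lc=3.0)
print(np.mean(rewards))  # average number of targets found per walk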
Efficiency computation
agent_efficiency
agent_efficiency (results_path, config, run, num_walks, episode_interval)
Computes the agent’s average search efficiency over a number of walks where the agent follows a fixed policy. This is repeated with the policies at different stages of the training to analyze the evolution of its performance.
| | Type | Details |
|---|---|---|
| results_path | str | Path to the results folder, from which to extract the agent’s policies |
| config | dict | Dictionary with all the parameters. It needs to be the same configuration file as the one used to train the agent. |
| run | int | Id of the agent |
| num_walks | int | Number of (independent) walks |
| episode_interval | int | Every ‘episode_interval’ training episodes, the policy of the agent is taken and its performance is analyzed. |
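A hedged sketch of how this evaluation might be launched for a trained agent, reusing the same configuration dictionary that was used for training (the path, run id and values are placeholders, and the import path is assumed):

from rl_opts.learn_and_bench import agent_efficiency  # assumed import path

# Evaluate the policies saved for agent 0 every 1000 training episodes,
# averaging the efficiency over 10 independent walks per policy.
agent_efficiency(results_path='results/', config=config, run=0,
                 num_walks=10, episode_interval=1000)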
Benchmarks
Code to get the search efficiency of the benchmark models. We consider Lévy and bi-exponential distributions and obtain the model parameters that achieve the highest search efficiency. We use the Ray Tune library to optimize the efficiency within given parameter ranges.
average_search_efficiency
average_search_efficiency (config)
Get the average search efficiency, considering the benchmark model defined in config.
| | Type | Details |
|---|---|---|
| config | dict | Dictionary with the configuration of the benchmark model. |
Example
Set up the configuration, run and type of search
#### Minimal example #####
run = '0'                 # run identifier
search_type = 'Bayesian'  # 'Bayesian' or 'Grid'

config = {'d_int': tune.uniform(0.00001, 20.0),  # optimized over the given range
          'd_ext': 100.0,
          'p': tune.uniform(0.0, 1.0),           # optimized over the given range
          'beta': None,
          'model': 'double_exp',                 # benchmark model: bi-exponential
          'time_ep': 20,                         # number of steps per walk
          'n': 10,                               # number of independent walks
          'lc': 3.0,                             # cutoff length
          'Nt': 100,                             # number of targets
          'L': 100,                              # world size
          'r': 0.5,                              # target radius
          'destructive': False,                  # non-destructive targets
          'results_path': None,
          'num_raytune_samples': 10              # number of Tune samples
          }
Initialize Tune
if search_type == 'Bayesian':  # Bayesian optimization
    bayesopt = BayesOptSearch(metric="mean_eff", mode="max")
    bayesopt = ConcurrencyLimiter(bayesopt, max_concurrent=3)
    tuner = tune.Tuner(average_search_efficiency,
                       tune_config=tune.TuneConfig(search_alg=bayesopt,
                                                   num_samples=config['num_raytune_samples']),
                       param_space=config)

elif search_type == 'Grid':  # Grid search
    tuner = tune.Tuner(average_search_efficiency,
                       tune_config=tune.TuneConfig(num_samples=1),
                       param_space=config)
Run the algorithm
result_grid = tuner.fit()
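After tuning, the best trial can be inspected from the returned ResultGrid; this sketch assumes that average_search_efficiency reports the metric mean_eff used above.

# Best benchmark parameters according to the reported mean efficiency.
best = result_grid.get_best_result(metric="mean_eff", mode="max")
print(best.config)               # parameter values of the best trial
print(best.metrics["mean_eff"])  # corresponding average search efficiency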