## Benchmarks
We consider as benchmark models the discrete Lévy distribution and the bi-exponential distribution, given by \[\Pr(L)=\zeta^{-1}_{(1+\beta, 1)} L^{-1-\beta}\,,\] and \[\Pr(L) = \sum_{i=1,2} \omega_i (1-e^{-1/d_i}) e^{-(L-1)/d_i} \, ,\] respectively, where \(\zeta_{(1+\beta, 1)}=\sum_{\ell=0}^\infty (\ell+1)^{-1-\beta}\) is the Riemann zeta function, \(d_i\) are length scales, and the mode weights satisfy \(\sum_{i=1,2} \omega_i=1\).
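As a sanity check, here is a minimal NumPy sketch of these two distributions; the helper names `levy_pdf` and `biexp_pdf` are ours for illustration, while the library's own `pdf_powerlaw` and `pdf_multimode` are the reference implementations:

```python
import numpy as np
from scipy.special import zeta

def levy_pdf(L, beta):
    # Pr(L) = L^(-1-beta) / zeta(1+beta), for L = 1, 2, ...
    return L**(-1.0 - beta) / zeta(1.0 + beta)

def biexp_pdf(L, d1, d2, w1):
    # Sum of two discrete exponentials, each normalized by (1 - e^(-1/d_i))
    return (w1 * (1 - np.exp(-1 / d1)) * np.exp(-(L - 1) / d1)
            + (1 - w1) * (1 - np.exp(-1 / d2)) * np.exp(-(L - 1) / d2))

L = np.arange(1, 10**6)
print(levy_pdf(L, beta=1.0).sum())                  # ~1 (power-law tail converges slowly)
print(biexp_pdf(L, d1=2.0, d2=50.0, w1=0.8).sum())  # ~1
```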
We transform the step length distributions into policies with Eq. (5), which is implemented in the code in the method `get_policy_from_dist`. This method takes as input (i) the maximum value of the step counter for which to compute the policy; (ii) the function of the model (either `pdf_powerlaw` or `pdf_multimode` in our case); and (iii) the parameter(s) of the chosen model, which are, in our case, the exponent \(\beta\) for the Lévy distribution and \(d_1\), \(d_2\), \(\omega_1\) for the bi-exponential.

The step length distributions described above and the method to transform them into policies (`get_policy_from_dist`) can be imported as follows:

```python
from rl_opts.analytics import get_policy_from_dist, pdf_powerlaw, pdf_multimode
```
You can get the policy of, e.g., a discrete Lévy distribution with \(\beta=1\) by running:
```python
policy_levy = get_policy_from_dist(n_max = 100,
                                   func = pdf_powerlaw,
                                   beta = 1)
```
Note: the list `policy_levy` contains the first `n_max` points of the policy, given as \(\pi(\uparrow|n)\).
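If you want to see what the transformation does, the sketch below reconstructs the policy by hand, under the assumption that Eq. (5) is the usual survival-probability ratio \(\pi(\uparrow|n)=\Pr(L>n)/\Pr(L\geq n)\); the exact counter convention is fixed by the paper, so treat this as a rough cross-check of `policy_levy` rather than a definition:

```python
import numpy as np

beta, n_max = 1.0, 100
L = np.arange(1, n_max + 2)
p = L**(-1.0 - beta)
p /= p.sum()                          # truncated normalization, enough for a rough check

surv = 1.0 - np.cumsum(p)             # surv[k] = Pr(L > k+1)
prev = np.concatenate(([1.0], surv))  # prev[k] = Pr(L >= k+1)
pi_up = surv[:n_max] / prev[:n_max]   # pi(up|n) for n = 1, ..., n_max

print(pi_up[:5])  # should roughly follow the first entries of policy_levy
```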
We employ the library Tune for parameter optimization, which allows us to optimize the average search efficiency over a number of walks (`mean_eff`) with respect to the model parameters. The function to optimize (`mean_eff`) is computed with the method [`average_search_efficiency`](https://gorkamunoz.github.io/rl_opts/lib_nbs/learning_and_benchmark.html#average_search_efficiency), and then reported to `tune`.
```python
from rl_opts.learn_and_bench import average_search_efficiency
```
Note: The walks are performed with [`walk_from_policy`](https://gorkamunoz.github.io/rl_opts/lib_nbs/learning_and_benchmark.html#walk_from_policy) (see also the tutorial on learning), which takes a fixed (non-changing) policy as input and runs the walks in parallel. In this case, the policy is the one corresponding to the benchmark distribution that is being evaluated.
`average_search_efficiency` takes as input a configuration dictionary with the parameter ranges that the optimization algorithm will consider. The parameters of that dictionary are described below:
Model (we input parameter ranges here):

- `d_int`: small scale (\(d_1\), first mode) of the bi-exponential distribution
- `d_ext`: large scale (\(d_2\), second mode) of the bi-exponential distribution
- `p`: weight of the first mode (\(\omega_1\)) of the bi-exponential distribution
- `beta`: exponent of the Lévy distribution
- `model`: model description (fixed, either 'powerlaw' or 'double_exp')

Walks (we input a single value that is fixed throughout the optimization):

- `time_ep`: number of (small, \(d=1\)) steps per walk. We choose the same value for the benchmarks as for the episodes in the RL training.
- `n`: number of walks (also referred to as agents in the code, but there is no relation to RL agents)

Environment (we input a single value that is fixed throughout the optimization):

- `lc`: cutoff length
- `Nt`: number of targets
- `L`: world size
- `r`: target detection radius
- `destructive`: whether targets are destructive or not (always set to False)

Other:

- `results_path`: path where the resulting efficiencies of each walk are saved. If you set it to `None`, the efficiencies are not saved; the mean efficiency can still be retrieved from the final Tune dataframe.
- `num_raytune_samples`: number of samples for Tune (needed for Bayesian Optimization)
Once we have defined the parameter ranges, we choose the optimization algorithm. Among the different possibilities that Tune offers, we choose Grid Search for the Lévy distribution and Bayesian Optimization for the bi-exponential distribution.
### Example
Let us take the example with \(l_\textrm{c}=3\).
For the Lévy distribution, the config dictionary looks like:
```python
import numpy as np
from ray import tune

config_lv = {'d_int': None,
             'd_ext': None,
             'p': None,
             'beta': tune.grid_search(np.linspace(0.01, 1., 20)),
             'model': 'powerlaw',
             'time_ep': 100,
             'n': 100,
             'lc': 3.0,
             'Nt': 100,
             'L': 100,
             'r': 0.5,
             'destructive': False,
             'results_path': None,
             'num_raytune_samples': 10}
```
We do a grid search over 20 parameter values, linearly spaced in the interval \([0.01, 1]\). The parameters that correspond to the other model are set to `None`.
Then, we initialize the tuner, which by default does a grid search over the input parameters.
```python
tuner = tune.Tuner(average_search_efficiency,
                   tune_config = tune.TuneConfig(num_samples = 1),
                   param_space = config_lv)
```
And we run the algorithm:
```python
result_grid_lv = tuner.fit()
```
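Beyond the dataframe we use below, Ray Tune's `ResultGrid` also exposes the best trial directly; for instance (assuming the objective was reported under the key `mean_eff`):

```python
best = result_grid_lv.get_best_result(metric="mean_eff", mode="max")
print(best.config["beta"])       # exponent of the best-performing policy
print(best.metrics["mean_eff"])  # its mean search efficiency
```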
For the bi-exponential distribution, the config dictionary looks like:
```python
config_be = {'d_int': tune.uniform(0.00001, 20.0),
             'd_ext': 100.0,
             'p': tune.uniform(0.0, 1.0),
             'beta': None,
             'model': 'double_exp',
             'time_ep': 100,
             'n': 100,
             'lc': 3.0,
             'Nt': 100,
             'L': 100,
             'r': 0.5,
             'destructive': False,
             'results_path': None,
             'num_raytune_samples': 10}
```
In this case, since we use a Bayesian optimization method, we do not specify the parameter values to try, but just their ranges. For the small scale, we consider a range of the order of \(l_\textrm{c}\). We fix the value of \(d_2\) to further guide the search and make it more time efficient. We do the search once with \(d_2=100\), which is the scale of the average distance between targets, and once with \(d_2=10^5\). Again, the parameter \(\beta\), which corresponds to the other model, is set to `None`.
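Since the two searches differ only in the value of \(d_2\), a convenient way to prepare the second one is to copy the dictionary and override `d_ext` (the name `config_be_run1` is just illustrative):

```python
config_be_run1 = {**config_be, 'd_ext': 1e5}  # second search, with d_2 = 10^5
```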
We first initialize the Bayesian optimization method, and then the tuner.
```python
from ray import tune
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.search import ConcurrencyLimiter
```
```python
bayesopt = BayesOptSearch(metric="mean_eff", mode="max")
bayesopt = ConcurrencyLimiter(bayesopt, max_concurrent=3)

tuner = tune.Tuner(average_search_efficiency,
                   tune_config = tune.TuneConfig(search_alg=bayesopt,
                                                 num_samples=config_be['num_raytune_samples']),
                   param_space = config_be)
```
Note that we limit the number of concurrent processes to 3, so that the method can update itself more often within the `num_raytune_samples` samples.
And we run it:
```python
result_grid_be = tuner.fit()
```
### Results
The results can be retrieved as a pandas dataframe:
```python
results_lv_df = result_grid_lv.get_dataframe()
```
These results can also be saved as a pandas dataframe in the folder indicated in the config dictionary. We refer the reader to the Tune documentation for further details on data saving and retrieval.
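For a quick look without leaving the dataframe, you can sort by the reported metric; this assumes the efficiency appears under the column `mean_eff`, with configuration entries prefixed by `config/` as in standard Tune dataframes:

```python
top = results_lv_df.sort_values('mean_eff', ascending=False)
print(top[['config/beta', 'mean_eff']].head())  # best exponents first
```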
### Reproduction of results
In order to reproduce the results of the paper, you can access the configuration dictionaries in the folder ‘configurations/benchmark_models/’. For example:
```python
config_path = 'configurations/benchmark_models/'
model = 'powerlaw'
lc = 3
run = 0

config_paper = np.load(config_path+'config_'+str(model)+'_lc_'+str(float(lc))+'_run_'+str(run)+'.npy',
                       allow_pickle=True).item()
```
Note that you need to provide a `run` number. The reason for this is that, in some cases, we ran the optimization several times for the same model. For example, for the bi-exponential distribution, we ran it twice, first with \(d_2 = 10^5\) (run_0) and then with \(d_2 = 100\) (run_1). For the Lévy distribution, there is only run_0.
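For instance, to load the configuration of the second bi-exponential run, a sketch assuming the file names follow the same pattern with the model string 'double_exp':

```python
config_paper_be = np.load(config_path + 'config_double_exp_lc_3.0_run_1.npy',
                          allow_pickle=True).item()
```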
The mean efficiency achieved by the best model, together with the model parameters, can be retrieved from the resulting dataframe (see above). In addition, if you want the list with the efficiency of each walk, you can obtain it with `get_opt` (provided a results path was input in the configuration dictionary):
```python
from rl_opts.utils import get_opt

efficiency, parameters = get_opt(config_lv['results_path'], results_lv_df)
```
Note that this method takes as input the pandas dataframe with the obtained results. As an additional output, it provides the parameters of the model that achieved the highest efficiency. For the Lévy distribution, the exponent \(\beta\) is given. For the bi-exponential, it outputs a list of the form \([d_1, d_2, \omega_1]\).
By default, the saved config dictionaries have the results path set to `None` (in which case `get_opt` outputs the mean efficiency retrieved from the given dataframe), so if you want to obtain the efficiency list, change it to a path of your choice.
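A minimal sketch of that workflow, assuming the saved dictionary already contains the Tune search space (as in the grid-search example above) and using a hypothetical path of your choice:

```python
config_paper['results_path'] = 'results/benchmarks/'  # hypothetical output folder

tuner = tune.Tuner(average_search_efficiency,
                   tune_config = tune.TuneConfig(num_samples = 1),
                   param_space = config_paper)
results_df = tuner.fit().get_dataframe()

efficiency, parameters = get_opt(config_paper['results_path'], results_df)
```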