Benchmarks

We consider as benchmark models the discrete Lévy distribution and the bi-exponential distribution, given by the equations

\[\Pr(L)=\zeta^{-1}_{(1+\beta, 1)} L^{-1-\beta}\,,\] and \[\Pr(L) = \sum_{i=1,2} \omega_i (1-e^{-1/d_i}) e^{-(L-1)/d_i} \, ,\] respectively, where \(\zeta_{(1+\beta, 1)}=\sum_{\ell=0}^\infty (\ell+1)^{-1-\beta}\) is the Riemann zeta function, \(d_i\) are length scales and the mode weights satisfy \(\sum_{i=1,2} \omega_i=1\).
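As a quick sanity check of the equations above (independent of the library's own pdf_powerlaw and pdf_multimode implementations), the following snippet evaluates both probability mass functions and verifies that they sum to one; the values \(\beta=1\), \(d_1=2\), \(d_2=100\) and \(\omega_1=0.9\) are chosen purely for illustration.

import numpy as np
from scipy.special import zeta

L = np.arange(1, 100_001, dtype=float)  # step lengths L = 1, 2, ...

# Discrete Levy distribution with exponent beta
beta = 1.0
pr_levy = L**(-1 - beta) / zeta(1 + beta, 1)

# Bi-exponential distribution with scales d_1, d_2 and weights w_1, 1 - w_1
d = np.array([2.0, 100.0])
w = np.array([0.9, 0.1])
pr_biexp = (w * (1 - np.exp(-1 / d)) * np.exp(-(L[:, None] - 1) / d)).sum(axis=1)

print(pr_levy.sum(), pr_biexp.sum())  # both close to 1 (tails truncated at L = 1e5)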

We transform the step length distributions into policies with Eq. (5), which is implemented in the code in the method get_policy_from_dist. This method takes as inputs (i) the maximum value of the step counter for which to compute the policy; (ii) the function of the model (either pdf_powerlaw or pdf_multimode in our case); and (iii) the parameter(s) of the chosen model, which are, in our case, the exponent \(\beta\) for the Lévy distribution and \(d_1\), \(d_2\), \(\omega_1\) for the bi-exponential.

The step length distributions described above and the method to transform them into policies (get_policy_from_dist) can be imported as follows:

from rl_opts.analytics import get_policy_from_dist, pdf_powerlaw, pdf_multimode

You can get the policy of, e.g., a discrete Lévy distribution with \(\beta=1\) by running:

policy_levy = get_policy_from_dist(n_max = 100,
                                   func = pdf_powerlaw,
                                   beta = 1)

Note: the list policy_levy contains the first n_max points of the policy, given as \(\pi(\uparrow|n)\).
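Similarly, you can obtain the policy corresponding to a bi-exponential distribution via pdf_multimode. The keyword names used below for its parameters are only an assumption for this illustration; check the signature of pdf_multimode for the exact argument names.

# Keyword names d_1, d_2 and w_1 are hypothetical; see pdf_multimode for the actual ones.
policy_biexp = get_policy_from_dist(n_max = 100,
                                    func = pdf_multimode,
                                    d_1 = 2.0, d_2 = 100.0, w_1 = 0.9)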

We employ the Tune library for parameter optimization, which allows us to optimize the average search efficiency over a number of walks (mean_eff) with respect to the model parameters. This quantity is computed with the method [average_search_efficiency](https://gorkamunoz.github.io/rl_opts/lib_nbs/learning_and_benchmark.html#average_search_efficiency) and then reported to Tune.

from rl_opts.learn_and_bench import average_search_efficiency

Note: The walks are performed with [`walk_from_policy`](https://gorkamunoz.github.io/rl_opts/lib_nbs/learning_and_benchmark.html#walk_from_policy) (see also the tutorial on learning), which takes a fixed (non-changing) policy as input and runs the walks in parallel. In this case, the policy is the one corresponding to the benchmark distribution being evaluated.

average_search_efficiency takes as input a configuration dictionary with the parameter ranges that the optimization algorithm will consider.
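To make the interface with Tune concrete, here is a minimal conceptual sketch (not the library's actual implementation) of a function trainable that, like average_search_efficiency, receives the configuration dictionary and reports the metric mean_eff; the per-walk efficiencies are replaced by random placeholders.

import numpy as np

def average_search_efficiency_sketch(config):
    # Conceptual sketch only: in rl_opts, the policy is built from the chosen
    # benchmark model and walk_from_policy runs config['n'] walks of
    # config['time_ep'] steps each; here we draw placeholder efficiencies.
    rng = np.random.default_rng(0)
    effs = rng.random(config['n'])
    # A dict returned by a function trainable is reported to Tune as its final result.
    return {'mean_eff': float(effs.mean())}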

The parameters of that dictionary are described below:

Model (we input parameter ranges here):

  • d_int : small scale (\(d_1\), first mode) of the bi-exponential distribution
  • d_ext : large scale (\(d_2\), second mode) of the bi-exponential distribution
  • p : weight of the first mode (\(\omega_1\)) in the bi-exponential distribution
  • beta : exponent of the Lévy distribution
  • model : model description (fixed, either ‘powerlaw’ or ‘double_exp’)

Walks (we input a single value that is fixed throughout the optimization):

  • time_ep : number of (small, \(d=1\)) steps per walk. We choose the same value for the benchmarks as for the episodes in the RL training
  • n : number of walks (also referred to as agents in the code, but there is no relation to RL agents)

Environment (we input a single value that is fixed throughout the optimization):

  • lc : cutoff length
  • Nt : number of targets
  • L : world size
  • r : target detection radius
  • destructive : whether targets are destructive or not (always set to False)

Other:

  • results_path : Path where the resulting efficiencies of each walk are saved. If you set it to None, the efficiencies are not saved. The mean efficiency can still be retrieved from the final Tune dataframe.
  • num_raytune_samples : Number of samples for Tune (needed for Bayesian optimization).

Once we have defined the parameter ranges, we choose the optimization algorithm. Among the different possibilities that Tune offers, we use grid search for the Lévy distribution and Bayesian optimization for the bi-exponential distribution.

Example

Let us take the example with \(l_\textrm{c}=3\).

For the Lévy distribution, the config dictionary looks like:

import numpy as np
from ray import tune

config_lv = {'d_int': None,
             'd_ext': None,
             'p': None,
             'beta': tune.grid_search(np.linspace(0.01, 1., 20)),
             'model': 'powerlaw',
             'time_ep': 100,
             'n': 100,
             'lc': 3.0,
             'Nt': 100,
             'L': 100,
             'r': 0.5,
             'destructive': False,
             'results_path': None,
             'num_raytune_samples': 10
             }

We do a grid search over 20 values of \(\beta\), linearly spaced in the interval \([0.01, 1]\). The parameters that correspond to the other model are set to None.

Then, we initialize the tuner, which by default does a grid search over the input parameters.

from ray import tune

tuner = tune.Tuner(average_search_efficiency,
                   tune_config=tune.TuneConfig(num_samples=1),
                   param_space=config_lv)

And we run the algorithm:

result_grid_lv = tuner.fit()

For the bi-exponential distribution, the config dictionary looks like:

config_be = {'d_int': tune.uniform(0.00001, 20.0),
          'd_ext': 100.0,
          'p': tune.uniform(0.0, 1.0),
          'beta': None,
          'model': 'double_exp',
          'time_ep': 100,
          'n': 100,
          'lc': 3.0,
          'Nt': 100,
          'L': 100,
          'r': 0.5,
          'destructive': False,
          'results_path': None,
          'num_raytune_samples': 10
          }

In this case, since we choose a Bayesian optimization method, we do not specify the exact parameter values to try, but only their ranges. For the small scale \(d_1\), we consider a range of the order of \(l_\textrm{c}\). We fix the value of \(d_2\) to further guide the search and make it more time efficient: we run the search both with \(d_2=100\), which is the scale of the average distance between targets, and with \(d_2=10^5\). Again, the parameter \(\beta\), which corresponds to the other model, is set to None.

We first initialize the Bayesian optimization method, and then the tuner.

from ray import tune
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.search import ConcurrencyLimiter

# BayesOptSearch requires the bayesian-optimization package (pip install bayesian-optimization).
bayesopt = BayesOptSearch(metric="mean_eff", mode="max")
bayesopt = ConcurrencyLimiter(bayesopt, max_concurrent=3)
tuner = tune.Tuner(average_search_efficiency, 
                   tune_config=tune.TuneConfig(search_alg=bayesopt, num_samples=config_be['num_raytune_samples']),
                   param_space=config_be)

Note that we limit the number of concurrent trials to 3, so that the search algorithm can update its model more often within the num_raytune_samples samples.

And we run it:

result_grid_be = tuner.fit()

Results

The results can be retrieved as a pandas DataFrame:

results_lv_df = result_grid_lv.get_dataframe()

These results can also be saved as a pandas DataFrame in the folder indicated by results_path in the config dictionary. We refer the reader to the Tune documentation for further details on data saving and retrieval.
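For instance, the configuration and mean efficiency of the best trial can be inspected directly from the result grid (the exact API may vary slightly across Ray versions):

best_result = result_grid_lv.get_best_result(metric="mean_eff", mode="max")
print(best_result.config)               # model parameters of the best trial
print(best_result.metrics['mean_eff'])  # best mean search efficiency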

Reproduction of results

In order to reproduce the results of the paper, you can access the configuration dictionaries in the folder ‘configurations/benchmark_models/’. For example:

import numpy as np

config_path = 'configurations/benchmark_models/'
model = 'powerlaw'
lc = 3
run = 0
config_paper = np.load(f'{config_path}config_{model}_lc_{float(lc)}_run_{run}.npy',
                       allow_pickle=True).item()

Note that you need to provide a run number. The reason for this is that, in some cases, we run the optimization several times for the same models. For example, for the bi-exponential distribution, we run it twice, first with \(d_2 = 10^5\) (run_0) and then with \(d_2 = 100\) (run_1). For the Lévy distribution, there is only run_0.
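With the loaded dictionary you can then launch the corresponding optimization exactly as above, assuming the saved configuration already contains the Tune search-space definitions (hence allow_pickle=True); otherwise, plug the loaded parameter values into a config dictionary like config_lv. For example, for the 'powerlaw' configuration with grid search:

tuner = tune.Tuner(average_search_efficiency,
                   tune_config=tune.TuneConfig(num_samples=1),
                   param_space=config_paper)
result_grid_paper = tuner.fit()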

The mean efficiency achieved by the best model, together with the model parameters, can be retrieved from the resulting dataframe (see above). In addition, if you want the list with the efficiency of each walk, you can obtain it with get_opt (provided a results path was input in the configuration dictionary):

from rl_opts.utils import get_opt

efficiency, parameters = get_opt(config_lv['results_path'], results_lv_df)

Note that this method takes as input the pandas DataFrame with the obtained results. As an additional output, it provides the parameters of the model that achieved the highest efficiency: for the Lévy distribution, the exponent \(\beta\); for the bi-exponential, a list of the form \([d_1, d_2, \omega_1]\).

By default, the saved config dictionaries have results_path set to None (in which case get_opt outputs the mean efficiency retrieved from the given DataFrame), so if you want to obtain the efficiency list, change it to a path of your choice before running the optimization.
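For instance, before launching the tuner you could point results_path to a folder of your choice (the path below is just an example):

config_paper['results_path'] = 'results/benchmark_models/'  # hypothetical output folder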