Particle Swarm Optimization Policy (PSO-P) for Industrial Reinforcement Learning Problems

Abstract

In 2016 we introduced a model-based reinforcement learning (RL) approach for continuous state and action spaces [1]. While most RL methods try to find closed-form policies, the approach taken here employs numerical online optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high-dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on two standard benchmarks: mountain car and cart pole. To further investigate its properties and feasibility on real-world applications, we applied PSO-P to the so-called Industrial Benchmark (IB), a novel RL benchmark that aims at being realistic by including a variety of aspects found in industrial applications, such as continuous state and action spaces, a high-dimensional, partially observable state space, delayed effects, and complex stochasticity. The experiments, first presented at the IJCNN 2017 in Anchorage [2], show that PSO-P is not only of interest for academic benchmarks, but also for real-world industrial applications, since it also yielded a high-performing policy for the IB. Compared to other well-established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring only a relatively low amount of effort in finding adequate parameters or making complex design decisions.
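To make the approach concrete, the following is a minimal sketch of the PSO-P idea: given a (learned or known) transition model and reward function, each PSO particle encodes a candidate action sequence over a finite horizon, its fitness is the return of a model rollout, and only the first action of the best sequence is applied before re-planning. All names, coefficient values, and the toy dynamics below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def rollout_return(model_step, reward_fn, state, actions):
    """Fitness of an action sequence: simulate the model forward and sum rewards."""
    total = 0.0
    for a in actions:
        state = model_step(state, a)
        total += reward_fn(state, a)
    return total

def pso_policy(model_step, reward_fn, state, horizon=10,
               n_particles=30, iters=50, bounds=(-1.0, 1.0), seed=0):
    """Return the first action of the best action sequence found by PSO.

    Hyperparameters (inertia w, cognitive c1, social c2) are common textbook
    choices, not values from the cited paper.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, horizon))  # particle positions
    v = np.zeros_like(x)                                  # particle velocities
    pbest, pbest_f = x.copy(), np.array(
        [rollout_return(model_step, reward_fn, state, xi) for xi in x])
    g = pbest[pbest_f.argmax()].copy()                    # global best sequence
    g_f = pbest_f.max()
    w, c1, c2 = 0.7, 1.4, 1.4
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([rollout_return(model_step, reward_fn, state, xi) for xi in x])
        better = f > pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        if f.max() > g_f:
            g_f, g = f.max(), x[f.argmax()].copy()
    # Model-predictive style: apply only the first action, then re-plan.
    return g[0]
```

As a usage example on a hypothetical one-dimensional integrator (state drifts by 0.1 times the action, reward penalizes distance to a target of 1.0), the optimizer should select a strongly positive first action:

```python
step = lambda s, a: s + 0.1 * a
reward = lambda s, a: -abs(s - 1.0)
a0 = pso_policy(step, reward, state=0.0)
```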

[1] D. Hein, A. Hentschel, T. Runkler, and S. Udluft, “Reinforcement learning with particle swarm optimization policy (PSO-P) in continuous state and action spaces,” International Journal of Swarm Intelligence Research (IJSIR), vol. 7, no. 3, pp. 23–42, 2016.

[2] D. Hein, S. Udluft, M. Tokic, A. Hentschel, T. Runkler, and V. Sterzing, “Batch reinforcement learning on the industrial benchmark: first experiences,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2017.