Proposal

Summary

One of the key measures of a reinforcement learning algorithm is its sample efficiency. Despite the recent success of deep reinforcement learning on a wide variety of tasks (Mnih et al., 2015; Heess et al., 2017), it is still difficult to train an agent in a manner that is both performant and sample-efficient. One popular approach to this problem is to incorporate demonstrations from humans or other controllers, as in (Hester et al., 2018). Demonstrations give the agent a “starting point”, so that it does not have to waste steps wandering the environment at random. In supervised learning, meanwhile, a common way to improve performance is to use ensembles: collections of models whose predictions are aggregated into a more accurate prediction (Goodfellow et al., 2016). Though ensembles have been tried in deep reinforcement learning, they have not gained much traction. We believe they hold promise nonetheless: Faußer and Schwenker (2015) showed that feeding environment interactions to an ensemble, rather than to a single agent, can improve performance on tasks such as SZ-Tetris. We therefore seek to combine these two approaches, demonstrations and ensembles, and investigate their effects on sample efficiency and performance. Specifically, we plan to implement our algorithms (likely an actor-critic method such as Soft Actor-Critic (Haarnoja et al., 2018)) in the Duckietown platform (Paull et al., 2017) and ultimately compete in the lane following challenges of the AI Driving Olympics (Zilly et al., 2019).
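
As a rough illustration of the idea (not a commitment to a particular implementation), the sketch below shows one way the two ingredients could fit together: an ensemble that averages the actions of several independently trained actor-critic agents, and a warm-start step that seeds the replay buffer with demonstration transitions. The agent and replay-buffer interfaces (act, add) are assumptions made for illustration only.

    import numpy as np

    class EnsemblePolicy:
        """Aggregates actions from several independently trained agents
        (e.g. SAC agents); averaging suits continuous control actions."""

        def __init__(self, agents):
            self.agents = agents

        def act(self, observation):
            actions = [agent.act(observation) for agent in self.agents]
            return np.mean(actions, axis=0)

    def warm_start(replay_buffer, demonstrations):
        """Seed the replay buffer with demonstration transitions
        (obs, action, reward, next_obs, done) before any environment
        interaction, in the spirit of Hester et al. (2018)."""
        for transition in demonstrations:
            replay_buffer.add(*transition)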

Evaluation Plan

Quantitative

We will evaluate our project by running it in the Duckietown platform. Due to the compressed timeline of this project, we will focus on simulation, as real robots introduce technical difficulties that are out of scope. Within the Duckietown platform, our goal is to perform the LF (Lane Following), LFV (Lane Following with dynamic Vehicles), and LFVI (Lane Following with dynamic Vehicles and Intersections) tasks of the AI Driving Olympics. Our metrics will be the performance metrics defined in (Zilly et al., 2019), namely Performance, obedience to Traffic Law, and Comfort. In addition, we will measure sample efficiency by recording our agents’ performance after various numbers of interactions with the environment. We plan to vary our algorithms along two axes: with and without demonstrations, and across ensemble sizes. Our baseline will be an agent that uses no demonstrations and has an ensemble of size 1 (i.e. a single model). As we have little prior experience with Duckietown, it is difficult to estimate how much better our algorithms will do, but based on the performance improvements described in (Faußer & Schwenker, 2015) and (Hester et al., 2018), we estimate that adding ensembles and demonstrations will yield roughly a 50% improvement in performance over the baseline, while requiring on the order of 10^7 fewer episodes to reach the same performance as the baseline agent. To summarize, we will evaluate our algorithms on the AI Driving Olympics challenges in Duckietown on the basis of performance and sample efficiency.
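
To make the experimental design concrete, the sketch below runs the grid of variants (demonstrations on/off crossed with several ensemble sizes) and records evaluation metrics at fixed interaction budgets. The make_agent, train, and evaluate callables, as well as the specific ensemble sizes and budgets, are assumptions standing in for our eventual training code.

    import itertools

    def run_grid(make_agent, train, evaluate,
                 ensemble_sizes=(1, 3, 5),
                 eval_budgets=(10_000, 100_000, 1_000_000)):
        """Train every (demonstrations, ensemble size) variant and record its
        evaluation metrics after fixed numbers of environment interactions.
        Ensemble size 1 without demonstrations is the baseline."""
        results = {}
        for use_demos, n_models in itertools.product((False, True), ensemble_sizes):
            agent = make_agent(n_models=n_models, use_demos=use_demos)
            steps_so_far = 0
            for budget in eval_budgets:
                train(agent, env_steps=budget - steps_so_far)
                steps_so_far = budget
                results[(use_demos, n_models, budget)] = evaluate(agent)
        return results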

Qualitative

We will evaluate our work qualitatively by watching videos of our agents’ behavior in the Duckietown simulation (and perhaps in the real world). We will know an agent is working if it successfully completes the lane following challenges, i.e. drives along the road in the simulation. To visualize the internals of our algorithm, we will plot the agent’s average return after various numbers of environment interactions and training steps, as well as the loss curves of our neural networks. In general, the losses should decrease and stabilize, while the return should increase and plateau. As a stretch goal, we could evaluate our algorithm on a physical Duckiebot at UCI, and as the ultimate moonshot, we could submit our algorithm to the upcoming iteration of AI-DO.
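
A minimal sketch of the learning-curve visualization, assuming the training loop logs (environment steps, average return) pairs for each variant; the labels and logging format are illustrative assumptions.

    import matplotlib.pyplot as plt

    def plot_learning_curves(curves):
        """curves maps a label such as 'ensemble=5, with demos' to a list of
        (environment_steps, average_return) pairs logged during training."""
        for label, points in sorted(curves.items()):
            steps, returns = zip(*points)
            plt.plot(steps, returns, label=label)
        plt.xlabel("Environment interactions")
        plt.ylabel("Average return")
        plt.legend()
        plt.tight_layout()
        plt.show()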

Goals

  1. Minimum Goal: Train and evaluate ensembles of actor-critic models
    • Milestone 1 (due February 7): Implement and evaluate a Soft Actor-Critic agent in the LF challenge (see the interaction-loop sketch after this list)
    • Milestone 2: Implement a Soft Actor-Critic agent that uses an ensemble, and evaluate it in the LF, LFV, and LFVI challenges
  2. Realistic Goal: Incorporate demonstrations into the agent and evaluate it on the LF, LFV, and LFVI challenges
  3. Ambitious Goal: Transfer the algorithms to a physical Duckiebot and compete in the next iteration of AI-DO
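
For Milestone 1, the skeleton below sketches the basic interaction loop with the gym-duckietown simulator. The environment id is an assumption to be checked against the gym-duckietown documentation, and the random action is a placeholder that would be replaced by the Soft Actor-Critic policy.

    import gym
    import gym_duckietown  # registers Duckietown-*-v0 environments with Gym (assumed)

    env = gym.make("Duckietown-loop_empty-v0")  # assumed id for a simple LF map

    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()      # placeholder for the SAC policy
        obs, reward, done, info = env.step(action)
    env.close()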

References

  1. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236
  2. Heess, N., TB, D., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., Eslami, S. M. A., Riedmiller, M. A., & Silver, D. (2017). Emergence of Locomotion Behaviours in Rich Environments. CoRR, abs/1707.02286. http://arxiv.org/abs/1707.02286
  3. Hester, T., Vecerik, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., Horgan, D., Quan, J., Sendonaris, A., Osband, I., Dulac-Arnold, G., Agapiou, J., Leibo, J., & Gruslys, A. (2018). Deep Q-learning From Demonstrations. https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16976
  4. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  5. Faußer, S., & Schwenker, F. (2015). Neural Network Ensembles in Reinforcement Learning. Neural Processing Letters, 41(1), 55–69. https://doi.org/10.1007/s11063-013-9334-5
  6. Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In J. Dy & A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning (Vol. 80, pp. 1861–1870). PMLR. http://proceedings.mlr.press/v80/haarnoja18b.html
  7. Paull, L., Tani, J., Ahn, H., Alonso-Mora, J., Carlone, L., Cap, M., Chen, Y. F., Choi, C., Dusek, J., Fang, Y., Hoehener, D., Liu, S., Novitzky, M., Okuyama, I. F., Pazis, J., Rosman, G., Varricchio, V., Wang, H., Yershov, D., … Censi, A. (2017). Duckietown: An open, inexpensive and flexible platform for autonomy education and research. 2017 IEEE International Conference on Robotics and Automation (ICRA), 1497–1504. https://doi.org/10.1109/ICRA.2017.7989179
  8. Zilly, J., Tani, J., Considine, B., Mehta, B., Daniele, A. F., Diaz, M., Bernasconi, G., Ruch, C., Hakenberg, J., Golemo, F., Bowser, A. K., Walter, M. R., Hristov, R., Mallya, S., Frazzoli, E., Censi, A., & Paull, L. (2019). The AI Driving Olympics at NeurIPS 2018. ArXiv Preprint ArXiv:1903.02503.