REFERENCES

1. Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994, pages 157–163. Elsevier, 1994.

2. Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2008;38:156-172.

3. Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019;575:350-354.

4. Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, and Enrique Munoz de Cote. A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183, 2017.

5. Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, and Stefano V Albrecht. Dealing with non-stationarity in multi-agent deep reinforcement learning. arXiv preprint arXiv:1906.04737, 2019.

6. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

7. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

8. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR, 2018.

9. Pablo Hernandez-Leal, Bilal Kartal, and Matthew E Taylor. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems 2019;33:750-797.

10. Pablo Hernandez-Leal, Matthew E Taylor, Benjamin Rosman, L Enrique Sucar, and Enrique Munoz de Cote. Identifying and tracking switching, non-stationary opponents: A Bayesian approach. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence, 2016.

11. Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, and Changjie Fan. A deep Bayesian policy reuse approach against non-stationary agents. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 962–972, 2018.

12. Tianpei Yang, Jianye Hao, Zhaopeng Meng, Chongjie Zhang, Yan Zheng, and Ze Zheng. Towards efficient detection and optimal response against sophisticated opponents. In IJCAI, 2019.

13. He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daumé III. Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pages 1804–1813. PMLR, 2016.

14. Zhang-Wei Hong, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, and Chun-Yi Lee. A deep policy inference Q-network for multi-agent systems. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 1388–1396, 2018.

15. Roberta Raileanu, Emily Denton, Arthur Szlam, and Rob Fergus. Modeling others using oneself in multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4257–4266. PMLR, 2018.

16. Jakob Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 122–130, 2018.

17. Long-Ji Lin. Reinforcement learning for robots using neural networks. PhD thesis, Carnegie Mellon University, 1992.

18. Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K Dokania, Philip HS Torr, and Marc'Aurelio Ranzato. On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486, 2019.

19. Aristotelis Chrysakis and Marie-Francine Moens. Online continual learning from imbalanced data. In International Conference on Machine Learning, pages 1952–1961. PMLR, 2020.

20. Khimya Khetarpal, Matthew Riemer, Irina Rish, and Doina Precup. Towards continual reinforcement learning: A review and perspectives. arXiv preprint arXiv:2012.13490, 2020.

21. Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.

22. Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR, 2020.

23. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.

24. Michael Laskin, Aravind Srinivas, and Pieter Abbeel. CURL: Contrastive unsupervised representations for reinforcement learning. In International Conference on Machine Learning, pages 5639–5650. PMLR, 2020.

25. Tjalling C Koopmans, editor. Activity analysis of production and allocation. John Wiley & Sons, 1951.

26. David Carmel and Shaul Markovitch. Model-based learning of interaction strategies in multi-agent systems. Journal of Experimental & Theoretical Artificial Intelligence 1998;10:309-332.

27. Benjamin Rosman, Majd Hawasly, and Subramanian Ramamoorthy. Bayesian policy reuse. Machine Learning 2016;104:99-127.

28. Harmen De Weerd, Rineke Verbrugge, and Bart Verheij. How much does it help to know what she knows you know? An agent-based simulation study. Artificial Intelligence 2013;199:67-92.

29. Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.

30. Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E Hinton. Big self-supervised models are strong semi-supervised learners. Advances in Neural Information Processing Systems 2020;33:22243-22255.

31. Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems 2020;33:21271-21284.

32. Eric A Hansen, Daniel S Bernstein, and Shlomo Zilberstein. Dynamic programming for partially observable stochastic games. In AAAI, volume 4, pages 709–715, 2004.
