REFERENCES
1. Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Machine Learning Proceedings 1994, pages 157–163. Elsevier, 1994.
2. Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38:156–172, 2008.
3. Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575:350–354, 2019.
4. Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, and Enrique Munoz de Cote. A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183, 2017.
5. Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, and Stefano V Albrecht. Dealing with non-stationarity in multi-agent deep reinforcement learning. arXiv preprint arXiv:1906.04737, 2019.
6. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
7. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
8. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR, 2018.
9. Pablo Hernandez-Leal, Bilal Kartal, and Matthew E Taylor. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33:750–797, 2019.
10. Pablo Hernandez-Leal, Matthew E Taylor, Benjamin Rosman, L Enrique Sucar, and Enrique Munoz de Cote. Identifying and tracking switching, non-stationary opponents: A Bayesian approach. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
11. Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, and Changjie Fan. A deep Bayesian policy reuse approach against non-stationary agents. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 962–972, 2018.
12. Tianpei Yang, Jianye Hao, Zhaopeng Meng, Chongjie Zhang, Yan Zheng, and Ze Zheng. Towards efficient detection and optimal response against sophisticated opponents. In IJCAI, 2019.
13. He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daumé III. Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pages 1804–1813. PMLR, 2016.
14. Zhang-Wei Hong, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, and Chun-Yi Lee. A deep policy inference Q-network for multi-agent systems. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 1388–1396, 2018.
15. Roberta Raileanu, Emily Denton, Arthur Szlam, and Rob Fergus. Modeling others using oneself in multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4257–4266. PMLR, 2018.
16. Jakob Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 122–130, 2018.
17. Long-Ji Lin. Reinforcement learning for robots using neural networks. PhD thesis, Carnegie Mellon University, 1992.
18. Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K Dokania, Philip HS Torr, and Marc'Aurelio Ranzato. On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486, 2019.
19. Aristotelis Chrysakis and Marie-Francine Moens. Online continual learning from imbalanced data. In International Conference on Machine Learning, pages 1952–1961. PMLR, 2020.
20. Khimya Khetarpal, Matthew Riemer, Irina Rish, and Doina Precup. Towards continual reinforcement learning: A review and perspectives. arXiv preprint arXiv:2012.13490, 2020.
21. Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
22. Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR, 2020.
23. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
24. Michael Laskin, Aravind Srinivas, and Pieter Abbeel. CURL: Contrastive unsupervised representations for reinforcement learning. In International Conference on Machine Learning, pages 5639–5650. PMLR, 2020.
25. Tjalling C Koopmans, editor. Activity analysis of production and allocation. Wiley, 1951.
26. David Carmel and Shaul Markovitch. Model-based learning of interaction strategies in multi-agent systems. Journal of Experimental & Theoretical Artificial Intelligence, 10:309–332, 1998.
27. Benjamin Rosman, Majd Hawasly, and Subramanian Ramamoorthy. Bayesian policy reuse. Machine Learning, 104:99–127, 2016.
28. Harmen de Weerd, Rineke Verbrugge, and Bart Verheij. How much does it help to know what she knows you know? An agent-based simulation study. Artificial Intelligence, 199:67–92, 2013.
29. Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.
30. Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E Hinton. Big self-supervised models are strong semi-supervised learners. Advances in Neural Information Processing Systems, 33:22243–22255, 2020.
31. Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.
32. Eric A Hansen, Daniel S Bernstein, and Shlomo Zilberstein. Dynamic programming for partially observable stochastic games. In AAAI, volume 4, pages 709–715, 2004.