REFERENCES
1. Nair A, Srinivasan P, Blackwell S, et al. Massively parallel methods for deep reinforcement learning. CoRR 2015;abs/1507.04296. Available from: http://arxiv.org/abs/1507.04296.
2. Grounds M, Kudenko D. Parallel reinforcement learning with linear function approximation. In: Tuyls K, Nowe A, Guessoum Z, Kudenko D, editors. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. pp. 60-74.
3. Clemente AV, Martínez HNC, Chandra A. Efficient parallel methods for deep reinforcement learning. CoRR 2017;abs/1705.04862. Available from: http://arxiv.org/abs/1705.04862.
4. Lim WYB, Luong NC, Hoang DT, et al. Federated learning in mobile edge networks: a comprehensive survey. IEEE Communications Surveys & Tutorials 2020;22:2031-63.
5. Nguyen DC, Ding M, Pathirana PN, et al. Federated learning for internet of things: a comprehensive survey. IEEE Communications Surveys & Tutorials 2021;23:1622-58.
6. Khan LU, Saad W, Han Z, Hossain E, Hong CS. Federated learning for internet of things: recent advances, taxonomy, and open challenges. IEEE Communications Surveys & Tutorials 2021;23:1759-99.
7. Yang Q, Liu Y, Cheng Y, et al. Federated learning. 1st ed. Morgan & Claypool; 2019.
8. Yang Q, Liu Y, Chen T, Tong Y. Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 2019;10:1-19.
9. Li Q, Wen Z, He B. Federated learning systems: vision, hype and reality for data privacy and protection. CoRR 2019;abs/1907.09693. Available from: http://arxiv.org/abs/1907.09693.
10. Li T, Sahu AK, Talwalkar A, Smith V. Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 2020;37:50-60.
11. Wang S, Tuor T, Salonidis T, Leung KK, Makaya C, et al. Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications 2019;37:1205-21.
12. McMahan HB, Moore E, Ramage D, y Arcas BA. Communication-efficient learning of deep networks from decentralized data. CoRR 2016;abs/1602.05629. Available from: http://arxiv.org/abs/1602.05629.
13. Phong LT, Aono Y, Hayashi T, Wang L, Moriai S. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security 2018;13:1333-45.
14. Zhu H, Jin Y. Multi-objective evolutionary federated learning. IEEE Transactions on Neural Networks and Learning Systems 2020;31:1310-22.
15. Kairouz P, McMahan HB, Avent B, et al. Advances and open problems in federated learning. CoRR 2019;abs/1912.04977. Available from: http://arxiv.org/abs/1912.04977.
16. Pan SJ, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 2010;22:1345-59.
17. Li Y. Deep reinforcement learning: an overview. CoRR 2017;abs/1701.07274. Available from: http://arxiv.org/abs/1701.07274.
18. Xu Z, Tang J, Meng J, et al. Experience-driven networking: a deep reinforcement learning based approach. In: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. IEEE; 2018. pp. 1871-79.
19. Mohammadi M, Al-Fuqaha A, Guizani M, Oh JS. Semisupervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet of Things Journal 2018;5:624-35.
20. Bu F, Wang X. A smart agriculture IoT system based on deep reinforcement learning. Future Generation Computer Systems 2019;99:500-7. Available from: https://www.sciencedirect.com/science/article/pii/S0167739X19307277.
21. Xiong X, Zheng K, Lei L, Hou L. Resource allocation based on deep reinforcement learning in IoT edge computing. IEEE Journal on Selected Areas in Communications 2020;38:1133-46.
22. Lei L, Qi J, Zheng K. Patent analytics based on feature vector space model: a case of IoT. IEEE Access 2019;7:45705-15.
23. Shalev-Shwartz S, Shammah S, Shashua A. Safe, multi-agent, reinforcement learning for autonomous driving. CoRR 2016;abs/1610.03295. Available from: http://arxiv.org/abs/1610.03295.
24. Sallab AE, Abdou M, Perot E, Yogamani S. Deep reinforcement learning framework for autonomous driving. Electronic Imaging 2017;2017:70-76.
25. Taylor ME. Teaching reinforcement learning with Mario: an argument and case study. In: Second AAAI Symposium on Educational Advances in Artificial Intelligence; 2011. Available from: https://www.aaai.org/ocs/index.php/EAAI/EAAI11/paper/viewPaper/3515.
26. Holcomb SD, Porter WK, Ault SV, Mao G, Wang J. Overview on DeepMind and its AlphaGo Zero AI. In: Proceedings of the 2018 International Conference on Big Data and Education; 2018. pp. 67-71.
27. Watkins CJ, Dayan P. Q-learning. Machine learning 1992;8:279-92. Available from: https://link.springer.com/content/pdf/10.1007/BF00992698.pdf.
28. Thorpe TL. Vehicle traffic light control using SARSA. Citeseer; 1997. Available from: https://citeseer.ist.psu.edu/thorpe97vehicle.html.
29. Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Xing EP, Jebara T, editors. Proceedings of the 31st International Conference on Machine Learning. vol. 32 of Proceedings of Machine Learning Research. Beijing, China: PMLR; 2014. pp. 387-95. Available from: https://proceedings.mlr.press/v32/silver14.html.
30. Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 1992;8:229-56.
31. Konda VR, Tsitsiklis JN. Actor-critic algorithms. In: Advances in Neural Information Processing Systems; 2000. pp. 1008-14. Available from: https://proceedings.neurips.cc/paper/1786-actor-critic-algorithms.pdf.
32. Henderson P, Islam R, Bachman P, et al. Deep reinforcement learning that matters. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 32; 2018. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/11694.
33. Lei L, Tan Y, Dahlenburg G, Xiang W, Zheng K. Dynamic energy dispatch based on deep reinforcement learning in IoT-driven smart isolated microgrids. IEEE Internet of Things Journal 2021;8:7938-53.
34. Lei L, Xu H, Xiong X, Zheng K, Xiang W, et al. Multiuser resource control with deep reinforcement learning in IoT edge computing. IEEE Internet of Things Journal 2019;6:10119-33.
35. Ohnishi S, Uchibe E, Yamaguchi Y, Nakanishi K, Yasui Y, et al. Constrained deep Q-learning gradually approaching ordinary Q-learning. Frontiers in Neurorobotics 2019;13:103.
36. Peng J, Williams RJ. Incremental multi-step Q-learning. In: Machine Learning Proceedings 1994. Elsevier; 1994. pp. 226-32.
37. Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529-33.
38. Lei L, Tan Y, Zheng K, et al. Deep reinforcement learning for autonomous internet of things: model, applications and challenges. IEEE Communications Surveys & Tutorials 2020;22:1722-60.
39. Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 30; 2016. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/10295.
40. Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. arXiv preprint arXiv:1511.05952; 2015. Available from: https://arxiv.org/abs/1511.05952.
41. Gu S, Lillicrap TP, Ghahramani Z, Turner RE, Levine S. Q-Prop: sample-efficient policy gradient with an off-policy critic. CoRR 2016;abs/1611.02247. Available from: http://arxiv.org/abs/1611.02247.
42. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1861-70. Available from: https://proceedings.mlr.press/v80/haarnoja18b.html.
43. Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928-37. Available from: https://proceedings.mlr.press/v48/mniha16.html.
44. Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971; 2015. Available from: https://arxiv.org/abs/1509.02971.
45. Barth-Maron G, Hoffman MW, Budden D, et al. Distributed distributional deterministic policy gradients. CoRR 2018;abs/1804.08617. Available from: http://arxiv.org/abs/1804.08617.
46. Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1587-96. Available from: https://proceedings.mlr.press/v80/fujimoto18a.html.
47. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning. vol. 37 of Proceedings of Machine Learning Research. Lille, France: PMLR; 2015. pp. 1889-97. Available from: https://proceedings.mlr.press/v37/schulman15.html.
48. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347. Available from: http://arxiv.org/abs/1707.06347.
49. Zhu P, Li X, Poupart P. On improving deep reinforcement learning for POMDPs. CoRR 2017;abs/1704.07978. Available from: http://arxiv.org/abs/1704.07978.
50. Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: 2015 AAAI Fall Symposium Series; 2015. Available from: https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673.
51. Heess N, Hunt JJ, Lillicrap TP, Silver D. Memory-based control with recurrent neural networks. CoRR 2015;abs/1512.04455. Available from: http://arxiv.org/abs/1512.04455.
52. Foerster J, Nardelli N, Farquhar G, et al. Stabilising experience replay for deep multi-agent reinforcement learning. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 1146-55. Available from: https://proceedings.mlr.press/v70/foerster17b.html.
53. Van der Pol E, Oliehoek FA. Coordinated deep reinforcement learners for traffic light control. In: Proceedings of Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016); 2016. Available from: https://www.elisevanderpol.nl/papers/vanderpolNIPSMALIC2016.pdf.
54. Foerster J, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 32; 2018. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/11794.
55. Lowe R, Wu Y, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments. CoRR 2017;abs/1706.02275. Available from: http://arxiv.org/abs/1706.02275.
56. Nadiger C, Kumar A, Abdelhak S. Federated reinforcement learning for fast personalization. In: 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE); 2019. pp. 123-27.
57. Liu B, Wang L, Liu M, Xu C. Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems. CoRR 2019;abs/1901.06455. Available from: http://arxiv.org/abs/1901.06455.
58. Ren J, Wang H, Hou T, Zheng S, Tang C. Federated learning-based computation offloading optimization in edge computing-supported internet of things. IEEE Access 2019;7:69194-201.
59. Wang X, Wang C, Li X, Leung VCM, Taleb T. Federated deep reinforcement learning for internet of things with decentralized cooperative edge caching. IEEE Internet of Things Journal 2020;7:9441-55.
60. Chen J, Monga R, Bengio S, Józefowicz R. Revisiting distributed synchronous SGD. CoRR 2016;abs/1604.00981. Available from: http://arxiv.org/abs/1604.00981.
61. Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors. Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New York, New York, USA: PMLR; 2016. pp. 1928-37. Available from: https://proceedings.mlr.press/v48/mniha16.html.
62. Espeholt L, Soyer H, Munos R, et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1407-16. Available from: http://proceedings.mlr.press/v80/espeholt18a.html.
63. Horgan D, Quan J, Budden D, et al. Distributed prioritized experience replay. CoRR 2018;abs/1803.00933. Available from: http://arxiv.org/abs/1803.00933.
64. Liu T, Tian B, Ai Y, et al. Parallel reinforcement learning: a framework and case study. IEEE/CAA Journal of Automatica Sinica 2018;5:827-35.
65. Zhuo HH, Feng W, Xu Q, Yang Q, Lin Y. Federated reinforcement learning. CoRR 2019;abs/1901.08277. Available from: http://arxiv.org/abs/1901.08277.
66. Canese L, Cardarilli GC, Di Nunzio L, et al. Multi-agent reinforcement learning: a review of challenges and applications. Applied Sciences 2021;11:4948. Available from: https://doi.org/10.3390/app11114948.
67. Busoniu L, Babuska R, De Schutter B. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2008;38:156-72.
68. Zhang K, Yang Z, Başar T. Multi-agent reinforcement learning: a selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control 2021:321-84.
69. Stone P, Veloso M. Multiagent systems: a survey from a machine learning perspective. Autonomous Robots 2000;8:345-83.
70. Szepesvári C, Littman ML. A unified analysis of value-function-based reinforcement-learning algorithms. Neural computation 1999;11:2017-60.
71. Littman ML. Value-function reinforcement learning in Markov games. Cognitive systems research 2001;2:55-66.
72. Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the Tenth International Conference on Machine Learning; 1993. pp. 330-37.
73. Lauer M, Riedmiller M. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the Seventeenth International Conference on Machine Learning. Citeseer; 2000. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary.
74. Monahan GE. State of the art—a survey of partially observable Markov decision processes: theory, models, and algorithms. Management science 1982;28:1-16.
75. Oroojlooyjadid A, Hajinezhad D. A review of cooperative multi-agent deep reinforcement learning. CoRR 2019;abs/1908.03963. Available from: http://arxiv.org/abs/1908.03963.
76. Bernstein DS, Givan R, Immerman N, Zilberstein S. The complexity of decentralized control of Markov decision processes. Mathematics of operations research 2002;27:819-40.
77. Omidshafiei S, Pazis J, Amato C, How JP, Vian J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. pp. 2681-90. Available from: https://proceedings.mlr.press/v70/omidshafiei17a.html.
78. Han Y, Gmytrasiewicz P. IPOMDP-net: a deep neural network for partially observable multi-agent planning using interactive POMDPs. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. pp. 6062-69.
79. Karkus P, Hsu D, Lee WS. QMDP-Net: deep learning for planning under partial observability; 2017. Available from: https://arxiv.org/abs/1703.06692.
80. Mao W, Zhang K, Miehling E, Başar T. Information state embedding in partially observable cooperative multi-agent reinforcement learning. In: 2020 59th IEEE Conference on Decision and Control (CDC) 2020. pp. 6124-31.
81. Mao H, Zhang Z, Xiao Z, Gong Z. Modelling the dynamic joint policy of teammates with attention multi-agent DDPG. CoRR 2018;abs/1811.07029. Available from: http://arxiv.org/abs/1811.07029.
82. Lee HR, Lee T. Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response. European Journal of Operational Research 2021;291:296-308.
83. Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 29. Curran Associates, Inc.; 2016. Available from: https://proceedings.neurips.cc/paper/2016/file/55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf.
84. Foerster JN, Assael YM, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. CoRR 2016;abs/1605.06676. Available from: http://arxiv.org/abs/1605.06676.
85. Buşoniu L, Babuška R, De Schutter B. Multi-agent reinforcement learning: an overview. In: Innovations in Multi-Agent Systems and Applications - 1. Springer; 2010. pp. 183-221.
86. Hu Y, Hua Y, Liu W, Zhu J. Reward shaping based federated reinforcement learning. IEEE Access 2021;9:67259-67.
87. Anwar A, Raychowdhury A. Multi-task federated reinforcement learning with adversaries. CoRR 2021;abs/2103.06473. Available from: https://arxiv.org/abs/2103.06473.
88. Wang X, Han Y, Wang C, et al. In-edge AI: intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Network 2019;33:156-65.
89. Wang X, Li R, Wang C, et al. Attention-weighted federated deep reinforcement learning for device-to-device assisted heterogeneous collaborative edge caching. IEEE Journal on Selected Areas in Communications 2021;39:154-69.
90. Zhang M, Jiang Y, Zheng FC, Bennis M, You X. Cooperative edge caching via federated deep reinforcement learning in Fog-RANs. In: 2021 IEEE International Conference on Communications Workshops (ICC Workshops) 2021. pp. 1-6.
91. Majidi F, Khayyambashi MR, Barekatain B. HFDRL: an intelligent dynamic cooperate caching method based on hierarchical federated deep reinforcement learning in edge-enabled IoT. IEEE Internet of Things Journal 2021:1-1.
92. Zhao L, Ran Y, Wang H, Wang J, Luo J. Towards cooperative caching for vehicular networks with multi-level federated reinforcement learning. In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6.
93. Zhu Z, Wan S, Fan P, Letaief KB. Federated multi-agent actor-critic learning for age sensitive mobile edge computing. IEEE Internet of Things Journal 2021:1-1.
94. Yu S, Chen X, Zhou Z, Gong X, Wu D. When deep reinforcement learning meets federated learning: intelligent multi-timescale resource management for multi-access edge computing in 5G ultra dense network. arXiv preprint arXiv:2009.10601; 2020. Available from: http://arxiv.org/abs/2009.10601.
95. Zhu T, Zhou W, Ye D, Cheng Z, Li J. Resource allocation in IoT edge computing via concurrent federated reinforcement learning. IEEE Internet of Things Journal 2021:1-1.
96. Huang H, Zeng C, Zhao Y, et al. Scalable orchestration of service function chains in NFV-enabled networks: a federated reinforcement learning approach. IEEE Journal on Selected Areas in Communications 2021;39:2558-71.
97. Liu YJ, Feng G, Sun Y, Qin S, Liang YC. Device association for RAN slicing based on hybrid federated deep reinforcement learning. IEEE Transactions on Vehicular Technology 2020;69:15731-45.
98. Wang G, Dang CX, Zhou Z. Measure contribution of participants in federated learning. In: 2019 IEEE International Conference on Big Data (Big Data); 2019. pp. 2597-604.
99. Cao Y, Lien SY, Liang YC, Chen KC. Federated deep reinforcement learning for user access control in open radio access networks. In: ICC 2021 - IEEE International Conference on Communications 2021. pp. 1-6.
100. Zhang L, Yin H, Zhou Z, Roy S, Sun Y. Enhancing WiFi multiple access performance with federated deep reinforcement learning. In: 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall) 2020. pp. 1-6.
101. Xu M, Peng J, Gupta BB, et al. Multi-agent federated reinforcement learning for secure incentive mechanism in intelligent cyber-physical systems. IEEE Internet of Things Journal 2021:1-1.
102. Zhang X, Peng M, Yan S, Sun Y. Deep-reinforcement-learning-based mode selection and resource allocation for cellular V2X communications. IEEE Internet of Things Journal 2020;7:6380-91.
103. Kwon D, Jeon J, Park S, Kim J, Cho S. Multiagent DDPG-based deep learning for smart ocean federated learning IoT networks. IEEE Internet of Things Journal 2020;7:9895-903.
104. Liang X, Liu Y, Chen T, Liu M, Yang Q. Federated transfer reinforcement learning for autonomous driving. arXiv preprint arXiv:1910.06001; 2019. Available from: http://arxiv.org/abs/1910.06001.
105. Lim HK, Kim JB, Heo JS, Han YH. Federated reinforcement learning for training control policies on multiple IoT devices. Sensors 2020;20:1359. Available from: https://www.mdpi.com/1424-8220/20/5/1359.
106. Lim HK, Kim JB, Ullah I, Heo JS, Han YH. Federated reinforcement learning acceleration method for precise control of multiple devices. IEEE Access 2021;9:76296-306.
107. Mowla NI, Tran NH, Doh I, Chae K. AFRL: Adaptive federated reinforcement learning for intelligent jamming defense in FANET. Journal of Communications and Networks 2020;22:244-58.
108. Nguyen TG, Phan TV, Hoang DT, Nguyen TN, So-In C. Federated deep reinforcement learning for traffic monitoring in SDN-Based IoT networks. IEEE Transactions on Cognitive Communications and Networking 2021:1-1.
109. Wang X, Garg S, Lin H, et al. Towards accurate anomaly detection in industrial internet-of-things using hierarchical federated learning. IEEE Internet of Things Journal 2021:1-1.
110. Lee S, Choi DH. Federated reinforcement learning for energy management of multiple smart homes with distributed energy resources. IEEE Transactions on Industrial Informatics 2020:1-1.
111. Samet H. The quadtree and related hierarchical data structures. ACM Computing Surveys 1984;16:187-260. Available from: https://doi.org/10.1145/356924.356930.
112. Abdel-Aziz MK, Samarakoon S, Perfecto C, Bennis M. Cooperative perception in vehicular networks using multi-agent reinforcement learning. In: 2020 54th Asilomar Conference on Signals, Systems, and Computers 2020. pp. 408-12.
113. Wang H, Kaplan Z, Niu D, Li B. Optimizing federated learning on non-IID data with reinforcement learning. In: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. Toronto, ON, Canada: IEEE; 2020. pp. 1698-707. Available from: https://ieeexplore.ieee.org/document/9155494/.
114. Zhang P, Gan P, Aujla GS, Batth RS. Reinforcement learning for edge device selection using social attribute perception in industry 4.0. IEEE Internet of Things Journal 2021:1-1.
115. Zhan Y, Li P, Leijie W, Guo S. L4L: experience-driven computational resource control in federated learning. IEEE Transactions on Computers 2021:1-1.
116. Dong Y, Gan P, Aujla GS, Zhang P. RA-RL: reputation-aware edge device selection method based on reinforcement learning. In: 2021 IEEE 22nd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM) 2021. pp. 348-53.
117. Sahu AK, Li T, Sanjabi M, et al. On the convergence of federated optimization in heterogeneous networks. CoRR 2018;abs/1812.06127. Available from: http://arxiv.org/abs/1812.06127.
118. Chen M, Poor HV, Saad W, Cui S. Convergence time optimization for federated learning over wireless networks. IEEE Transactions on Wireless Communications 2021;20:2457-71.
119. Li X, Huang K, Yang W, Wang S, Zhang Z. On the convergence of FedAvg on non-IID data; 2020. Available from: https://arxiv.org/abs/1907.02189?context=stat.ML.
120. Bonawitz KA, Eichner H, Grieskamp W, et al. Towards federated learning at scale: system design. CoRR 2019;abs/1902.01046. Available from: http://arxiv.org/abs/1902.01046.
121. Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529-33. Available from: https://doi.org/10.1038/nature14236.
122. Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning; 2019. Available from: https://arxiv.org/abs/1509.02971.
123. Lyu L, Yu H, Yang Q. Threats to federated learning: a survey. CoRR 2020;abs/2003.02133. Available from: https://arxiv.org/abs/2003.02133.
124. Fung C, Yoon CJM, Beschastnikh I. Mitigating sybils in federated learning poisoning. CoRR 2018;abs/1808.04866. Available from: http://arxiv.org/abs/1808.04866.
125. Anwar A, Raychowdhury A. Multi-task federated reinforcement learning with adversaries. CoRR 2021;abs/2103.06473. Available from: https://arxiv.org/abs/2103.06473.
126. Zhu L, Liu Z, Han S. Deep leakage from gradients. CoRR 2019;abs/1906.08935. Available from: http://arxiv.org/abs/1906.08935.
127. Nishio T, Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge. In: ICC 2019 - 2019 IEEE International Conference on Communications (ICC); 2019. pp. 1-7.
128. Yang T, Andrew G, Eichner H, et al. Applied federated learning: improving Google Keyboard query suggestions. CoRR 2018;abs/1812.02903. Available from: http://arxiv.org/abs/1812.02903.
129. Yu H, Liu Z, Liu Y, et al. A fairness-aware incentive scheme for federated learning. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. AIES ’20. New York, NY, USA: Association for Computing Machinery; 2020. pp. 393-99. Available from: https://doi.org/10.1145/3375627.3375840.