REFERENCES
1. Mourtzis D. Simulation in the design and operation of manufacturing systems: state of the art and new trends. Int J Prod Res 2020;58:1927-49.
2. Mourtzis D. Design and operation of production networks for mass personalization in the era of cloud technology. Amsterdam: Elsevier; 2021. pp. 1-393.
3. Atif M, Ahmad R, Ahmad W, Zhao L, Rodrigues JJ. UAV-assisted wireless localization for search and rescue. IEEE Syst J 2021;15:3261-72.
4. Zhao L, Liu Y, Peng Q, Zhao L. A dual aircraft maneuver formation controller for MAV/UAV based on the hybrid intelligent agent. Drones 2023;7:282.
5. Mourtzis D, Angelopoulos J, Panopoulos N. Unmanned aerial vehicle (UAV) manipulation assisted by augmented reality (AR): the case of a drone. IFAC-PapersOnLine 2022;55:983-8.
6. Walker O, Vanegas F, Gonzalez F, et al. A deep reinforcement learning framework for UAV navigation in indoor environments. In: 2019 IEEE Aerospace Conference. IEEE; 2019. pp. 1-14.
7. Wang Q, Zhang A, Qi L. Three-dimensional path planning for UAV based on improved PSO algorithm. In: The 26th Chinese Control and Decision Conference (2014 CCDC). IEEE; 2014. pp. 3981-5.
8. Yao P, Xie Z, Ren P. Optimal UAV route planning for coverage search of stationary target in river. IEEE Trans Contr Syst Technol 2017;27:822-9.
9. Shin J, Bang H, Morlier J. UAV path planning under dynamic threats using an improved PSO algorithm. Int J Aerospace Eng 2020;2020:1-17.
10. Wen C, Qin L, Zhu Q, Wang C, Li J. Three-dimensional indoor mobile mapping with fusion of two-dimensional laser scanner and RGB-D camera data. IEEE Geosci Remote Sens Lett 2013;11:843-7.
11. Mu B, Giamou M, Paull L, et al. Information-based active SLAM via topological feature graphs. In: 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE; 2016. pp. 5583-90.
12. Weiss S, Scaramuzza D, Siegwart R. Monocular-SLAM-based navigation for autonomous micro helicopters in GPS-denied environments. J Field Robot 2011;28:854-74.
13. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529:484-9.
14. Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. J Mach Learn Res 2016;17:1334-73.
15. Levine S, Pastor P, Krizhevsky A, Ibarz J, Quillen D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int J Robot Res 2018;37:421-36.
16. Yan C, Xiang X. A path planning algorithm for UAV based on improved Q-learning. In: 2018 2nd International Conference on Robotics and Automation Sciences (ICRAS). IEEE; 2018. pp. 1-5.
17. Bouhamed O, Ghazzai H, Besbes H, Massoud Y. Autonomous UAV navigation: a DDPG-based deep reinforcement learning approach. In: 2020 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE; 2020. pp. 1-5.
18. Masson W, Ranchod P, Konidaris G. Reinforcement learning with parameterized actions. In: Thirtieth AAAI Conference on Artificial Intelligence; 2016.
19. Hausknecht M, Stone P. Deep reinforcement learning in parameterized action space. In: Proceedings of the International Conference on Learning Representations. 2016.
20. Xiong J, Wang Q, Yang Z, et al. Parametrized deep Q-networks learning: reinforcement learning with discrete-continuous hybrid action space. arXiv preprint arXiv:1810.06394. 2018.
21. Bester CJ, James SD, Konidaris GD. Multi-pass Q-networks for deep reinforcement learning with parameterised action spaces. arXiv preprint arXiv:1905.04388. 2019.
22. Wang W, Luo X, Li Y, Xie S. Unmanned surface vessel obstacle avoidance with prior knowledge-based reward shaping. Concurrency Computat Pract Exper 2021;33:e6110.
23. Okudo T, Yamada S. Subgoal-based reward shaping to improve efficiency in reinforcement learning. IEEE Access 2021;9:97557-68.
24. Burda Y, Edwards H, Storkey A, Klimov O. Exploration by random network distillation. arXiv preprint arXiv:1810.12894. 2018.
25. Badia AP, Sprechmann P, Vitvitskyi A, et al. Never give up: learning directed exploration strategies. arXiv preprint arXiv:2002.06038. 2020.
26. Andrychowicz M, Wolski F, Ray A, et al. Hindsight experience replay. Adv Neural Inf Process Syst 2017;30.
27. Lanka S, Wu T. ARCHER: aggressive rewards to counter bias in hindsight experience replay. arXiv preprint arXiv:1809.02070. 2018.
28. Schramm L, Deng Y, Granados E, Boularias A. USHER: unbiased sampling for hindsight experience replay. arXiv preprint arXiv:2207.01115. 2022.
29. Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529-33.
30. Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. 2015.