REFERENCES
1. Maier-Hein L, Eisenmann M, Sarikaya D, et al. Surgical data science - from concepts toward clinical translation. Med Image Anal 2022;76:102306.
2. Ding H, Zhang J, Kazanzides P, Wu JY, Unberath M. CaRTS: causality-driven robot tool segmentation from vision and kinematics data. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2022. Cham: Springer; 2022. pp. 387-98.
3. Kenngott HG, Wagner M, Preukschas AA, Müller-Stich BP. [Intelligent operating room suite: from passive medical devices to the self-thinking cognitive surgical assistant]. Chirurg 2016;87:1033-8.
4. Killeen BD, Gao C, Oguine KJ, et al. An autonomous X-ray image acquisition and interpretation system for assisting percutaneous pelvic fracture fixation. Int J Comput Assist Radiol Surg 2023;18:1201-8.
5. Gao C, Killeen BD, Hu Y, et al. Synthetic data accelerates the development of generalizable learning-based algorithms for X-ray image analysis. Nat Mach Intell 2023;5:294-308.
6. Madani A, Namazi B, Altieri MS, et al. Artificial intelligence for intraoperative guidance: using semantic segmentation to identify surgical anatomy during laparoscopic cholecystectomy. Ann Surg 2022;276:363-9.
7. Shu H, Liang R, Li Z, et al. Twin-S: a digital twin for skull base surgery. Int J Comput Assist Radiol Surg 2023;18:1077-84.
8. Killeen BD, Winter J, Gu W, et al. Mixed reality interfaces for achieving desired views with robotic X-ray systems. Comput Methods Biomech Biomed Eng Imaging Vis 2023;11:1130-5.
9. Killeen BD, Chaudhary S, Osgood G, Unberath M. Take a shot! Natural language control of intelligent robotic X-ray systems in surgery. Int J Comput Assist Radiol Surg 2024;19:1165-73.
10. Kausch L, Thomas S, Kunze H, et al. C-arm positioning for standard projections during spinal implant placement. Med Image Anal 2022;81:102557.
11. Killeen BD, Zhang H, Mangulabnan J, et al. Pelphix: surgical phase recognition from X-ray images in percutaneous pelvic fixation. In: Greenspan H, Madabhushi A, Mousavi P, Salcudean S, Duncan J, Syeda-Mahmood T, Taylor R, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. Cham: Springer; 2023. pp. 133-43.
12. Garrow CR, Kowalewski KF, Li L, et al. Machine learning for surgical phase recognition: a systematic review. Ann Surg 2021;273:684-93.
13. Weede O, Dittrich F, Wörn H, et al. Workflow analysis and surgical phase recognition in minimally invasive surgery. In: 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO); 2012 Dec 11-14; Guangzhou, China. IEEE; 2012. pp. 1074-80.
14. Kiyasseh D, Ma R, Haque TF, et al. A vision transformer for decoding surgeon activity from surgical videos. Nat Biomed Eng 2023;7:780-96.
15. Ban Y, Eckhoff JA, Ward TM, et al. Concept graph neural networks for surgical video understanding. IEEE Trans Med Imaging 2024;43:264-74.
16. Czempiel T, Paschali M, Keicher M, et al. TeCNO: surgical phase recognition with multi-stage temporal convolutional networks. In: Martel AL, Abolmaesumi P, Stoyanov D, Mateus D, Zuluaga MA, Zhou SK, Racoceanu D, Joskowicz L, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. Cham: Springer; 2020. pp. 343-52.
17. Guédon ACP, Meij SEP, Osman KNMMH, et al. Deep learning for surgical phase recognition using endoscopic videos. Surg Endosc 2021;35:6150-7.
18. Murali A, Alapatt D, Mascagni P, et al. Encoding surgical videos as latent spatiotemporal graphs for object and anatomy-driven reasoning. In: Greenspan H, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. Cham: Springer; 2023. pp. 647-57.
19. Zhang D, Wang R, Lo B. Surgical gesture recognition based on bidirectional multi-layer independently RNN with explainable spatial feature extraction. In: 2021 IEEE International Conference on Robotics and Automation (ICRA); 2021 May 30 - Jun 5; Xi’an, China. IEEE; 2021. pp. 1350-6.
20. DiPietro R, Ahmidi N, Malpani A, et al. Segmenting and classifying activities in robot-assisted surgery with recurrent neural networks. Int J Comput Assist Radiol Surg 2019;14:2005-20.
21. DiPietro R, Hager GD. Automated surgical activity recognition with one labeled sequence. In: Shen D, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2019. Cham: Springer; 2019. pp. 458-66.
22. Reiley CE, Lin HC, Yuh DD, Hager GD. Review of methods for objective surgical skill evaluation. Surg Endosc 2011;25:356-66.
23. Lam K, Chen J, Wang Z, et al. Machine learning for technical skill assessment in surgery: a systematic review. NPJ Digit Med 2022;5:24.
24. Alapatt D, Murali A, Srivastav V, Mascagni P, AI4SafeChole Consortium, Padoy N. Jumpstarting surgical computer vision. arXiv. [Preprint.] Dec 10, 2023 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2312.05968.
25. Ramesh S, Srivastav V, Alapatt D, et al. Dissecting self-supervised learning methods for surgical computer vision. Med Image Anal 2023;88:102844.
26. Geirhos R, Rubisch P, Michaelis C, Bethge M, Wichmann FA, Brendel W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv. [Preprint.] Nov 29, 2018 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/1811.12231.
27. Glocker B, Jones C, Roschewitz M, Winzeck S. Risk of bias in chest radiography deep learning foundation models. Radiol Artif Intell 2023;5:e230060.
28. Geirhos R, Jacobsen J, Michaelis C, et al. Shortcut learning in deep neural networks. Nat Mach Intell 2020;2:665-73.
29. Wen C, Qian J, Lin J, Teng J, Jayaraman D, Gao Y. Fighting fire with fire: avoiding DNN shortcuts through priming. Available from: https://proceedings.mlr.press/v162/wen22d.html. [Last accessed on 2 Jul 2024].
30. Olah C, Satyanarayan A, Johnson I, et al. The building blocks of interpretability. Distill 2018;3:e10.
32. Bjelland Ø, Rasheed B, Schaathun HG, et al. Toward a digital twin for arthroscopic knee surgery: a systematic review. IEEE Access 2022;10:45029-52.
33. Erol T, Mendi AF, Doğan D. The digital twin revolution in healthcare. In: 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT); 2020 Oct 22-24; Istanbul, Turkey. IEEE; 2020. pp. 1-7.
34. Representations of geometry for computer graphics. Available from: https://graphics.stanford.edu/courses/cs233-24-winter-v1/ReferencedPapers/60082881-Presentations-of-Geometry-for-Computer-Graphics.pdf. [Last accessed on 2 Jul 2024].
35. Levoy M, Whitted T. The use of points as a display primitive. 1985. Available from: https://api.semanticscholar.org/CorpusID:12672240. [Last accessed on 2 Jul 2024].
36. Botsch M, Kobbelt L, Pauly M, Alliez P, Levy B. Polygon mesh processing. A K Peters/CRC Press; 2010. Available from: http://www.crcpress.com/product/isbn/9781568814261. [Last accessed on 2 Jul 2024].
37. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans A Math Phys Eng Sci 2016;374:20150202.
38. Blanz V, Vetter T. A morphable model for the synthesis of 3D faces. In: Whitton MC, editor. Seminal graphics papers: pushing the boundaries. New York: ACM; 2023. pp. 157-64.
39. Edwards GJ, Taylor CJ, Cootes TF. Interpreting face images using active appearance models. In: Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition; 1998 Apr 14-16; Nara, Japan. IEEE; 1998. pp. 300-5.
40. Karamizadeh S, Abdullah SM, Manaf AA, Zamani M, Hooman A. An overview of principal component analysis. J Signal Inf Process 2013;4:173-5.
41. Liu X, Killeen BD, Sinha A, et al. Neighborhood normalization for robust geometric feature learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; Nashville, TN, USA. IEEE; 2021. pp. 13049-58.
42. Drenkow N, Sani N, Shpitser I, Unberath M. A systematic review of robustness in deep learning for computer vision: mind the gap? arXiv. [Preprint.] Dec 1, 2021 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2112.00639.
45. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw 1989;2:359-66.
46. Michalkiewicz M, Pontes JK, Jack D, Baktashmotlagh M, Eriksson A. Deep level sets: implicit surface representations for 3D shape inference. arXiv. [Preprint.] Jan 21, 2019 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/1901.06802.
47. Park JJ, Florence P, Straub J, Newcombe R, Lovegrove S. DeepSDF: learning continuous signed distance functions for shape representation. arXiv. [Preprint.] Jan 16, 2019 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/1901.05103.
48. Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R. NeRF: representing scenes as neural radiance fields for view synthesis. Commun ACM 2022;65:99-106.
49. Li Z, Müller T, Evans A, et al. Neuralangelo: high-fidelity neural surface reconstruction. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE; 2023. pp. 8456-65.
50. Allan M, Shvets A, Kurmann T, et al. 2017 robotic instrument segmentation challenge. arXiv. [Preprint.] Feb 21, 2019 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/1902.06426.
51. Allan M, Kondo S, Bodenstedt S, et al. 2018 robotic scene segmentation challenge. arXiv. [Preprint.] Aug 3, 2020 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2001.11190.
52. Psychogyios D, Colleoni E, Van Amsterdam B, et al. SAR-RARP50: segmentation of surgical instrumentation and action recognition on robot-assisted radical prostatectomy challenge. arXiv. [Preprint.] Jan 23, 2024 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2401.00496.
53. Grammatikopoulou M, Flouty E, Kadkhodamohammadi A, et al. CaDIS: cataract dataset for surgical RGB-image segmentation. Med Image Anal 2021;71:102053.
54. Cartucho J, Weld A, Tukra S, et al. SurgT challenge: benchmark of soft-tissue trackers for robotic surgery. Med Image Anal 2024;91:102985.
55. Schmidt A, Mohareri O, DiMaio S, Salcudean SE. Surgical tattoos in infrared: a dataset for quantifying tissue tracking and mapping. arXiv. [Preprint.] Feb 29, 2024 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2309.16782.
56. Roß T, Reinke A, Full PM, et al. Comparative validation of multi-instance instrument segmentation in endoscopy: results of the ROBUST-MIS 2019 challenge. Med Image Anal 2021;70:101920.
57. SegSTRONG-C: segmenting surgical tools robustly on non-adversarial generated corruptions - an EndoVis’24 challenge. arXiv. [Preprint.] Jul 16, 2024 [accessed 2024 Jul 18]. Available from: https://arxiv.org/abs/2407.11906.
58. Qin F, Lin S, Li Y, Bly RA, Moe KS, Hannaford B. Towards better surgical instrument segmentation in endoscopic vision: multi-angle feature aggregation and contour supervision. IEEE Robot Autom Lett 2020;5:6639-46.
59. Hong WY, Kao CL, Kuo YH, Wang JR, Chang WL, Shih CS. CholecSeg8k: a semantic segmentation dataset for laparoscopic cholecystectomy based on Cholec80. arXiv. [Preprint.] Dec 23, 2020 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2012.12453.
60. Jha D, Ali S, Emanuelsen K, et al. Kvasir-instrument: diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy. In: Lokoč J, et al., editors. MultiMedia Modeling. Cham: Springer; 2021. pp. 218-29.
61. Wang Z, Lu B, Long Y, et al. AutoLaparo: a new dataset of integrated multi-tasks for image-guided surgical automation in laparoscopic hysterectomy. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2022. Cham: Springer; 2022. pp. 486-96.
62. Lin TY, Maire M, Belongie S, et al. Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, editors. Computer Vision - ECCV 2014. Cham: Springer; 2014. pp. 740-55.
63. Shao S, Li Z, Zhang T, et al. Objects365: a large-scale, high-quality dataset for object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 2; Seoul, Korea (South). IEEE; 2019. pp. 8429-38.
64. Zia A, Bhattacharyya K, Liu X, et al. Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge. arXiv. [Preprint.] May 31, 2023 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2305.07152.
65. Pfeiffer M, Funke I, Robu MR, et al. Generating large labeled data sets for laparoscopic image processing tasks using unpaired image-to-image translation. In: Shen D, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2019. Cham: Springer; 2019. pp. 119-27.
66. Kirillov A, Mintun E, Ravi N, et al. Segment anything. arXiv. [Preprint.] Apr 5, 2023 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2304.02643.
67. Ozyoruk KB, Gokceler GI, Bobrow TL, et al. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med Image Anal 2021;71:102058.
68. Allan M, Mcleod J, Wang C, et al. Stereo correspondence and reconstruction of endoscopic data challenge. arXiv. [Preprint.] Jan 28, 2021 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2101.01133.
69. Hamlyn Centre laparoscopic/endoscopic video datasets. Available from: https://hamlyn.doc.ic.ac.uk/vision/. [Last accessed on 2 Jul 2024].
70. Recasens D, Lamarca J, Fácil JM, Montiel JMM, Civera J. Endo-depth-and-motion: reconstruction and tracking in endoscopic videos using depth networks and photometric constraints. IEEE Robot Autom Lett 2021;6:7225-32.
71. Ali S, Pandey AK. ArthroNet: a monocular depth estimation technique with 3D segmented maps for knee arthroscopy. Intell Med 2023;3:129-38.
72. Masuda K, Shimizu T, Nakazawa T, Edamoto Y. Registration between 2D and 3D ultrasound images to track liver blood vessel movement. Curr Med Imaging 2023;19:1133-43.
73. Bobrow TL, Golhar M, Vijayan R, Akshintala VS, Garcia JR, Durr NJ. Colonoscopy 3D video dataset with paired depth from 2D-3D registration. Med Image Anal 2023;90:102956.
74. Lin B, Sun Y, Sanchez JE, Qian X. Efficient vessel feature detection for endoscopic image analysis. IEEE Trans Biomed Eng 2015;62:1141-50.
75. JIGSAWS: the JHU-ISI gesture and skill assessment working set. Available from: https://cirl.lcsr.jhu.edu/research/hmm/datasets/jigsaws_release/. [Last accessed on 2 Jul 2024].
76. Hein J, Cavalcanti N, Suter D, et al. Next-generation surgical navigation: marker-less multi-view 6DoF pose estimation of surgical instruments. arXiv. [Preprint.] Dec 22, 2023 [accessed 2024 Jul 2]. Available from: https://arxiv.org/abs/2305.03535.
77. Hasan MK, Calvet L, Rabbani N, Bartoli A. Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry. Med Image Anal 2021;70:101994.
78. 3dStool. Available from: https://github.com/SpyrosSou/3dStool. [Last accessed on 2 Jul 2024].
79. Greene N, Luo W, Kazanzides P. dVPose: automated data collection and dataset for 6D pose estimation of robotic surgical instruments. In: 2023 International Symposium on Medical Robotics (ISMR); 2023 Apr 19-21; Atlanta, GA, USA. IEEE; 2023. pp. 1-7.
80. Fisher R. Edinburgh simulated surgical tools dataset (RGBD). 2022. Available from: https://groups.inf.ed.ac.uk/vision/DATASETS/SURGICALTOOLS/. [Last accessed on 2 Jul 2024].
81. 6-dof pose estimation of surgical instruments. 2022. Available from: https://www.kaggle.com/datasets/juanantoniobarragan/6-dof-pose-estimation-of-surgical-instruments. [Last accessed on 2 Jul 2024].
82. Munawar A, Wu JY, Fischer GS, Taylor RH, Kazanzides P. Open simulation environment for learning and practice of robot-assisted surgical suturing. IEEE Robot Autom Lett 2022;7:3843-50.
83. Wang R, Ktistakis S, Zhang S, Meboldt M, Lohmeyer Q. POV-surgery: a dataset for egocentric hand and tool pose estimation during surgical activities. In: Greenspan H, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. Cham: Springer; 2023. pp. 440-50.
84. Kügler D, Sehring J, Stefanov A, et al. i3PosNet: instrument pose estimation from X-ray in temporal bone surgery. Int J Comput Assist Radiol Surg 2020;15:1137-45.
85. Zhang J, Hu J. Image segmentation based on 2D Otsu method with histogram analysis. In: 2008 International Conference on Computer Science and Software Engineering; 2008 Dec 12-14; Wuhan, China. IEEE; 2008. pp. 105-8.
86. Pham DL, Prince JL. An adaptive fuzzy C-means algorithm for image segmentation in the presence of intensity inhomogeneities. Pattern Recognit Lett 1999;20:57-68.
87. Lin C, Chen C. Image segmentation based on edge detection and region growing for ThinPrep-cervical smear. Int J Patt Recogn Artif Intell 2010;24:1061-89.
88. Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05); 2005 Jun 20-25; San Diego, CA, USA. IEEE; 2005. pp. 886-93.
89. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 2010;32:1627-45.
90. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001; 2001 Dec 8-14; Kauai, HI, USA. IEEE; 2001. p. 1.
91. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015 Jun 7-12; Boston, MA, USA. IEEE; 2015. pp. 3431-40.
92. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015. Cham: Springer; 2015. pp. 234-41.
93. Chen L, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, editors. Computer Vision - ECCV 2018. Cham: Springer; 2018. pp. 833-51.
94. Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017 Jul 21-26; Honolulu, HI, USA. IEEE; 2017. pp. 2881-90.
95. Seenivasan L, Mitheran S, Islam M, Ren H. Global-reasoned multi-task learning model for surgical scene understanding. IEEE Robot Autom Lett 2022;7:3858-65.
96. Girshick R. Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV); 2015 Dec 7-13; Santiago, Chile. IEEE; 2015. pp. 1440-8.
97. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. Available from: https://proceedings.neurips.cc/paper/2015/hash/14bfa6bb14875e45bba028a21ed38046-Abstract.html. [Last accessed on 2 Jul 2024].
98. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition; 2014 Jun 23-28; Columbus, OH, USA. IEEE; 2014. pp. 580-7.
99. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27-30; Las Vegas, NV, USA. IEEE; 2016. pp. 779-88.
100. Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M, editors. Computer Vision - ECCV 2016. Cham: Springer; 2016. pp. 21-37.
101. Lu X, Li B, Yue Y, Li Q, Yan J. Grid R-CNN. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2019. pp. 7363-72.
102. Zhang H, Chang H, Ma B, Wang N, Chen X. Dynamic R-CNN: towards high quality object detection via dynamic training. In: Vedaldi A, Bischof H, Brox T, Frahm J, editors. Computer Vision - ECCV 2020. Cham: Springer; 2020. pp. 260-75.
103. Wu Y, Chen Y, Yuan L, et al. Rethinking classification and localization for object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13-19; Seattle, WA, USA. IEEE; 2020. pp. 10186-92.
104. Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22-29; Venice, Italy. IEEE; 2017. pp. 2980-8.
105. Law H, Deng J. CornerNet: detecting objects as paired keypoints. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, editors. Computer Vision - ECCV 2018. Cham: Springer; 2018. pp. 765-81.
106. Zhou X, Wang D, Krähenbühl P. Objects as points. arXiv. [Preprint.] Apr 16, 2019 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/1904.07850.
107. Yang Z, Liu S, Hu H, Wang L, Lin S. RepPoints: point set representation for object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 2; Seoul, Korea (South). IEEE; 2019. pp. 9657-66.
108. Tian Z, Shen C, Chen H, He T. FCOS: a simple and strong anchor-free object detector. IEEE Trans Pattern Anal Mach Intell 2022;44:1922-33.
109. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22-29; Venice, Italy. IEEE; 2017. pp. 2980-8.
110. Huang Z, Huang L, Gong Y, Huang C, Wang X. Mask scoring R-CNN. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2019. pp. 6402-11.
111. Chen K, Pang J, Wang J, et al. Hybrid task cascade for instance segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2019. pp. 4969-78.
112. Ding H, Qiao S, Yuille A, Shen W. Deeply shape-guided cascade for instance segmentation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; Nashville, TN, USA. IEEE; 2021. pp. 8274-84.
113. Bolya D, Zhou C, Xiao F, Lee YJ. YOLACT: real-time instance segmentation. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 2; Seoul, Korea (South). IEEE; 2019. pp. 9156-65.
114. Wang X, Kong T, Shen C, Jiang Y, Li L. SOLO: segmenting objects by locations. In: Vedaldi A, Bischof H, Brox T, Frahm J, editors. Computer Vision - ECCV 2020. Cham: Springer; 2020. pp. 649-65.
115. Kirillov A, Wu Y, He K, Girshick R. PointRend: image segmentation as rendering. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13-19; Seattle, WA, USA. IEEE; 2020. pp. 9796-805.
116. Tian Z, Shen C, Chen H. Conditional convolutions for instance segmentation. In: Vedaldi A, Bischof H, Brox T, Frahm J, editors. Computer Vision - ECCV 2020. Cham: Springer; 2020. pp. 282-98.
117. Liu K, Zhao Z, Shi P, Li F, Song H. Real-time surgical tool detection in computer-aided surgery based on enhanced feature-fusion convolutional neural network. J Comput Des Eng 2022;9:1123-34.
118. Bamba Y, Ogawa S, Itabashi M, et al. Object and anatomical feature recognition in surgical video images based on a convolutional neural network. Int J Comput Assist Radiol Surg 2021;16:2045-54.
119. Cerón JCÁ, Ruiz GO, Chang L, Ali S. Real-time instance segmentation of surgical instruments using attention and multi-scale feature fusion. Med Image Anal 2022;81:102569.
120. Wang A, Islam M, Xu M, Ren H. Rethinking surgical instrument segmentation: a background image can be all you need. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2022. Cham: Springer; 2022. pp. 355-64.
121. Zhang Z, Rosa B, Nageotte F. Surgical tool segmentation using generative adversarial networks with unpaired training data. IEEE Robot Autom Lett 2021;6:6266-73.
122. Yang L, Gu Y, Bian G, Liu Y. An attention-guided network for surgical instrument segmentation from endoscopic images. Comput Biol Med 2022;151:106216.
123. Ding H, Wu JY, Li Z, Unberath M. Rethinking causality-driven robot tool segmentation with temporal constraints. Int J Comput Assist Radiol Surg 2023;18:1009-16.
124. Colleoni E, Edwards P, Stoyanov D. Synthetic and real inputs for tool segmentation in robotic surgery. In: Martel AL, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. Cham: Springer; 2020. pp. 700-10.
125. Lee K, Choi MK, Jung H. DavinciGAN: unpaired surgical instrument translation for data augmentation. Available from: http://proceedings.mlr.press/v102/lee19a.html. [Last accessed on 3 Jul 2024].
126. Zheng S, Lu J, Zhao H, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; Nashville, TN, USA. IEEE; 2021. pp. 6877-86.
127. Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P. SegFormer: simple and efficient design for semantic segmentation with transformers. In: Advances in neural information processing systems 34 (NeurIPS 2021). Available from: https://proceedings.neurips.cc/paper/2021/hash/64f1f27bf1b4ec22924fd0acb550c235-Abstract.html. [Last accessed on 3 Jul 2024].
128. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In: Vedaldi A, Bischof H, Brox T, Frahm J, editors. Computer Vision - ECCV 2020. Cham: Springer; 2020. pp. 213-29.
129. Zhu X, Su W, Lu L, Li B, Wang X, Dai J. Deformable DETR: deformable transformers for end-to-end object detection. arXiv. [Preprint.] Mar 18, 2021 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2010.04159.
130. Meng D, Chen X, Fan Z, et al. Conditional DETR for fast training convergence. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021 Oct 10-17; Montreal, QC, Canada. IEEE; 2021. pp. 3631-40.
131. Li Y, Mao H, Girshick R, He K. Exploring plain vision transformer backbones for object detection. In: Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors. Computer Vision - ECCV 2022. Cham: Springer; 2022. pp. 280-96.
132. Zhang H, Li F, Liu S, et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv. [Preprint.] Jul 11, 2022 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2203.03605.
133. Cheng B, Misra I, Schwing AG, Kirillov A, Girdhar R. Masked-attention mask transformer for universal image segmentation. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, USA. IEEE; 2022. pp. 1280-9.
134. Zou X, Dou ZY, Yang J, et al. Generalized decoding for pixel, image, and language. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE; 2023. pp. 15116-27.
135. Radford A, Kim JW, Hallacy C, et al. Learning transferable visual models from natural language supervision. Available from: http://proceedings.mlr.press/v139/radford21a.html. [Last accessed on 3 Jul 2024].
136. Li LH, Zhang P, Zhang H, et al. Grounded language-image pre-training. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, USA. IEEE; 2022. pp. 10955-65.
137. Zhong Y, Yang J, Zhang P, et al. RegionCLIP: region-based language-image pretraining. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, USA. IEEE; 2022. pp. 16772-82.
138. Luo H, Bao J, Wu Y, He X, Li T. SegCLIP: patch aggregation with learnable centers for open-vocabulary semantic segmentation. Available from: https://proceedings.mlr.press/v202/luo23a.html. [Last accessed on 3 Jul 2024].
139. He Z, Unberath M, Ke J, Shen Y. TransNuSeg: a lightweight multi-task transformer for nuclei segmentation. In: Greenspan H, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. Cham: Springer; 2023. pp. 206-15.
140. Shen Y, Guo P, Wu J, et al. MoViT: memorizing vision transformers for medical image analysis. In: Cao X, Xu X, Rekik I, Cui Z, Ouyang X, editors. Machine Learning in Medical Imaging. Cham: Springer; 2024. pp. 205-13.
141. Oguine K, Soberanis-Mukul R, Drenkow N, Unberath M. From generalization to precision: exploring SAM for tool segmentation in surgical environments. Proc SPIE 2024;12926:7-12.
142. Peng Z, Xu Z, Zeng Z, Yang X, Shen W. SAM-PARSER: fine-tuning SAM efficiently by parameter space reconstruction. arXiv. [Preprint.] Dec 18, 2023 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2308.14604.
143. Li X, Zhang Y, Zhao L. Multi-prompt fine-tuning of foundation models for enhanced medical image segmentation. arXiv. [Preprint.] Oct 3, 2023 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2310.02381.
144. Tyagi AK, Mishra V, Prathosh AP, Mausam. Guided prompting in SAM for weakly supervised cell segmentation in histopathological images. arXiv. [Preprint.] Nov 29, 2023 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2311.17960.
145. Paranjape JN, Nair NG, Sikder S, Vedula SS, Patel VM. AdaptiveSAM: towards efficient tuning of SAM for surgical scene segmentation. arXiv. [Preprint.] Aug 7, 2023 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2308.03726.
146. Yue W, Zhang J, Hu K, Xia Y, Luo J, Wang Z. SurgicalSAM: efficient class promptable surgical instrument segmentation. arXiv. [Preprint.] Dec 21, 2023 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2308.08746.
147. Wang A, Islam M, Xu M, Zhang Y, Ren H. SAM meets robotic surgery: an empirical study in robustness perspective. arXiv. [Preprint.] Apr 28, 2023 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2304.14674.
148. He Y, Yu H, Liu X, Yang Z, Sun W, Mian A. Deep learning based 3D segmentation: a survey. arXiv. [Preprint.] Jul 26, 2023 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2103.05423.
149. Qian R, Lai X, Li X. 3D object detection for autonomous driving: a survey. Pattern Recognit 2022;130:108796.
150. Qi CR, Su H, Mo K, Guibas LJ. PointNet: deep learning on point sets for 3D classification and segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017 Jul 21-26; Honolulu, HI, USA. IEEE; 2017. pp. 77-85.
151. Wang W, Neumann U. Depth-aware CNN for RGB-D segmentation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, editors. Computer Vision - ECCV 2018. Cham: Springer; 2018. pp. 144-61.
152. Zhang Y, Lu J, Zhou J. Objects are different: flexible monocular 3D object detection. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; Nashville, TN, USA. IEEE; 2021. pp. 3288-97.
153. Wang Y, Guizilini VC, Zhang T, Wang Y, Zhao H, Solomon J. DETR3D: 3D object detection from multi-view images via 3D-to-2D queries. Available from: https://proceedings.mlr.press/v164/wang22b.html. [Last accessed on 3 Jul 2024].
154. Maninis KK, Caelles S, Chen Y, et al. Video object segmentation without temporal information. IEEE Trans Pattern Anal Mach Intell 2019;41:1515-30.
155. Caelles S, Maninis KK, Pont-Tuset J, Leal-Taixé L, Cremers D, Van Gool L. One-shot video object segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017 Jul 21-26; Honolulu, HI, USA. IEEE; 2017. pp. 221-30.
156. Hu YT, Huang JB, Schwing A. MaskRNN: instance level video object segmentation. Available from: https://proceedings.neurips.cc/paper/2017/hash/6c9882bbac1c7093bd25041881277658-Abstract.html. [Last accessed on 3 Jul 2024].
157. Ventura C, Bellver M, Girbau A, Salvador A, Marques F, Giro-i-Nieto X. RVOS: end-to-end recurrent network for video object segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2019. pp. 5272-81.
158. Oh SW, Lee JY, Xu N, Kim SJ. Video object segmentation using space-time memory networks. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 2; Seoul, Korea (South). IEEE; 2019. pp. 9225-34.
159. Cheng HK, Tai YW, Tang CK. Rethinking space-time networks with improved memory coverage for efficient video object segmentation. Available from: https://proceedings.neurips.cc/paper/2021/hash/61b4a64be663682e8cb037d9719ad8cd-Abstract.html. [Last accessed on 3 Jul 2024].
160. Cheng HK, Schwing AG. XMem: long-term video object segmentation with an Atkinson-Shiffrin memory model. In: Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors. Computer Vision - ECCV 2022. Cham: Springer; 2022. pp. 640-58.
161. Duke B, Ahmed A, Wolf C, Aarabi P, Taylor GW. SSTVOS: sparse spatiotemporal transformers for video object segmentation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; Nashville, TN, USA. IEEE; 2021. pp. 5908-17.
162. Yang Z, Wei Y, Yang Y. Associating objects with transformers for video object segmentation. Available from: https://proceedings.neurips.cc/paper/2021/hash/147702db07145348245dc5a2f2fe5683-Abstract.html. [Last accessed on 3 Jul 2024].
163. Cheng HK, Oh SW, Price B, Lee JY, Schwing A. Putting the object back into video object segmentation. arXiv. [Preprint.] Apr 11, 2024 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2310.12982.
164. Gong T, Chen K, Wang X, et al. Temporal ROI align for video object recognition. AAAI 2021;35:1442-50.
165. Wu H, Chen Y, Wang N, Zhang ZX. Sequence level semantics aggregation for video object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 2; Seoul, Korea (South). IEEE; 2019. pp. 9216-24.
166. Zhu X, Wang Y, Dai J, Yuan L, Wei Y. Flow-guided feature aggregation for video object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22-29; Venice, Italy. IEEE; 2017. pp. 408-17.
167. Zhu X, Xiong Y, Dai J, Yuan L, Wei Y. Deep feature flow for video recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017 Jul 21-26; Honolulu, HI, USA. IEEE; 2017. pp. 4141-50.
168. Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J. SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2019. pp. 4277-86.
169. Yan B, Peng H, Fu J, Wang D, Lu H. Learning spatio-temporal transformer for visual tracking. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021 Oct 10-17; Montreal, QC, Canada. IEEE; 2021. pp. 10428-37.
170. Cui Y, Jiang C, Wang L, Wu G. MixFormer: end-to-end tracking with iterative mixed attention. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, USA. IEEE; 2022. pp. 13598-608.
171. Bergmann P, Meinhardt T, Leal-Taixe L. Tracking without bells and whistles. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27 - Nov 2; Seoul, Korea (South). IEEE; 2019. pp. 941-51.
172. Pang J, Qiu L, Li X, et al. Quasi-dense similarity learning for multiple object tracking. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021 Jun 20-25; Nashville, TN, USA. IEEE; 2021. pp. 164-73.
173. Zhang Y, Sun P, Jiang Y, et al. ByteTrack: multi-object tracking by associating every detection box. In: Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors. Computer Vision - ECCV 2022. Cham: Springer; 2022. pp. 1-21.
174. Luo W, Xing J, Milan A, Zhang X, Liu W, Kim T. Multiple object tracking: a literature review. Artif Intell 2021;293:103448.
175. Wang Y, Sun Q, Liu Z, Gu L. Visual detection and tracking algorithms for minimally invasive surgical instruments: a comprehensive review of the state-of-the-art. Robot Auton Syst 2022;149:103945.
176. Dakua SP, Abinahed J, Zakaria A, et al. Moving object tracking in clinical scenarios: application to cardiac surgery and cerebral aneurysm clipping. Int J Comput Assist Radiol Surg 2019;14:2165-76.
177. Liu B, Sun M, Liu Q, Kassam A, Li CC, Sclabassi R. Automatic detection of region of interest based on object tracking in neurosurgical video. Conf Proc IEEE Eng Med Biol Soc 2005;2005:6273-6.
178. Du X, Allan M, Bodenstedt S, et al. Patch-based adaptive weighting with segmentation and scale (PAWSS) for visual tracking in surgical video. Med Image Anal 2019;57:120-35.
179. Stenmark M, Omerbašić E, Magnusson M, Andersson V, Abrahamsson M, Tran PK. Vision-based tracking of surgical motion during live open-heart surgery. J Surg Res 2022;271:106-16.
180. Cheng T, Li W, Ng WY, et al. Deep learning assisted robotic magnetic anchored and guided endoscope for real-time instrument tracking. IEEE Robot Autom Lett 2021;6:3979-86.
181. Zhao Z, Voros S, Chen Z, Cheng X. Surgical tool tracking based on two CNNs: from coarse to fine. J Eng 2019;2019:467-72.
182. Robu M, Kadkhodamohammadi A, Luengo I, Stoyanov D. Towards real-time multiple surgical tool tracking. Comput Methods Biomech Biomed Eng Imaging Vis 2021;9:279-85.
183. Lee D, Yu HW, Kwon H, Kong HJ, Lee KE, Kim HC. Evaluation of surgical skills during robotic surgery by deep learning-based multiple surgical instrument tracking in training and actual operations. J Clin Med 2020;9:1964.
184. García-Peraza-Herrera LC, Li W, Gruijthuijsen C, et al. Real-time segmentation of non-rigid surgical tools based on deep learning and tracking. In: Peters T, et al., editors. Computer-Assisted and Robotic Endoscopy. Cham: Springer; 2017. pp. 84-95.
185. Jo K, Choi Y, Choi J, Chung JW. Robust real-time detection of laparoscopic instruments in robot surgery using convolutional neural networks with motion vector prediction. Appl Sci 2019;9:2865.
186. Zhao Z, Chen Z, Voros S, Cheng X. Real-time tracking of surgical instruments based on spatio-temporal context and deep learning. Comput Assist Surg 2019;24:20-9.
187. Alshirbaji TA, Jalal NA, Möller K. A convolutional neural network with a two-stage LSTM model for tool presence detection in laparoscopic videos. Curr Dir Biomed Eng 2020;6:20200002.
188. Bouguet JY, Perona P. 3D photography using shadows in dual-space geometry. Int J Comput Vis 1999;35:129-49.
189. Iddan GJ, Yahav G. Three-dimensional imaging in the studio and elsewhere. Proc SPIE 2001;4298:48-55.
190. Nayar SK, Krishnan G, Grossberg MD, Raskar R. Fast separation of direct and global components of a scene using high frequency illumination. ACM Trans Graph 2006;25:935-44.
191. Torralba A, Oliva A. Depth estimation from image structure. IEEE Trans Pattern Anal Mach Intell 2002;24:1226-38.
192. Marr D, Poggio T. Cooperative computation of stereo disparity: a cooperative algorithm is derived for extracting disparity information from stereo image pairs. Science 1976;194:283-7.
194. Hannah MJ. Computer matching of areas in stereo images. Stanford University. 1974. Available from: https://www.semanticscholar.org/paper/Computer-matching-of-areas-in-stereo-images.-Hannah/02a0829a658e7dbfdf49e8112b38f8911a12eb76. [Last accessed on 3 Jul 2024].
195. Stoyanov D, Darzi A, Yang GZ. A practical approach towards accurate dense 3D depth recovery for robotic laparoscopic surgery. Comput Aided Surg 2005;10:199-208.
196. Arnold RD. Automated stereo perception. PhD thesis, Stanford University; 1983. Available from: https://searchworks.stanford.edu/view/1052936. [Last accessed on 3 Jul 2024].
197. Okutomi M, Kanade T. A locally adaptive window for signal matching. Int J Comput Vis 1992;7:143-62.
199. Lo B, Scarzanella MV, Stoyanov D, Yang G. Belief propagation for depth cue fusion in minimally invasive surgery. In: Metaxas D, Axel L, Fichtinger G, Székely G, editors. Medical Image Computing and Computer-Assisted Intervention - MICCAI 2008. Berlin: Springer; 2008. pp. 104-12.
200. Sinha RY, Raje SR, Rao GA. Three-dimensional laparoscopy: principles and practice. J Minim Access Surg 2017;13:165-9.
201. Mueller-Richter UD, Limberger A, Weber P, Ruprecht KW, Spitzer W, Schilling M. Possibilities and limitations of current stereo-endoscopy. Surg Endosc 2004;18:942-7.
202. Bogdanova R, Boulanger P, Zheng B. Depth perception of surgeons in minimally invasive surgery. Surg Innov 2016;23:515-24.
203. Sinha R, Sundaram M, Raje S, Rao G, Sinha M, Sinha R. 3D laparoscopy: technique and initial experience in 451 cases. Gynecol Surg 2013;10:123-8.
204. Liu X, Sinha A, Ishii M, et al. Dense depth estimation in monocular endoscopy with self-supervised learning methods. IEEE Trans Med Imaging 2020;39:1438-47.
205. Li L, Li X, Yang S, Ding S, Jolfaei A, Zheng X. Unsupervised-learning-based continuous depth and motion estimation with monocular endoscopy for virtual reality minimally invasive surgery. IEEE Trans Ind Inf 2021;17:3920-8.
206. Liu F, Shen C, Lin G. Deep convolutional neural fields for depth estimation from a single image. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015 Jun 7-12; Boston, MA, USA. IEEE; 2015. pp. 5162-70.
207. Visentini-Scarzanella M, Sugiura T, Kaneko T, Koto S. Deep monocular 3D reconstruction for assisted navigation in bronchoscopy. Int J Comput Assist Radiol Surg 2017;12:1089-99.
208. Oda M, Itoh H, Tanaka K, et al. Depth estimation from single-shot monocular endoscope image using image domain adaptation and edge-aware depth estimation. Comput Methods Biomech Biomed Eng Imaging Vis 2022;10:266-73.
209. Zhan H, Garg R, Weerasekera CS, Li K, Agarwal H, Reid I. Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. pp. 340-9.
210. Liu F, Jonmohamadi Y, Maicas G, Pandey AK, Carneiro G. Self-supervised depth estimation to regularise semantic segmentation in knee arthroscopy. In: Martel AL, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. Cham: Springer; 2020. pp. 594-603.
211. Mahmood F, Chen R, Durr NJ. Unsupervised reverse domain adaptation for synthetic medical images via adversarial training. IEEE Trans Med Imaging 2018;37:2572-81.
212. Guo R, Ayinde B, Sun H, Muralidharan H, Oguchi K. Monocular depth estimation using synthetic images with shadow removal. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC); 2019 Oct 27-30; Auckland, New Zealand. IEEE; 2019. pp. 1432-9.
213. Chen RJ, Bobrow TL, Athey T, Mahmood F, Durr NJ. SLAM endoscopy enhanced by adversarial depth prediction. arXiv. [Preprint.] Jun 29, 2019 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/1907.00283.
214. Schreiber AM, Hong M, Rozenblit JW. Monocular depth estimation using synthetic data for an augmented reality training system in laparoscopic surgery. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC); 2021 Oct 17-20; Melbourne, Australia. IEEE; 2021. pp. 2121-6.
215. Tong HS, Ng YL, Liu Z, et al. Real-to-virtual domain transfer-based depth estimation for real-time 3D annotation in transnasal surgery: a study of annotation accuracy and stability. Int J Comput Assist Radiol Surg 2021;16:731-9.
216. Wong A, Soatto S. Bilateral cyclic constraint and adaptive regularization for unsupervised monocular depth prediction. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2019. pp. 5637-46.
217. Widya AR, Monno Y, Okutomi M, Suzuki S, Gotoda T, Miki K. Learning-based depth and pose estimation for monocular endoscope with loss generalization. Annu Int Conf IEEE Eng Med Biol Soc 2021;2021:3547-52.
218. Shao S, Pei Z, Chen W, et al. Self-supervised monocular depth and ego-motion estimation in endoscopy: appearance flow to the rescue. Med Image Anal 2022;77:102338.
219. Hwang SJ, Park SJ, Kim GM, Baek JH. Unsupervised monocular depth estimation for colonoscope system using feedback network. Sensors 2021;21:2691.
220. Li W, Hayashi Y, Oda M, Kitasaka T, Misawa K, Mori K. Geometric constraints for self-supervised monocular depth estimation on laparoscopic images with dual-task consistency. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2022. Cham: Springer; 2022. pp. 467-77.
221. Masuda T, Sagawa R, Furukawa R, Kawasaki H. Scale-preserving shape reconstruction from monocular endoscope image sequences by supervised depth learning. Healthc Technol Lett 2024;11:76-84.
222. Tukra S, Giannarou S. Randomly connected neural networks for self-supervised monocular depth estimation. Comput Methods Biomech Biomed Eng Imaging Vis 2022;10:390-9.
223. Zhao S, Wang C, Wang Q, Liu Y, Zhou SK. 3D endoscopic depth estimation using 3D surface-aware constraints. arXiv. [Preprint.] Mar 4, 2022 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2203.02131.
224. Han J, Jiang Z, Feng G. Monocular depth estimation based on chained residual pooling and gradient weighted loss. In: 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE); 2023 Jan 6-8; Guangzhou, China. IEEE; 2023. pp. 278-82.
225. Yang Y, Shao S, Yang T, et al. A geometry-aware deep network for depth estimation in monocular endoscopy. Eng Appl Artif Intell 2023;122:105989.
226. Zhang G, Gao X, Meng H, Pang Y, Nie X. A self-supervised network-based smoke removal and depth estimation for monocular endoscopic videos. IEEE Trans Vis Comput Graph 2024;30:6547-59.
227. Yang L, Kang B, Huang Z, Xu X, Feng J, Zhao H. Depth anything: unleashing the power of large-scale unlabeled data. arXiv. [Preprint.] Apr 7, 2024 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2401.10891.
228. Han JJ, Acar A, Henry C, Wu JY. Depth anything in medical images: a comparative study. arXiv. [Preprint.] Jan 29, 2024 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2401.16600.
229. Chang JR, Chen YS. Pyramid stereo matching network. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. pp. 5410-8.
230. Luo W, Schwing AG, Urtasun R. Efficient deep learning for stereo matching. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27-30; Las Vegas, NV, USA. IEEE; 2016. pp. 5695-703.
231. Zampokas G, Peleka G, Tsiolis K, Topalidou-Kyniazopoulou A, Mariolis I, Tzovaras D. Real-time stereo reconstruction of intraoperative scene and registration to preoperative 3D models for augmenting surgeons’ view during RAMIS. Med Phys 2022;49:6517-26.
232. Probst T, Maninis K, Chhatkuli A, Ourak M, Poorten EV, Van Gool L. Automatic tool landmark detection for stereo vision in robot-assisted retinal surgery. IEEE Robot Autom Lett 2018;3:612-9.
233. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv. [Preprint.] Jun 3, 2021 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2010.11929.
234. Tao R, Huang B, Zou X, Zheng G. SVT-SDE: spatiotemporal vision transformers-based self-supervised depth estimation in stereoscopic surgical videos. IEEE Trans Med Robot Bionics 2023;5:42-53.
235. Li Z, Liu X, Drenkow N, et al. Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021 Oct 10-17; Montreal, QC, Canada. IEEE; 2021. pp. 6177-86.
236. Long Y, Li Z, Yee CH, et al. E-DSSR: efficient dynamic surgical scene reconstruction with transformer-based stereoscopic depth perception. In: de Bruijne M, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2021. Cham: Springer; 2021. pp. 415-25.
237. Guo W, Li Z, Yang Y, et al. Context-enhanced stereo transformer. In: Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors. Computer Vision - ECCV 2022. Cham: Springer; 2022. pp. 263-79.
238. Hu X, Baena FRY. Automatic bone surface restoration for markerless computer-assisted orthopaedic surgery. Chin J Mech Eng 2022;35:18.
239. Wang W, Zhou H, Yan Y, et al. An automatic extraction method on medical feature points based on PointNet++ for robot-assisted knee arthroplasty. Int J Med Robot 2023;19:e2464.
240. Baum ZMC, Hu Y, Barratt DC. Multimodality biomedical image registration using free point transformer networks. In: Hu Y, et al., editors. Medical Ultrasound, and Preterm, Perinatal and Paediatric Image Analysis. Cham: Springer; 2020. pp. 116-25.
241. Baum ZMC, Hu Y, Barratt DC. Real-time multimodal image registration with partial intraoperative point-set data. Med Image Anal 2021;74:102231.
242. Widya AR, Monno Y, Imahori K, et al. 3D reconstruction of whole stomach from endoscope video using structure-from-motion. Annu Int Conf IEEE Eng Med Biol Soc 2019;2019:3900-4.
243. Lin B, Sun Y, Qian X, Goldgof D, Gitlin R, You Y. Video-based 3D reconstruction, laparoscope localization and deformation recovery for abdominal minimally invasive surgery: a survey. Int J Med Robot 2016;12:158-78.
244. Song J, Wang J, Zhao L, Huang S, Dissanayake G. Dynamic reconstruction of deformable soft-tissue with stereo scope in minimal invasive surgery. IEEE Robot Autom Lett 2018;3:155-62.
245. Zhou H, Jagadeesan J. Real-time dense reconstruction of tissue surface from stereo optical video. IEEE Trans Med Imaging 2020;39:400-12.
246. Zhou H, Jayender J. EMDQ-SLAM: real-time high-resolution reconstruction of soft tissue surface from stereo laparoscopy videos. Med Image Comput Comput Assist Interv 2021;12904:331-40.
247. Wei G, Feng G, Li H, Chen T, Shi W, Jiang Z. A novel SLAM method for laparoscopic scene reconstruction with feature patch tracking. In: 2020 International Conference on Virtual Reality and Visualization (ICVRV); 2020 Nov 13-14; Recife, Brazil. IEEE; 2020. pp. 287-91.
248. Wang Y, Long Y, Fan SH, Dou Q. Neural rendering for stereo 3D reconstruction of deformable tissues in robotic surgery. In: Wang L, Dou Q, Fletcher PT, Speidel S, Li S, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2022. Cham: Springer; 2022. pp. 431-41.
249. Zha R, Cheng X, Li H, Harandi M, Ge Z. EndoSurf: neural surface reconstruction of deformable tissues with stereo endoscope videos. In: Greenspan H, et al., editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. Cham: Springer; 2023. pp. 13-23.
250. Newcombe RA, Fox D, Seitz SM. DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015 Jun 7-12; Boston, MA, USA. IEEE; 2015. pp. 343-52.
251. Li Y, Richter F, Lu J, et al. SuPer: a surgical perception framework for endoscopic tissue manipulation with surgical robotics. IEEE Robot Autom Lett 2020;5:2294-301.
252. Mangulabnan JE, Soberanis-Mukul RD, Teufel T, et al. An endoscopic chisel: intraoperative imaging carves 3D anatomical models. Int J Comput Assist Radiol Surg 2024;19:1359-66.
253. Nguyen KT, Tozzi F, Rashidian N, Willaert W, Vankerschaver J, De Neve W. Towards abdominal 3-D scene rendering from laparoscopy surgical videos using NeRFs. In: Cao X, Xu X, Rekik I, Cui Z, Ouyang X, editors. Machine Learning in Medical Imaging. Cham: Springer; 2024. pp. 83-93.
254. Hein J, Seibold M, Bogo F, et al. Towards markerless surgical tool and hand pose estimation. Int J Comput Assist Radiol Surg 2021;16:799-808.
255. Félix I, Raposo C, Antunes M, Rodrigues P, Barreto JP. Towards markerless computer-aided surgery combining deep segmentation and geometric pose estimation: application in total knee arthroplasty. Comput Methods Biomech Biomed Eng Imaging Vis 2021;9:271-8.
256. Li Z, Shu H, Liang R, et al. TAToo: vision-based joint tracking of anatomy and tool for skull-base surgery. Int J Comput Assist Radiol Surg 2023;18:1303-10.
257. Murphy-Chutorian E, Trivedi MM. Head pose estimation in computer vision: a survey. IEEE Trans Pattern Anal Mach Intell 2009;31:607-26.
258. Toshev A, Szegedy C. DeepPose: human pose estimation via deep neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition; 2014 Jun 23-28; Columbus, OH, USA. IEEE; 2014. pp. 1653-60.
259. Allan M, Chang P, Ourselin S, et al. Image based surgical instrument pose estimation with multi-class labelling and optical flow. In: Navab N, Hornegger J, Wells WM, Frangi A, editors. Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015. Cham: Springer; 2015. pp. 331-8.
260. Peng S, Liu Y, Huang Q, Zhou X, Bao H. PVNet: pixel-wise voting network for 6DoF pose estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 Jun 15-20; Long Beach, CA, USA. IEEE; 2019. pp. 4556-65.
261. Do TT, Cai M, Pham T, Reid I. Deep-6DPose: recovering 6D object pose from a single RGB image. arXiv. [Preprint.] Feb 28, 2018 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/1802.10367.
262. He Z, Feng W, Zhao X, Lv Y. 6D pose estimation of objects: recent technologies and challenges. Appl Sci 2021;11:228.
263. Marullo G, Tanzi L, Piazzolla P, Vezzetti E. 6D object position estimation from 2D images: a literature review. Multimed Tools Appl 2023;82:24605-43.
264. Hasson Y, Tekin B, Bogo F, Laptev I, Pollefeys M, Schmid C. Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020 Jun 13-19; Seattle, WA, USA. IEEE; 2020. pp. 568-77.
265. Kadkhodamohammadi A, Gangi A, de Mathelin M, Padoy N. Articulated clinician detection using 3D pictorial structures on RGB-D data. Med Image Anal 2017;35:215-24.
266. Padoy N. Machine and deep learning for workflow recognition during surgery. Minim Invasive Ther Allied Technol 2019;28:82-90.
267. Kadkhodamohammadi A, Gangi A, de Mathelin M, Padoy N. A multi-view RGB-D approach for human pose estimation in operating rooms. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV); 2017 Mar 24-31; Santa Rosa, CA, USA. IEEE; 2017. pp. 363-72.
268. Long Y, Wei W, Huang T, Wang Y, Dou Q. Human-in-the-loop embodied intelligence with interactive simulation environment for surgical robot learning. IEEE Robot Autom Lett 2023;8:4441-8.
269. Killeen BD, Cho SM, Armand M, Taylor RH, Unberath M. In silico simulation: a key enabling technology for next-generation intelligent surgical systems. Prog Biomed Eng 2023;5:032001.
270. Munawar A, Wang Y, Gondokaryono R, Fischer GS. A real-time dynamic simulator and an associated front-end representation format for simulating complex robots and environments. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2019 Nov 3-8; Macau, China. IEEE; 2019. pp. 1875-82.
271. Munawar A, Li Z, Kunjam P, et al. Virtual reality for synergistic surgical training and data generation. Comput Methods Biomech Biomed Eng Imaging Vis 2022;10:366-74.
272. Munawar A, Li Z, Nagururu N, et al. Fully immersive virtual reality for skull-base surgery: surgical training and beyond. Int J Comput Assist Radiol Surg 2024;19:51-9.
273. Ishida H, Barragan JA, Munawar A, et al. Improving surgical situational awareness with signed distance field: a pilot study in virtual reality. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2023 Oct 1-5; Detroit, MI, USA. IEEE; 2023. pp. 8474-9.
274. Ishida H, Sahu M, Munawar A, et al. Haptic-assisted collaborative robot framework for improved situational awareness in skull base surgery. arXiv. [Preprint.] Jan 22, 2024 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2401.11709.
275. Sahu M, Ishida H, Connolly L, et al. Integrating 3D Slicer with a dynamic simulator for situational aware robotic interventions. arXiv. [Preprint.] Jan 22, 2024 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2401.11715.
276. Su YH, Munawar A, Deguet A, et al. Collaborative robotics toolkit (CRTK): open software framework for surgical robotics research. In: 2020 Fourth IEEE International Conference on Robotic Computing (IRC); 2020 Nov 9-11; Taichung, Taiwan. IEEE; 2020. pp. 48-55.
277. Fedorov A, Beichel R, Kalpathy-Cramer J, et al. 3D Slicer as an image computing platform for the quantitative imaging network. Magn Reson Imaging 2012;30:1323-41.
278. Shi Y, Deng X, Tong Y, et al. Synergistic digital twin and holographic augmented-reality-guided percutaneous puncture of respiratory liver tumor. IEEE Trans Human-Mach Syst 2022;52:1364-74.
279. Poletti G, Antonini L, Mandelli L, et al. Towards a digital twin of coronary stenting: a suitable and validated image-based approach for mimicking patient-specific coronary arteries. Electronics 2022;11:502.
280. Aubert K, Germaneau A, Rochette M, et al. Development of digital twins to optimize trauma surgery and postoperative management. A case study focusing on tibial plateau fracture. Front Bioeng Biotechnol 2021;9:722275.
281. Hernigou P, Olejnik R, Safar A, Martinov S, Hernigou J, Ferre B. Digital twins, artificial intelligence, and machine learning technology to identify a real personalized motion axis of the tibiotalar joint for robotics in total ankle arthroplasty. Int Orthop 2021;45:2209-17.
282. Shinozuka K, Turuda S, Fujinaga A, et al. Artificial intelligence software available for medical devices: surgical phase recognition in laparoscopic cholecystectomy. Surg Endosc 2022;36:7444-52.
283. Funke I, Mees ST, Weitz J, Speidel S. Video-based surgical skill assessment using 3D convolutional neural networks. Int J Comput Assist Radiol Surg 2019;14:1217-25.
284. Hashimoto DA, Rosman G, Witkowski ER, et al. Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy. Ann Surg 2019;270:414-21.
285. Killeen BD, Zhang H, Wang LJ, et al. Stand in surgeon’s shoes: virtual reality cross-training to enhance teamwork in surgery. Int J Comput Assist Radiol Surg 2024;19:1213-22.
286. Vercauteren T, Unberath M, Padoy N, Navab N. CAI4CAI: the rise of contextual artificial intelligence in computer assisted interventions. Proc IEEE Inst Electr Electron Eng 2020;108:198-214.
287. Li Z, Drenkow N, Ding H, et al. On the sins of image synthesis loss for self-supervised depth estimation. arXiv. [Preprint.] Oct 10, 2021 [accessed 2024 Jul 3]. Available from: https://arxiv.org/abs/2109.06163.
288. Gonzales A, Guruswamy G, Smith SR. Synthetic data in health care: a narrative review. PLOS Digit Health 2023;2:e0000082.