REFERENCES

1. Jordan, M. I.; Mitchell, T. M. Machine learning: trends, perspectives, and prospects. Science 2015, 349, 255-60.

2. Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine learning for molecular and materials science. Nature 2018, 559, 547-55.

3. Keith, J. A.; Vassilev-Galindo, V.; Cheng, B.; et al. Combining machine learning and computational chemistry for predictive insights into chemical systems. Chem. Rev. 2021, 121, 9816-72.

4. Shi, X.; Zhang, G.; Lu, Y.; Pang, H. Applications of machine learning in electrochemistry. Renewables 2023, 1, 668-93.

5. Jiang, Y.; Yang, Z.; Guo, J.; et al. Coupling complementary strategy to flexible graph neural network for quick discovery of coformer in diverse co-crystal materials. Nat. Commun. 2021, 12, 5950.

6. Chong, Y.; Huo, Y.; Jiang, S.; et al. Machine learning of spectra-property relationship for imperfect and small chemistry data. Proc. Natl. Acad. Sci. U. S. A. 2023, 120, e2220789120.

7. Wang, X.; Jiang, S.; Hu, W.; et al. Quantitatively determining surface-adsorbate properties from vibrational spectroscopy with interpretable machine learning. J. Am. Chem. Soc. 2022, 144, 16069-76.

8. Ren, H.; Zhang, Q.; Wang, Z.; et al. Machine learning recognition of protein secondary structures based on two-dimensional spectroscopic descriptors. Proc. Natl. Acad. Sci. U. S. A. 2022, 119, e2202713119.

9. Chen, A.; Zhang, X.; Zhou, Z. Machine learning: accelerating materials development for energy storage and conversion. InfoMat 2020, 2, 553-76.

10. Sun, Z.; Yin, H.; Liu, K.; et al. Machine learning accelerated calculation and design of electrocatalysts for CO2 reduction. SmartMat 2022, 3, 68-83.

11. Lin, M.; Liu, X.; Xiang, Y.; et al. Unravelling the fast alkali-ion dynamics in paramagnetic battery materials combined with NMR and deep-potential molecular dynamics simulation. Angew. Chem. Int. Ed. Engl. 2021, 60, 12547-53.

12. Sanchez-Lengeling, B.; Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 2018, 361, 360-5.

13. Wang, A. Y.; Murdock, R. J.; Kauwe, S. K.; et al. Machine learning for materials scientists: an introductory guide toward best practices. Chem. Mater. 2020, 32, 4954-65.

14. Xu, P.; Ji, X.; Li, M.; Lu, W. Small data machine learning in materials science. npj. Comput. Mater. 2023, 9, 1000.

15. Dou, B.; Zhu, Z.; Merkurjev, E.; et al. Machine learning methods for small data challenges in molecular science. Chem. Rev. 2023, 123, 8736-80.

16. Guo, H.; Li, Y.; Shang, J.; Gu, M.; Huang, Y.; Gong, B. Learning from class-imbalanced data: review of methods and applications. Exp. Syst. Appl. 2017, 73, 220-39.

17. Xu, X.; Liang, T.; Zhu, J.; Zheng, D.; Sun, T. Review of classical dimensionality reduction and sample selection methods for large-scale data processing. Neurocomputing 2019, 328, 5-15.

18. Willett, P. Dissimilarity-based algorithms for selecting structurally diverse sets of compounds. J. Comput. Biol. 1999, 6, 447-57.

19. Pereira, T.; Abbasi, M.; Oliveira, J. L.; Ribeiro, B.; Arrais, J. Optimizing blood-brain barrier permeation through deep reinforcement learning for de novo drug design. Bioinformatics 2021, 37, i84-92.

20. Lu, T.; Li, H.; Li, M.; Wang, S.; Lu, W. Predicting experimental formability of hybrid organic-inorganic perovskites via imbalanced learning. J. Phys. Chem. Lett. 2022, 13, 3032-8.

21. Mazouin, B.; Schöpfer, A. A.; von Lilienfeld, O. A. Selected machine learning of HOMO-LUMO gaps with improved data-efficiency. Mater. Adv. 2022, 3, 8306-16.

22. Akdemir, D.; Sanchez, J. I.; Jannink, J. L. Optimization of genomic selection training populations with a genetic algorithm. Genet. Sel. Evol. 2015, 47, 38.

23. Miranda-Quintana, R. A.; Bajusz, D.; Rácz, A.; Héberger, K. Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics. J. Cheminform. 2021, 13, 32.

24. Miranda-Quintana, R. A.; Rácz, A.; Bajusz, D.; Héberger, K. Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection. J. Cheminform. 2021, 13, 33.

25. Ng, W. W. Y.; Yeung, D. S.; Cloete, I. Input sample selection for RBF neural network classification problems using sensitivity measure. In SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No.03CH37483), Washington, USA. Oct 08, 2023. IEEE; 2023. pp. 2593-8.

26. Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733.

27. Botu, V.; Ramprasad, R. Adaptive machine learning framework to accelerate ab initio molecular dynamics. Int. J. Quantum. Chem. 2015, 115, 1074-83.

28. Gastegger, M.; Behler, J.; Marquetand, P. Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 2017, 8, 6924-35.

29. Bergström, D.; Tiger, M.; Heintz, F. Bayesian optimization for selecting training and valida-tion data for supervised machine learning. In Proceedings of the 31st Annual Workshop of the Swedish Artificial Intelligence Society (SAIS 2019), Umeå, Sweden. Jun 18-19, 2019. https://www.ida.liu.se/divisions/aiics/publications/SAIS-2019-Bayesian-Optimization-Selecting.pdf. (accessed 11 Jun 2025).

30. Vaswani, A.; Shazeer, N.; Parmar, N. et al. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA. Curran Associates Inc.; 2017. pp. 6000-10. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf. (accessed 11 Jun 2025).

31. Ross, J.; Belgodere, B.; Chenthamarakshan, V.; Padhi, I.; Mroueh, Y.; Das, P. Large-scale chemical language representations capture molecular structure and properties. Nat. Mach. Intell. 2022, 4, 1256-64.

32. Lu, S.; Gao, Z.; He, D.; Zhang, L.; Ke, G. Data-driven quantum chemical property prediction leveraging 3D conformations with Uni-Mol. Nat. Commun. 2024, 15, 7104.

33. Eldar, Y.; Lindenbaum, M.; Porat, M.; Zeevi, Y. Y. The farthest point strategy for progressive image sampling. IEEE. Trans. Image. Process. 1997, 6, 1305-15.

34. Charles, R. Q.; Su, H.; Kaichun, M.; Guibas, L. J. PointNet: deep learning on point sets for 3D classification and segmentation. 2017. IEEE. Conference. on. Computer. Vision. and. Pattern. Recognition. (CVPR). , 2017. pp. 77-85.

35. Cersonsky, R. K.; Helfrecht, B. A.; Engel, E. A.; Kliavinek, S.; Ceriotti, M. Improving sample and feature selection with principal covariates regression. Mach. Learn. Sci. Technol. 2021, 2, 035038.

36. Yaws, C. L. Yaws’ critical property data for chemical engineers and chemists. Knovel; 2012. http://app.knovel.com/hotlink/toc/id:kpYCPDCECD/yaws-critical-property/yaws-critical-property. (accessed 11 Jun 2025).

37. PubChem. National Center for Biotechnology Information. https://pubchem.ncbi.nlm.nih.gov/. (accessed 11 Jun 2025).

38. RDKit: Open-source cheminformatics software. https://www.rdkit.org. (accessed 11 Jun 2025).

39. Mauri, A. alvaDesc: a tool to calculate and analyze molecular descriptors and fingerprints. In: Roy K, editor. Ecotoxicological QSARs. New York: Springer US; 2020. pp. 801-20.

40. Liu, Y.; Li, K.; Huang, J.; Yu, X.; Hu, W. Accurate prediction of the boiling point of organic molecules by multi-component heterogeneous learning model. Acta. Chim. Sin. 2022, 80, 714-23.

41. Bishop, C. M. Pattern recognition and machine learning. Springer: New York, NY; 2006. https://link.springer.com/book/9780387310732. (accessed 11 Jun 2025).

42. Viering, T.; Loog, M. The shape of learning curves: a review. IEEE. Trans. Pattern. Anal. Mach. Intell. 2023, 45, 7799-819.

43. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). Association for Computing Machinery; 2016. pp. 785-94.

44. He, H.; Garcia, E. A. Learning from Imbalanced Data. IEEE. Trans. Knowl. Data. Eng. 2009, 21, 1263-84.

45. van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. , 2008, 2579-605. https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf. (accessed 11 Jun 2025).

Journal of Materials Informatics
ISSN 2770-372X (Online)
Follow Us

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/