REFERENCES
1. Nolte, A.; Hayden, L. B.; Herbsleb, J. D. How to support newcomers in scientific hackathons - an action research study on expert mentoring. Proc. ACM. Hum. Comput. Interact. 2020, 4, 1-23.
2. Heller, B.; Amir, A.; Waxman, R.; Maaravi, Y. Hack your organizational innovation: literature review and integrative model for running hackathons. J. Innov. Entrep. 2023, 12, 6.
3. NC State. MATDAT18: materials and data science hackathon. https://matdat18.wordpress.ncsu.edu/. (accessed 2025-11-07).
4. Sparks, T. D.; Curtis, F. E.; Fredrickson, D. C.; Benedek, N. A. Insights and innovations from the SSMCDAT 2023: bridging solid-state materials chemistry and data science. Chem. Mater. 2024, 36, 5293-6.
5. University of Latvia. Hackathon 2022. https://www.quantumtheory.lu.lv/events/hackathon-2022/. (accessed 2025-11-07).
6. BO Hackathon with Acceleration Consortium. Hackathon agenda. https://ac-bo-hackathon.github.io/agenda/. (accessed 2025-11-07).
7. Jablonka, K. M.; Ai, Q.; Al-Feghali, A.; et al. 14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon. Digit. Discov. 2023, 2, 1233-50.
8. Zimmermann, Y.; Bazgir, A.; Afzal, Z.; et al. Reflections from the 2024 large language model (LLM) hackathon for applications in materials science and chemistry. arXiv 2024, arXiv:2411.15221. Available online: https://doi.org/10.48550/arXiv.2411.15221. (accessed 7 Nov 2025).
9. KRICT Chemical Data Explorer (ChemDX). https://www.chemdx.org/. (accessed 2025-11-07).
10. Horton, M. K.; Huck, P.; Yang, R. X.; et al. Accelerated data-driven materials science with the Materials Project. Nat. Mater. 2025, 24, 1522-32.
11. Scheidgen, M.; Himanen, L.; Ladines, A. N.; et al. NOMAD: a distributed web-based platform for managing materials science research data. J. Open. Source. Softw. 2023, 8, 5388.
12. ThermoElectric Materials Explorer (TEXplorer). https://texplorer.org/about. (accessed 2025-11-07).
13. Lee, Y. L.; Lee, H.; Jang, S.; et al. TEXplorer.org: thermoelectric material properties data platform for experimental and first-principles calculation results. APL. Mater. 2023, 11, 041111.
14. Yang, J. H.; Kang, H.; Kim, H. J.; et al. https://2DMat.ChemDX.org: experimental data platform for 2D materials from synthesis to physical properties. Digit. Discov. 2024, 3, 573-85.
15. Mok, D. H.; Back, S. Atomic structure-free representation of active motifs for expedited catalyst discovery. J. Chem. Inf. Model. 2021, 61, 4514-20.
16. Na, G. S.; Chang, H. A public database of thermoelectric materials and system-identified material representation for data-driven discovery. npj. Comput. Mater. 2022, 8, 897.
17. Jang, S.; Na, G. S.; Choi, Y.; Chang, H. Optical property dataset of inorganic phosphor. Sci. Rep. 2024, 14, 7639.
18. Lee, Y. L.; Lee, H.; Kim, T.; et al. Data-driven enhancement of ZT in SnSe-based thermoelectric systems. J. Am. Chem. Soc. 2022, 144, 13748-63.
19. Kim, J. S.; Chung, I.; Oh, J.; et al. Closed-loop optimization of catalysts for oxidative propane dehydrogenation with CO2 using artificial intelligence. J. CO2. Util. 2023, 78, 102620.
20. Kim, H. W.; Lee, S. W.; Na, G. S.; et al. Reaction condition optimization for non-oxidative conversion of methane using artificial intelligence. React. Chem. Eng. 2021, 6, 235-43.
21. Park, J.; Oh, J.; Kim, J.; et al. Catalyst discovery for propane dehydrogenation through interpretable machine learning: leveraging laboratory-scale database and atomic properties. ACS. Sustainable. Chem. Eng. 2024, 12, 10376-86.
22. Yang, J. H.; Lee, J.; Kwon, H.; Sohn, E. H.; Chang, H.; Jang, S. High glass transition temperature fluorinated polymers based on transfer learning with small experimental data. Macromol. Rapid. Commun. 2024, 45, e2400161.
23. Kim, J.; Noh, J.; Im, J. Machine learning-enabled chemical space exploration of all-inorganic perovskites for photovoltaics. npj. Comput. Mater. 2024, 10, 1270.
24. LitDX. DB’s visualization function. https://litdx.materials.chemdx.org/. (accessed 2025-11-07).
25. Solar Cell. DB’s visualization function. https://solar.chemdx.org/statistics. (accessed 2025-11-07).
27. Wang, A. Y.; Kauwe, S. K.; Murdock, R. J.; Sparks, T. D. Compositionally restricted attention-based network for materials property predictions. npj. Comput. Mater. 2021, 7, 545.
28. Prein, T.; Pan, E.; Dörr, T.; Olivetti, E.; Rupp, J. L. M. MTENCODER: a multi-task pretrained transformer encoder for materials representation learning. 2023. https://rgdoi.net/10.13140/RG.2.2.20897.79202. (accessed 7 Nov 2025).
29. Batatia, I.; Benner, P.; Chiang, Y.; et al. A foundation model for atomistic materials chemistry. arXiv 2024, arXiv:2401.00096. Available online: https://doi.org/10.48550/arXiv.2401.00096. (accessed 7 Nov 2025).
30. Park, Y.; Kim, J.; Hwang, S.; Han, S. Scalable parallel algorithm for graph neural network interatomic potentials in molecular dynamics simulations. J. Chem. Theory. Comput. 2024, 20, 4857-68.
31. ORB forcefield models from Orbital Materials. https://github.com/orbital-materials/orb-models. (accessed 2025-11-07).
32. ChemDX - LitDX. https://litdx.materials.chemdx.org. (accessed 2025-11-07).
33. Jablonka, K. M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 2024, 6, 161-9.
34. Xie, Z.; Evangelopoulos, X.; Omar, ÖH.; Troisi, A.; Cooper, A. I.; Chen, L. Fine-tuning GPT-3 for machine learning electronic and functional properties of organic molecules. Chem. Sci. 2024, 15, 500-10.
35. Zhong, S.; Guan, X. Developing quantitative structure–activity relationship (QSAR) models for water contaminants’ activities/properties by fine-tuning GPT-3 models. Environ. Sci. Technol. Lett. 2023, 10, 872-7.
36. Kim, S.; Jung, Y.; Schrier, J. Large language models for inorganic synthesis predictions. J. Am. Chem. Soc. 2024, 146, 19654-9.
37. Song, Z.; Lu, S.; Ju, M.; Zhou, Q.; Wang, J. Is large language model all you need to predict the synthesizability and precursors of crystal structures? arXiv 2024, arXiv:2407.07016. Available online: https://doi.org/10.48550/arXiv.2407.07016. (accessed 7 Nov 2025).
38. Jacobs, R.; Polak, M. P.; Schultz, L. E.; Mahdavi, H.; Honavar, V.; Morgan, D. Regression with large language models for materials and molecular property prediction. arXiv 2024, arXiv:2409.06080. Available online: https://doi.org/10.48550/arXiv.2409.06080. (accessed 7 Nov 2025).
39. Rubungo, A. N.; Li, K.; Hattrick-Simpers, J.; Dieng, A. B. LLM4Mat-bench: benchmarking large language models for materials property prediction. Mach. Learn. Sci. Technol. 2025, 6, 020501.
40. Van Herck, J.; Gil, M. V.; Jablonka, K. M.; et al. Assessment of fine-tuned large language models for real-world chemistry and material science applications. Chem. Sci. 2025, 16, 670-84.
41. Sayeed, H. M.; Baird, S. G.; Sparks, T. D. Structure feature vectors derived from Robocrystallographer text descriptions of crystal structures using word embeddings. ChemRxiv 2023. Available online: http://dx.doi.org/10.26434/chemrxiv-2023-3q8wj. (accessed 7 Nov 2025).
42. Kim, S.; Schrier, J.; Jung, Y. Explainable synthesizability prediction of inorganic crystal polymorphs using large language models. Angew. Chem. Int. Ed. 2025, 64, e202423950.
43. Reimers, N.; Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks. arXiv 2019, arXiv:1908.10084. Available online: https://doi.org/10.48550/arXiv.1908.10084. (accessed 7 Nov 2025).
44. Grattafiori, A.; Dubey, A.; Jauhri, A.; et al. The Llama 3 herd of models. arXiv 2024, arXiv:2407.21783. Available online: https://doi.org/10.48550/arXiv.2407.21783. (accessed 7 Nov 2025).
45. OpenAI Platform. GPT-4o mini. https://platform.openai.com/docs/models#gpt-4o-mini. (accessed 2025-11-07).
46. Baird, S.; Ansari, M.; Afzal, Z.; et al. Bayesian optimization hackathon for chemistry and materials. ChemRxiv 2025. Available online: https://doi.org/10.26434/chemrxiv-2025-dzh5z. (accessed 7 Nov 2025).
47. Ottomano, F.; De Felice, G.; Gusev, V. V.; Sparks, T. D. Not as simple as we thought: a rigorous examination of data aggregation in materials informatics. Digit. Discov. 2024, 3, 337-46.
48. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825-30. http://jmlr.org/papers/v12/pedregosa11a.html. (accessed 7 Nov 2025).
49. Hocky, G. M.; White, A. D. Natural language processing models that automate programming will transform chemistry research and teaching. Digit. Discov. 2022, 1, 79-83.
50. White, A. D.; Hocky, G. M.; Gandhi, H. A.; et al. Assessment of chemistry knowledge in large language models that generate code. Digit. Discov. 2023, 2, 368-76.
51. Microsoft Research AI4Science, Microsoft Azure Quantum. The impact of large language models on scientific discovery: a preliminary study using GPT-4. arXiv 2023, arXiv:2311.07361. Available online: https://doi.org/10.48550/arXiv.2311.07361. (accessed 7 Nov 2025).
52. Hare, P. M. Coding with AI in the Physical Chemistry Laboratory. J. Chem. Educ. 2024, 101, 3869-74.
53. Coudert, F. Reproducible research in computational chemistry of materials. Chem. Mater. 2017, 29, 2615-7.
54. Persaud, D.; Ward, L.; Hattrick-Simpers, J. Reproducibility in materials informatics: lessons from ‘A general-purpose machine learning framework for predicting properties of inorganic materials’. Digit. Discov. 2024, 3, 281-6.
55. Butler, K. T.; Choudhary, K.; Csanyi, G.; Ganose, A. M.; Kalinin, S. V.; Morgan, D. Setting standards for data driven materials science. npj. Comput. Mater. 2024, 10, 1411.
56. McDowell, D. L. Gaps and barriers to successful integration and adoption of practical materials informatics tools and workflows. JOM 2021, 73, 138-48.
57. The Minerals, Metals & Materials Society. MGI workforce. https://www.tms.org/MGIworkforce. (accessed 2025-11-07).
58. Wang, Z.; Chen, A.; Tao, K.; et al. AlphaMat: a material informatics hub connecting data, features, models and applications. npj. Comput. Mater. 2023, 9, 1086.





