REFERENCES
1. Gormley, A. J.; Webb, M. A. Machine learning in combinatorial polymer chemistry. Nat. Rev. Mater. 2021, 6, 642-4.
2. Wang, S.; Yue, H.; Yuan, X. Accelerating polymer discovery with uncertainty-guided PGCNN: explainable AI for predicting properties and mechanistic insights. J. Chem. Inf. Model. 2024, 64, 5500-9.
3. Carvalho, R. P.; Marchiori, C. F. N.; Brandell, D.; Araujo, C. M. Artificial intelligence driven in-silico discovery of novel organic lithium-ion battery cathodes. Energy. Storage. Mater. 2022, 44, 313-25.
4. Bhowmik, A.; Berecibar, M.; Casas-Cabanas, M.; et al. Implications of the BATTERY 2030+ AI-assisted toolkit on future low-TRL battery discoveries and chemistries. Adv. Energy. Mater. 2021, 12, 2102698.
5. Zhou, Z.; Shang, Y.; Liu, X.; Yang, Y. A generative deep learning framework for inverse design of compositionally complex bulk metallic glasses. npj. Comput. Mater. 2023, 9, 15.
6. Basu, B.; Gowtham, N. H.; Xiao, Y.; Kalidindi, S. R.; Leong, K. W. Biomaterialomics: data science-driven pathways to develop fourth-generation biomaterials. Acta. Biomater. 2022, 143, 1-25.
7. Singh, A. V.; Rosenkranz, D.; Ansari, M. H. D.; et al. Artificial intelligence and machine learning empower advanced biomedical material design to toxicity prediction. Adv. Intell. Syst. 2020, 2, 2000084.
8. Debnath, A.; Krajewski, A. M.; Sun, H.; et al. Generative deep learning as a tool for inverse design of high entropy refractory alloys. J. Mater. Inf. 2021, 1, 3.
9. Hart, G. L. W.; Mueller, T.; Toher, C.; Curtarolo, S. Machine learning for alloys. Nat. Rev. Mater. 2021, 6, 730-55.
10. Kononova, O.; He, T.; Huo, H.; Trewartha, A.; Olivetti, E. A.; Ceder, G. Opportunities and challenges of text mining in materials research. iScience 2021, 24, 102155.
11. Sierepeklis, O.; Cole, J. M. A thermoelectric materials database auto-generated from the scientific literature using ChemDataExtractor. Sci. Data. 2022, 9, 648.
12. Kumar, P.; Kabra, S.; Cole, J. M. Auto-generating databases of yield strength and grain size using ChemDataExtractor. Sci. Data. 2022, 9, 292.
13. Wang, W.; Jiang, X.; Tian, S.; et al. Automated pipeline for superalloy data by text mining. npj. Comput. Mater. 2022, 8, 9.
14. Swain, M. C.; Cole, J. M. ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature. J. Chem. Inf. Model. 2016, 56, 1894-904.
15. Widiastuti, N. I. Convolution neural network for text mining and natural language processing. IOP. Conf. Ser. Mater. Sci. Eng. 2019, 662, 052010.
16. Devlin, J.; Chang, M. W.; Lee, K.; Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, USA, June 2-7, 2019; Association for Computational Linguistics; Vol. 1. pp. 4171-86.
17. Gupta, T.; Zaki, M.; Krishnan, N. M. A.; Mausam. MatSciBERT: a materials domain language model for text mining and information extraction. npj. Comput. Mater. 2022, 8, 102.
18. Shetty, P.; Rajan, A. C.; Kuenneth, C.; et al. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. npj. Comput. Mater. 2023, 9, 52.
19. Lee, J.; Yoon, W.; Kim, S.; et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234-40.
20. Huang, S.; Cole, J. M. BatteryBERT: a pretrained language model for battery database enhancement. J. Chem. Inf. Model. 2022, 62, 6365-77.
21. Brown, T. B.; Mann, B.; Ryder, N.; et al. Language models are few-shot learners. arXiv 2020, arXiv:2005.14165. Available online: https://doi.org/10.48550/arXiv.2005.14165. (accessed 23 Mar 2026).
22. OpenAI; Achiam, J.; Adler, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774. Available online: https://doi.org/10.48550/arXiv.2303.08774. (accessed 23 Mar 2026).
23. Touvron, H.; Lavril, T.; Izacard, G.; et al. LLaMA: open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. Available online: https://doi.org/10.48550/arXiv.2302.13971. (accessed 23 Mar 2026).
24. Chowdhery, A.; Narang, S.; Devlin, J.; et al. PaLM: scaling language modeling with pathways. arXiv 2022, arXiv:2204.02311. Available online: https://doi.org/10.48550/arXiv.2204.02311. (accessed 23 Mar 2026).
25. Gemini Team, Google; Anil, R.; Borgeaud, S.; et al. Gemini: a family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805. Available online: https://doi.org/10.48550/arXiv.2312.11805. (accessed 23 Mar 2026).
26. Wei, J.; Bosma, M.; Zhao, V. Y.; et al. Finetuned language models are zero-shot learners. arXiv 2021, arXiv:2109.01652. Available online: https://doi.org/10.48550/arXiv.2109.01652. (accessed 23 Mar 2026).
27. Dagdelen, J.; Dunn, A.; Lee, S.; et al. Structured information extraction from scientific text with large language models. Nat. Commun. 2024, 15, 1418.
28. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. In NIPS'15: Proceedings of the 29th International Conference on Neural Information Processing Systems, Montreal, Canada, December 7-12, 2015; MIT Press: Cambridge, Massachusetts, United States, 2015; Vol. 28. pp. 2692-700.
29. Vaswani, A.; Shazeer, N.; Parmar, N.; et al. Attention is all you need. arXiv 2017, arXiv:1706.03762. Available online: https://doi.org/10.48550/arXiv.1706.03762. (accessed 23 Mar 2026).
30. Sun, F.; Jiang, P.; Sun, H.; Pei, C.; Ou, W.; Wang, X. Multi-source pointer network for product title summarization. arXiv 2018, arXiv:1808.06885. Available online: https://doi.org/10.48550/arXiv.1808.06885. (accessed 23 Mar 2026).
31. Anthropic. The Claude 3 model family: Opus, Sonnet, Haiku. https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf. (accessed 23 Mar 2026).
32. Gemini Team, Google. Gemini 1.5: unlocking multimodal understanding across millions of tokens of context. arXiv 2024, arXiv:2403.05530. Available online: https://doi.org/10.48550/arXiv.2403.05530. (accessed 23 Mar 2026).
33. Grattafiori, A.; Dubey, A.; Jauhri, A.; et al. The Llama 3 herd of models. arXiv 2024, arXiv:2407.21783. Available online: https://doi.org/10.48550/arXiv.2407.21783. (accessed 23 Mar 2026).
34. Ji, Z.; Lee, N.; Frieske, R.; et al. Survey of hallucination in natural language generation. ACM. Comput. Surv. 2023, 55, 1-38.
35. Farquhar, S.; Kossen, J.; Kuhn, L.; Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 2024, 630, 625-30.