
MatSci-ML Studio: an interactive workflow toolkit for automated machine learning in materials science

Figure 4. Complete workflow visualization generated by MatSci-ML Studio for the lung cancer prediction (classification) task. (A) Feature importance plot, where SMOKING is identified as the most influential factor; the importance of one-hot encoded features has been aggregated back to the original categorical feature; (B) Initial correlation matrix showing low inter-correlation among features; (C) Correlation matrix after filtering, demonstrating that most original features were retained; (D) Progress of Bayesian hyperparameter optimization maximizing the F1-score, which converges to a best value of approximately 0.9123; (E) Score distribution from the hyperparameter search; (F) Relationship between the min_samples_leaf hyperparameter and the CV score; (G) ROC curve for the final model on the test set, achieving an AUC of 0.917; (H) Precision-Recall curve, showing a high average precision (AP = 0.885); (I) Confusion matrix for the test set, detailing the classification performance (TN = 544, FP = 49, FN = 51, TP = 356). ML: machine learning; CV: cross-validation; ROC: receiver operating characteristic; AUC: area under the curve; AP: average precision; TN: true negative; FP: false positive; FN: false negative; TP: true positive.
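The confusion-matrix counts in panel (I) are enough to recompute the usual test-set summary metrics by hand. The following is a minimal sketch in plain Python, not code from MatSci-ML Studio; the function name `confusion_metrics` is hypothetical, and it assumes the positive class is the one counted by TP and FN in the caption.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Derive standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes the positive class is the
    one counted by `tp` and `fn`.
    """
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total          # fraction of all samples classified correctly
    precision = tp / (tp + fp)            # fraction of predicted positives that are correct
    recall = tp / (tp + fn)               # fraction of actual positives that are recovered
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "total": total,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from panel (I) of the figure caption.
metrics = confusion_metrics(tn=544, fp=49, fn=51, tp=356)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

With these counts the test set contains 1000 samples, and the derived values are an accuracy of 0.9000, a precision of about 0.8790, a recall of about 0.8747, and an F1-score of about 0.8768. These are test-set figures computed from panel (I) only; the value of approximately 0.9123 in panel (D) is the score reported during hyperparameter optimization, so the two are not expected to match.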

Journal of Materials Informatics
ISSN 2770-372X (Online)

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
