Figure 4. Complete workflow visualization generated by MatSci-ML Studio for the lung cancer prediction (classification) task. (A) Feature importance plot, where SMOKING is identified as the most influential factor. The importance of one-hot encoded features has been aggregated back to the original categorical feature; (B) Initial correlation matrix showing low inter-correlation among features; (C) Correlation matrix after filtering, demonstrating that most original features were retained; (D) Progress of Bayesian hyperparameter optimization maximizing the F1-score, which converges to a best value of approximately 0.9123; (E) Score distribution from the hyperparameter search; (F) Relationship between the min_samples_leaf hyperparameter and the CV score; (G) ROC curve for the final model on the test set, achieving an AUC of 0.917; (H) Precision-Recall curve, showing a high average precision (AP = 0.885); (I) Confusion matrix for the test set, detailing the classification performance (TN = 544, FP = 49, FN = 51, TP = 356). ML: Machine learning; CV: cross-validation; ROC: receiver operating characteristic; AUC: area under the curve.
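
For readers who want to relate the confusion matrix in panel (I) to scalar metrics, the counts can be converted with the standard definitions of accuracy, precision, recall, and F1. The following is a minimal sketch and not part of MatSci-ML Studio; the function name `metrics_from_confusion` is hypothetical, and only the four counts are taken from the caption.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not an API of MatSci-ML Studio.
    """
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of actual positives that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "total": total,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts reported for the test set in panel (I).
test_metrics = metrics_from_confusion(tn=544, fp=49, fn=51, tp=356)
for name, value in test_metrics.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

The four counts sum to 1,000 test samples, so the test-set accuracy is 0.90. Note that the F1-score in panel (D) is a cross-validation score from the hyperparameter search, so it need not equal the F1 computed from these test-set counts.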






