Figure 2. (A) MSE for the training and test sets of ANN models built on the boiling point dataset using FPS (blue) and RS (gray) across different training sizes; (B) Distribution of ΔMSE, i.e., the difference between test set MSE and training set MSE, across training sizes. In the box plot, the box spans the first to third quartiles, and the whiskers extend up to 1.5 times the IQR from the box. The dark dots mark the mean ΔMSE. FPS exhibits lower ΔMSE values at training sizes from 0.2 to 0.7, indicating a lower risk of overfitting; (C) Loss curves for the training and test sets of ANN models under FPS and RS at training sizes of 0.2 and 0.6. For the RS models, the larger shaded areas indicate higher variability, and the higher test set losses suggest a greater risk of overfitting. MSE: mean squared error; ANN: artificial neural network; FPS: farthest point sampling; RS: random sampling; IQR: interquartile range.
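
The quantities summarized in panel B can be illustrated with a minimal sketch. It assumes NumPy and the common Tukey box-plot convention, in which whiskers end at the most extreme data points lying within 1.5 times the IQR of the box. The function names `delta_mse` and `box_summary` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def delta_mse(y_train, pred_train, y_test, pred_test):
    """Return test-set MSE minus training-set MSE (the ΔMSE of panel B)."""
    mse_train = np.mean((np.asarray(y_train, dtype=float)
                         - np.asarray(pred_train, dtype=float)) ** 2)
    mse_test = np.mean((np.asarray(y_test, dtype=float)
                        - np.asarray(pred_test, dtype=float)) ** 2)
    return float(mse_test - mse_train)


def box_summary(values):
    """Summarize a sample the way a Tukey-style box plot would (assumed convention)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Fences sit 1.5 * IQR beyond the box edges.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "mean": float(values.mean()),  # shown as the dark dot in the plot
    }
```

In use, `delta_mse` would be evaluated once per repeated model fit at a given training size, and the resulting list of ΔMSE values passed to `box_summary`. A larger positive ΔMSE means the model fits the training set better than the test set, which is the overfitting signal the caption refers to.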