fig2

From: Farthest point sampling in property designated chemical feature space as an effective strategy for enhancing the machine learning model performance for small scale chemical dataset


Figure 2. (A) MSE for the training and test sets of ANN models built on the boiling point dataset using FPS (blue) and RS (gray) across different training sizes; (B) Distribution of ΔMSE, i.e., the difference between test set MSE and training set MSE, across training sizes. In the box plot, the box spans the first to third quartiles and the whiskers extend to 1.5 times the IQR. The dark dots represent the mean ΔMSE. FPS exhibits lower ΔMSE values at training sizes from 0.2 to 0.7, indicating a lower risk of overfitting; (C) Loss curves for the training and test sets of ANN models under FPS and RS at training sizes of 0.2 and 0.6. For the RS models, the larger shaded areas indicate higher variability, and the higher test set losses suggest a greater risk of overfitting. MSE: mean squared error; ANN: artificial neural network; FPS: farthest point sampling; RS: random sampling; IQR: interquartile range.
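
The ΔMSE quantity in panel (B) and the box-plot statistics described in the caption can be illustrated with a minimal sketch. This is not the authors' code: the function names `delta_mse` and `box_summary` are hypothetical, and the whisker limits assume the common Tukey convention (quartile ± 1.5 × IQR), which the caption does not spell out.

```python
import numpy as np

def delta_mse(y_train, pred_train, y_test, pred_test):
    """Return test-set MSE minus training-set MSE (the ΔMSE of panel B)."""
    mse_train = np.mean((np.asarray(y_train, dtype=float) - np.asarray(pred_train, dtype=float)) ** 2)
    mse_test = np.mean((np.asarray(y_test, dtype=float) - np.asarray(pred_test, dtype=float)) ** 2)
    return float(mse_test - mse_train)

def box_summary(values):
    """Summarize repeated ΔMSE values as a box plot would.

    Assumption: whisker limits follow the Tukey convention,
    Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    """
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return {
        "q1": float(q1),
        "q3": float(q3),
        "mean": float(v.mean()),  # the dark dot in the box plot
        "lower_limit": float(q1 - 1.5 * iqr),
        "upper_limit": float(q3 + 1.5 * iqr),
    }
```

A larger positive ΔMSE means the model fits the training set much better than the test set, which is why the caption reads it as an indicator of overfitting risk.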


