Today, we will have a look at this dataset on Concrete Compressive Strength by Yeh (1998) [1] as part of my “Exploring Less Known Datasets for Machine Learning” series. Let us see how state-of-the-art algorithms compare with the results from 1998. (My views on this dataset are entirely based on Yeh (1998)).


Contents


Dataset, baseline results from Yeh (1998) and some domain knowledge


Legal

Dataset © Prof. I-Cheng Yeh [1]; published in the UCI Machine Learning Repository [2].



The aim of the dataset is to predict the concrete compressive strength of high performance concrete (HPC). HPC does not always mean high strength but covers all kinds of concrete for special applications that are not possible with standard concrete. Therefore, our target value is:

target y

  • Concrete compressive strength [MPa]

In this case the compressive strength is the cylindrical compressive strength, meaning a cylindrical sample (15 cm diameter; 30 cm height) was used for testing. The value is a bit smaller than that obtained from cubic samples. Both tests assess the uniaxial compressive strength. Usually, we get both values if we buy concrete.

To predict compressive strengths, we have these features available:

input X

  • Cement \(\left[\frac{kg}{m^3}\right]\)
  • Fly ash \(\left[\frac{kg}{m^3}\right]\)
  • Blast furnace slag \(\left[\frac{kg}{m^3}\right]\)
  • Water \(\left[\frac{kg}{m^3}\right]\)
  • Superplasticizer \(\left[\frac{kg}{m^3}\right]\)
  • Fine aggregate \(\left[\frac{kg}{m^3}\right]\)
  • Coarse aggregate \(\left[\frac{kg}{m^3}\right]\)
  • Age \([d]\)

Let us have a closer look at some of the features:

Age

Concrete hardens with time and strength increases. Usually, concrete is tested after 28 days.

Fly ash and slag

Both features can be considered as binder together with cement. Both increase the strength and durability of concrete. However, the hardening process takes longer, so more time is required to reach full compressive strength.

Superplasticizer

Superplasticizers are used to ensure better flow properties because they minimize particle segregation. Further, they allow decreasing the water-cement ratio, which leads to higher compressive strength.


Since it is a small dataset, Yeh (1998) selected training and testing sets both manually and randomly. For the manual splitting, the dataset was split into 4 sets that were used in a “k-fold style”.

The handpicked sets show signs of overfitting:

  • R² training: 0.917 - 0.935
  • R² testing: 0.814 - 0.895

The random sets show fewer signs of overfitting and better overall performance:

  • R² training: 0.923 - 0.945
  • R² testing: 0.908 - 0.922
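The actual manual splits from Yeh (1998) are not published, but a minimal sketch of what a “k-fold style” use of 4 sets could look like (my assumption, on dummy data) is scikit-learn's KFold:

```python
# Assumed illustration only: 4 folds, each used once as the testing set
# while the remaining 3 serve as the training set.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 dummy samples with 2 features
kf = KFold(n_splits=4, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: {len(train_idx)} training / {len(test_idx)} testing samples")
```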

Dataset exploration and preprocessing

Basic dataset properties

Let us load the dataset (xls file) using pandas and have a look at the data:

I cleaned the header a bit to make it more accessible.

import pandas as pd

filepath_input_data = './data/Concrete_Data_clean_header.xls'
input_data = pd.read_excel(filepath_input_data)
display(input_data.head(5))
Cement Blast Furnace Slag Fly Ash Water Superplasticizer Coarse Aggregate Fine Aggregate Age Concrete compressive strength
0 540.0 0.0 0.0 162.0 2.5 1040.0 676.0 28 79.986111
1 540.0 0.0 0.0 162.0 2.5 1055.0 676.0 28 61.887366
2 332.5 142.5 0.0 228.0 0.0 932.0 594.0 270 40.269535
3 332.5 142.5 0.0 228.0 0.0 932.0 594.0 365 41.052780
4 198.6 132.4 0.0 192.0 0.0 978.4 825.5 360 44.296075


display(input_data.tail(5))
Cement Blast Furnace Slag Fly Ash Water Superplasticizer Coarse Aggregate Fine Aggregate Age Concrete compressive strength
1025 276.4 116.0 90.3 179.6 8.9 870.1 768.3 28 44.284354
1026 322.2 0.0 115.6 196.0 10.4 817.9 813.4 28 31.178794
1027 148.5 139.4 108.6 192.7 6.1 892.4 780.0 28 23.696601
1028 159.1 186.7 0.0 175.6 11.3 989.6 788.9 28 32.768036
1029 260.9 100.5 78.3 200.6 8.6 864.5 761.5 28 32.401235


input_data.describe()
Cement Blast Furnace Slag Fly Ash Water Superplasticizer Coarse Aggregate Fine Aggregate Age Concrete compressive strength
count 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000
mean 281.165631 73.895485 54.187136 181.566359 6.203112 972.918592 773.578883 45.662136 35.817836
std 104.507142 86.279104 63.996469 21.355567 5.973492 77.753818 80.175427 63.169912 16.705679
min 102.000000 0.000000 0.000000 121.750000 0.000000 801.000000 594.000000 1.000000 2.331808
25% 192.375000 0.000000 0.000000 164.900000 0.000000 932.000000 730.950000 7.000000 23.707115
50% 272.900000 22.000000 0.000000 185.000000 6.350000 968.000000 779.510000 28.000000 34.442774
75% 350.000000 142.950000 118.270000 192.000000 10.160000 1029.400000 824.000000 56.000000 46.136287
max 540.000000 359.400000 200.100000 247.000000 32.200000 1145.000000 992.600000 365.000000 82.599225


A scatterplot matrix helps us understand the distribution of each feature and provides us with the (Pearson) correlation coefficient.

import numpy as np
import matplotlib.pyplot as plt

scatterplot_matrix = pd.plotting.scatter_matrix(input_data, alpha=0.2, figsize=(17, 17), diagonal='kde')
corr = input_data.corr().to_numpy()  # .as_matrix() was removed in recent pandas versions
for i, j in zip(*np.triu_indices_from(scatterplot_matrix, k=1)):
    scatterplot_matrix[i, j].annotate('%.4f' % corr[i, j], (0.5, 0.5), xycoords='axes fraction', ha='center', va='center')
plt.tight_layout()
plt.show()

The scatterplot matrix shows us that some features, such as ‘Age’, are less evenly distributed than others. Boxplots tell us a bit more about our data:


Since many machine learning algorithms assume normalized input data, we can apply MaxAbsScaler to scale the features.
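MaxAbsScaler simply divides each column by its maximum absolute value, so all features end up in [-1, 1] (here [0, 1], since all our quantities are non-negative), and the original physical values remain recoverable. A minimal sketch on two made-up columns:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# dummy values standing in for e.g. Cement [kg/m^3] and Age [d]
X = np.array([[540.0,  28.0],
              [332.5, 270.0],
              [198.6, 365.0]])
scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.max(axis=0))                   # each column maxes out at 1.0
print(scaler.inverse_transform(X_scaled)[0])  # original values are recoverable
```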

Feature engineering

A key factor in concrete engineering is the water-cement ratio.

Yeh (1998) used

  • Water to cement
  • Water to binder
  • Superplasticizer to binder
  • Fly ash to binder
  • Slag to binder
  • Fly ash + slag to binder

This raises a few questions. Mainly: what exactly is binder?

With this dataset we cannot reproduce any of the ratio values that Yeh (1998) used. It is also not clear which features he finally used. Did he use only a subset of them? Only the ratios and age? This is only documented properly for the (manually designed) linear regression models but not for the neural networks.

Water to cement ratio

Water to cement seems to be the simplest:

\[\text{w/c} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right]}\]
# pure water-cement ratio feature
input_features_unscaled_with_wcratio = input_data.copy()
input_features_unscaled_with_wcratio.insert(input_features_unscaled_with_wcratio.shape[-1]-1,'water-cement-ratio', input_features_unscaled_with_wcratio['Water ']/input_features_unscaled_with_wcratio['Cement'])
input_features_unscaled_with_wcratio.drop(['Water ', 'Cement'], axis=1, inplace=True)

# scale it!
scaler = MaxAbsScaler()
input_features_unscaled_with_wcratio_scaled = input_features_unscaled_with_wcratio.copy()
input_features_unscaled_with_wcratio_sc = scaler.fit_transform(input_features_unscaled_with_wcratio)
input_features_unscaled_with_wcratio_scaled.loc[:,:] = input_features_unscaled_with_wcratio_sc

# making the scaling function more accessible
extract_scaling_function = np.ones((1,input_features_unscaled_with_wcratio.shape[1]))
extract_scaling_function = scaler.inverse_transform(extract_scaling_function)

# split data into X and y
y_df = input_features_unscaled_with_wcratio_scaled['Concrete compressive strength'].copy()
X_df = input_features_unscaled_with_wcratio_scaled.copy()
X_df.drop('Concrete compressive strength', axis=1, inplace=True)

# add it to the dict of datasets for fast and iterative testing
X_train, X_test, y_train, y_test = train_test_split(X_df.values, y_df.values,test_size=0.2, random_state=42, shuffle=True)
comment = 'pure water-cement-ratio; 7 inputs, 1 output'
datasets[dataset_id] = {'X_train': X_train, 'X_test' : X_test, 'y_train': y_train, 'y_test' : y_test, 'scaler' : scaler, 'scaler_array' : extract_scaling_function, 'comment' : comment, 'dataset' : dataset_id}
dataset_id += 1

input_features_unscaled_with_wcratio['water-cement-ratio'].head()
0    0.300000
1    0.300000
2    0.685714
3    0.685714
4    0.966767
Name: water-cement-ratio, dtype: float64
input_features_unscaled_with_wcratio['water-cement-ratio'].describe()
count    1030.000000
mean        0.748269
std         0.314005
min         0.266893
25%         0.533333
50%         0.675349
75%         0.935014
max         1.882353
Name: water-cement-ratio, dtype: float64

Roughly 300 datapoints were added to the dataset after the original publication. There, the water-cement ratio was described as:

min  0.24
mean 0.97
max  2.73

This raises a question: what has happened? Different min and mean values are explainable, but a lower max value is not.


Water to binder ratio

Depending on the definition, we could count fly ash and slag as binder as well. Usually they are weighted with k-values, which we do not have. Hence, we have to try without them:

\[\text{w/b} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right] + \text{Fly ash} \left[\frac{kg}{m^3}\right] + \text{Blast furnace slag} \left[\frac{kg}{m^3}\right]}\]
input_features_unscaled_with_wcratio_full = input_data.copy()
input_features_unscaled_with_wcratio_full.insert(input_features_unscaled_with_wcratio_full.shape[-1]-1,'water-cement-ratio_with_slag_ash',input_features_unscaled_with_wcratio_full['Water ']/(input_features_unscaled_with_wcratio_full['Cement'] + input_features_unscaled_with_wcratio_full['Fly Ash'] + input_features_unscaled_with_wcratio_full['Blast Furnace Slag ']))
input_features_unscaled_with_wcratio_full.drop(['Water ', 'Cement', 'Fly Ash', 'Blast Furnace Slag '], axis=1, inplace=True)

scaler = MaxAbsScaler()
input_features_unscaled_with_wcratio_full_scaled = input_features_unscaled_with_wcratio_full.copy()
input_features_unscaled_with_wcratio_full_sc = scaler.fit_transform(input_features_unscaled_with_wcratio_full)
input_features_unscaled_with_wcratio_full_scaled.loc[:,:] = input_features_unscaled_with_wcratio_full_sc
extract_scaling_function = np.ones((1,input_features_unscaled_with_wcratio_full.shape[1]))
extract_scaling_function = scaler.inverse_transform(extract_scaling_function)

# split data into X and y
y_df = input_features_unscaled_with_wcratio_full_scaled['Concrete compressive strength'].copy()
X_df = input_features_unscaled_with_wcratio_full_scaled.copy()
X_df.drop('Concrete compressive strength', axis=1, inplace=True)


X_train, X_test, y_train, y_test = train_test_split(X_df.values, y_df.values,test_size=0.2, random_state=42, shuffle=True)
comment = 'water-cement-ratio (binder?); 5 inputs, 1 output'
datasets[dataset_id] = {'X_train': X_train, 'X_test' : X_test, 'y_train': y_train, 'y_test' : y_test, 'scaler' : scaler, 'scaler_array' : extract_scaling_function, 'comment' : comment, 'dataset' : dataset_id}
dataset_id += 1

input_features_unscaled_with_wcratio_full['water-cement-ratio_with_slag_ash'].head()
0    0.30000
1    0.30000
2    0.48000
3    0.48000
4    0.58006
Name: water-cement-ratio_with_slag_ash, dtype: float64
input_features_unscaled_with_wcratio_full['water-cement-ratio_with_slag_ash'].describe()
count    1030.000000
mean        0.469236
std         0.127127
min         0.235073
25%         0.383916
50%         0.472118
75%         0.561244
max         0.900000
Name: water-cement-ratio_with_slag_ash, dtype: float64

Applying classical machine learning algorithms

When applying classical machine learning algorithms to relatively small datasets (few features, not too many data points), we may try a brute-force approach to find suitable models. In this case we can use GridSearchCV from scikit-learn to train and test various hyperparameters for the following algorithms (4-fold cross-validation to keep validation and test set sizes similar: both 0.2 * dataset size):

  • Linear Regression
  • Decision Tree Regression
  • SVM Regression
  • Random Forest Regression
  • AdaBoost Regression
  • XGBoost Regression
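The fold arithmetic behind the 4-fold choice can be checked quickly:

```python
# With an 80/20 train/test split and 4-fold CV on the training part,
# each validation fold covers 0.8 / 4 = 0.2 of the full dataset --
# the same share as the held-out test set.
n_total = 1030                 # samples in the concrete dataset
n_train = int(n_total * 0.8)   # 824 training samples
n_val_fold = n_train // 4      # 206 samples per validation fold
n_test = n_total - n_train     # 206 held-out test samples
print(n_train, n_val_fold, n_test)
```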

In this case, using the mean squared error as a scorer is appropriate. However, for later interpretation we also have to re-calculate our metrics on the unscaled data, since the target represents a physical value. Therefore, our regression looks like this:

def xgboost_regression(X_train, X_test, y_train, y_test,scorer,dataset_id):
    x_gradient_boosting_regression = xgboost.XGBRegressor(random_state=42)
    grid_parameters_x_gradient_boosting_regression = {'n_estimators' : [3,5,10,15,18,20,25,50,60,80,100,120,150,200,300],
                                                    'max_depth' : [1,2, 3,5,7,9,10,11,15],
                                                    'learning_rate' :[ 0.0001, 0.001, 0.01, 0.1, 0.15, 0.2, 0.8, 1.0],
                                                     }
    start_time = time.time()
    grid_obj = GridSearchCV(x_gradient_boosting_regression, param_grid=grid_parameters_x_gradient_boosting_regression, cv=kfold_vs_size, n_jobs=-1, scoring=scorer, verbose=0)
    grid_fit = grid_obj.fit(X_train, y_train)
    training_time = time.time() - start_time
    best_x_gradient_boosting_regression = grid_fit.best_estimator_
    prediction = best_x_gradient_boosting_regression.predict(X_test)
    r2 = r2_score(y_test, prediction)
    mse = mean_squared_error(y_test, prediction)
    mae = mean_absolute_error(y_true=y_test, y_pred=prediction)
    
    # metrics for true values
    # r2 remains unchanged; mse and mae will change and cannot be scaled
    # because there is some physical meaning behind it
    prediction_true_scale = prediction * datasets[dataset_id]['scaler_array'][:,-1]
    y_test_true_scale = y_test * datasets[dataset_id]['scaler_array'][:,-1]
    mae_true_scale = mean_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    medae_true_scale = median_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    mse_true_scale = mean_squared_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    
    return {'Regression type' : 'eXtreme Gradient Boosting Regression', 'model' : grid_fit, 'Predictions' : prediction, 'R2' : r2, 'MSE' : mse, 'MAE' : mae, 'MSE_true_scale' : mse_true_scale, 'RMSE_true_scale' : np.sqrt(mse_true_scale), 'MAE_true_scale' : mae_true_scale, 'MedAE_true_scale' : medae_true_scale ,'Training time' : training_time, 'dataset' : dataset_id}

This leads to the following results:

Dataset 0 (no features added or dropped)

Regression type R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale
Linear Regression 0.628008 0.014050 0.093831 95.855234 9.790569 7.750363 6.208384
Decision Tree Regression 0.809726 0.007186 0.056071 49.029968 7.002140 4.631395 3.137805
SVM Regression 0.796093 0.007701 0.068799 52.542968 7.248653 5.682717 4.554011
Random Forest Regression 0.875810 0.004690 0.048152 32.001287 5.656968 3.977335 2.662481
AdaBoost Regression 0.775230 0.008489 0.076112 57.918903 7.610447 6.286776 6.055077
eXtreme Gradient Boosting Regression 0.921480 0.002966 0.037675 20.233235 4.498137 3.111918 1.982341


Dataset 1 (water-cement ratio added, ‘Water’ and ‘Cement’ dropped)

Regression type R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale
Linear Regression 0.583423 0.015734 0.098371 107.344074 10.360699 8.125363 6.605434
Decision Tree Regression 0.884075 0.004378 0.046782 29.871788 5.465509 3.864138 2.758249
SVM Regression 0.792823 0.007825 0.068682 53.385716 7.306553 5.673112 4.536345
Random Forest Regression 0.896781 0.003898 0.044800 26.597586 5.157285 3.700447 2.627287
AdaBoost Regression 0.761645 0.009002 0.076656 61.419547 7.837062 6.331761 5.468500
eXtreme Gradient Boosting Regression 0.942883 0.002157 0.033113 14.718070 3.836414 2.735069 1.980554



Dataset 2 (water-binder ratio added, ‘Water’, ‘Cement’, ‘Slag’ and ‘Fly ash’ dropped)

Regression type R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale
Linear Regression 0.616510 0.014484 0.098235 98.818192 9.940734 8.114129 7.272473
Decision Tree Regression 0.809843 0.007182 0.056120 48.999786 6.999985 4.635475 2.871331
SVM Regression 0.780538 0.008289 0.069751 56.551185 7.520052 5.761395 4.419597
Random Forest Regression 0.888769 0.004201 0.044238 28.662208 5.353710 3.653984 2.216723
AdaBoost Regression 0.759691 0.009076 0.076208 61.923092 7.869123 6.294741 5.834458
eXtreme Gradient Boosting Regression 0.915372 0.003196 0.038303 21.807015 4.669798 3.163834 2.303538


First, XGBoost on dataset 1 outperforms all neural networks presented in the original paper by Yeh (1998). Moreover, XGBoost reaches performance similar to the neural networks on all three datasets. We should not be surprised that dataset 2 does not perform as well: as mentioned earlier, the water-binder ratio is a wild guess at what they might have used.

Further, Random Forest regression performs in the range of the neural network if the neural network was used on handpicked training and testing sets.

Applying Deep Neural Networks

Do we really need neural networks? From a time-investment perspective this is questionable, but not from the perspective of having fun ;). We can use Keras to build our DNNs; since this is a small and simple dataset, we do not need to optimize much for speed.

Neural Networks similar to Yeh (1998)

Yeh writes in his 1998 paper that he used one hidden layer with 8 units (one per input feature?) and one output layer. Further, he used a learning rate of 1.0, a “momentum factor” of 0.5, and 3000 iterations. It is unclear whether any cross-validation was used. Unfortunately, that is all that was published on the neural network. I guess this is as close as we can get to it:

# simple model close to Yeh (1998)

def build_baseline_model(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(1))
    sgd = optimizers.SGD(lr=1.0, momentum=0.5)
    model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['mae'])
    return model

It is difficult to guess a useful batch size. Perhaps full-batch training is appropriate since the original work was carried out in 1998. Let us start with batch_size = 16:

# example of how the training process looks like for all neural networks
def run_baseline_model(X_train, X_test, y_train, y_test, dataset_id, epochs=3000,validation_split=0.2, batch_size=16):
    model = build_baseline_model(datasets[dataset_id]['X_train'].shape[1])
    
    checkpoint = callbacks.ModelCheckpoint('keras_models/model_baseline_model_'+str(dataset_id)+'_best.hdf5', monitor='val_loss', save_best_only=True, save_weights_only=True)
    start_time = time.time()
    model.fit(X_train, y_train,callbacks=[checkpoint], batch_size=batch_size, epochs=epochs, validation_split=validation_split)
    training_time = time.time() - start_time
    
    # load best model
    model.load_weights('keras_models/model_baseline_model_'+str(dataset_id)+'_best.hdf5')
    prediction = model.predict(X_test)
    r2 = r2_score(y_test, prediction)
    mse = mean_squared_error(y_test, prediction)
    mae = mean_absolute_error(y_true=y_test, y_pred=prediction)
    
    # metrics for true values
    # r2 remains unchanged, mse, mea will change and cannot be scaled
    # because there is some physical meaning behind it
    prediction_true_scale = prediction * datasets[dataset_id]['scaler_array'][:,-1]
    y_test_true_scale = y_test * datasets[dataset_id]['scaler_array'][:,-1]
    mae_true_scale = mean_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    medae_true_scale = median_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    mse_true_scale = mean_squared_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    
    
    return {'Regression type' : 'Baseline NN', 'model' : model, 'Predictions' : prediction, 'R2' : r2, 'MSE' : mse, 'MAE' : mae, 'MSE_true_scale' : mse_true_scale, 'RMSE_true_scale' : np.sqrt(mse_true_scale), 'MAE_true_scale' : mae_true_scale, 'MedAE_true_scale' : medae_true_scale ,'Training time' : training_time, 'dataset' : dataset_id}

This model yields total chaos:

R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale Dataset
-0.001312 0.037818 0.158211 258.019144 16.062974 13.068101 12.093447 0
0.594574 0.015312 0.101047 104.470664 10.221089 8.346391 7.861097 1
-0.001457 0.037824 0.158224 258.056630 16.064141 13.069158 12.090982 2


Changing to a single batch leads to surprisingly small changes. It seems that the SGD optimizer with a learning rate of 1 and momentum of 0.5 causes the trouble. Given these really bad results and the poor documentation of which features were used with which model architecture, we should try some more modern approaches with a similar setup:

# simple model similar to Yeh (1998)
# optimizer changed

def build_baseline_model_modified_optimizer(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.848254 0.005731 0.059346 39.102025 6.253161 4.901952 4.055229 0
0.820670 0.006773 0.066122 46.209889 6.797786 5.461594 4.458681 1
0.841751 0.005977 0.059619 40.777837 6.385753 4.924500 3.821284 2


# simple model similar to Yeh (1998)
# optimizer and activation function changed

def build_baseline_model_modified_optimizer_activation_function(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.816829 0.006918 0.065828 47.199752 6.870208 5.437363 4.505559 0
0.826317 0.006560 0.062368 44.754891 6.689910 5.151518 3.635845 1
0.792628 0.007832 0.070945 53.435914 7.309987 5.860026 4.813152 2


If we use Adam as the optimizer, we achieve much better results than with stochastic gradient descent (with settings close to the original paper). And we have our first surprise: ReLU performs worse than sigmoid as an activation function in such a shallow neural network. Nevertheless, we still perform worse than the classical ensemble methods above.

More DNN experimentation

Let us try some more models. All four models use a batch_size of 128 and 1500 iterations.

def build_model_1(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.849771 0.005674 0.057560 38.711114 6.221826 4.754404 3.278198 0
0.827993 0.006496 0.063274 44.323052 6.657556 5.226355 4.137564 1
0.814577 0.007003 0.064154 47.779991 6.912307 5.299110 4.138828 2


def build_model_2(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.858410 0.005348 0.052534 36.485129 6.040292 4.339288 3.415696 0
0.813830 0.007031 0.063669 47.972606 6.926226 5.259008 4.033573 1
0.843381 0.005915 0.057839 40.357724 6.352773 4.777441 3.771590 2


def build_model_3(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(30, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(50, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.791940 0.007858 0.061559 53.613003 7.322090 5.084692 3.382265 0
0.813332 0.007050 0.056767 48.100938 6.935484 4.688899 3.234740 1
0.825277 0.006599 0.061221 45.022793 6.709903 5.056845 3.742250 2


def build_model_4(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='elu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(40, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(30, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.830969 0.006384 0.053768 43.556202 6.599712 4.441234 2.965220 0
0.864979 0.005100 0.049786 34.792282 5.898498 4.112321 2.718826 1
0.849393 0.005688 0.057379 38.808450 6.229643 4.739484 3.987607 2


On spending more time on building DNNs

Well, we missed the baseline performance from the publication as well as the performance of XGBoost. Would it make sense to spend a few more hours training and testing various neural network architectures?
We are dealing with natural materials, a very small dataset, and, more importantly, an almost undocumented dataset. Would it change anything in application?

Discussion of results

Let us assess the final model results. The best performance is reached with XGBoost on dataset 1 (pure water-cement ratio):




Besides the averaged error metrics, we can have a look at the two error types directly:



The absolute errors are more important for understanding the consequences of our model. If we take the median absolute error of ~2 MPa, the unknowns in the dataset, and the nature of concrete mixing and the construction process into consideration, then we can say that this is sufficient.

Applying the best model on the complete data set

For the bean counters in data science and machine learning: no model was trained on this data. It is a final evaluation aimed at a deeper physical understanding of the model.

Another helpful tool with such small datasets that involve some variability is to have a look at the whole dataset. It simply helps us understand whether there are extreme outliers in the training data.
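A self-contained sketch of this “look at the whole dataset” step (with a RandomForest stand-in on synthetic data, since the trained XGBoost model and the `datasets` dict from above are not reproduced here): predict on all rows, then inspect the residuals for extreme outliers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 5))                              # dummy features
y = X @ rng.uniform(size=5) + 0.05 * rng.normal(size=200)   # dummy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# residuals over the COMPLETE dataset (train + test) -- for inspection only,
# not for reporting test performance
residuals = y - model.predict(X)
print(residuals.min(), residuals.max())
```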


What is still unclear (and some of this is mentioned in the original publication as well):

  • What exactly is defined as coarse and fine aggregate? Yes, there are boundaries: all aggregates that pass a 4.25 mm sieve are considered fine. However, to derive any meaningful mechanical properties from aggregates, we should know the d/D value, which describes the mesh diameters of the fine (d) and coarse (D) sieves.
  • What kind of aggregates were used?
  • What was the ambient temperature during hardening (btw, there are different standard temperatures for room temperature)? Was the temperature constant across all samples as well as for each single sample?
  • What type of cement was used? We have to remember that the concept of water-cement ratios is 100 years old.
  • What kind of superplasticizer was used during lab testing? Was it from the same batch all the time? Superplasticizers act differently depending on their chemistry.
  • What kind of fly ash was used? This is important because the dataset is about high performance concrete (HPC). The selection of fly ash can depend on requirements.
  • What is the porosity of the concrete?

Therefore, it is almost impossible to extract more useful information from this dataset.

Nevertheless, I enjoyed exploring it!

References

[1] Yeh, I.-C. (1998): Modeling of strength of high performance concrete using artificial neural networks. In: Cement and Concrete Research 28 (12), 1797-1808. doi:10.1016/S0008-8846(98)00165-3

[2] Dua, D.; Taniskidou, K.E. (2018). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.

Acknowledgements

I would like to thank I-Cheng Yeh for making the dataset available.