Today, we will have a look at this dataset on Concrete Compressive Strength by Yeh (1998) [1] as part of my “Exploring Less Known Datasets for Machine Learning” series. Let us see how state-of-the-art algorithms compare with the results from 1998. (My views on this dataset are entirely based on Yeh (1998)).


Contents


Dataset, baseline results from Yeh (1998) and some domain knowledge


Legal

Dataset © Prof. I-Cheng Yeh [1]; published in the UCI Machine Learning Repository [2].



The aim of the dataset is to predict the concrete compressive strength of high performance concrete (HPC). HPC does not always mean high strength but covers all kinds of concrete for special applications that are not possible with standard concrete. Therefore, our target value is:

target y

  • Concrete compressive strength [MPa]

In this case the compressive strength is the cylindrical compressive strength, meaning a cylindrical sample (15 cm diameter; 30 cm height) was used for testing. The value is a bit smaller than that obtained from cubic samples. Both tests assess the uniaxial compressive strength. Usually, we get both values if we buy concrete.

To predict compressive strengths, we have these features available:

input X

  • Cement \(\left[\frac{kg}{m^3}\right]\)
  • Fly ash \(\left[\frac{kg}{m^3}\right]\)
  • Blast furnace slag \(\left[\frac{kg}{m^3}\right]\)
  • Water \(\left[\frac{kg}{m^3}\right]\)
  • Superplasticizer \(\left[\frac{kg}{m^3}\right]\)
  • Fine aggregate \(\left[\frac{kg}{m^3}\right]\)
  • Coarse aggregate \(\left[\frac{kg}{m^3}\right]\)
  • Age \([d]\)

Let us have a closer look at some of the features:

Age

Concrete hardens with time and strength increases. Usually, concrete is tested after 28 days.

Fly ash and slag

Both features can be considered as binder together with cement. Both increase the strength and durability of concrete. However, the hardening process takes longer, so more time is required to reach full compressive strength.

Superplasticizer

Superplasticizers are used to ensure better flow properties because they minimize particle segregation. Further, they allow decreasing the water-cement ratio, which leads to higher compressive strength.


Since it is a small dataset, Yeh (1998) selected training and testing sets both manually and randomly. For the manual splitting, the dataset was split into 4 sets that were used in a “k-fold style”.

The handpicked sets show signs of overfitting:

  • R² training: 0.917 - 0.935
  • R² testing: 0.814 - 0.895

The random sets show fewer signs of overfitting and better overall performance:

  • R² training: 0.923 - 0.945
  • R² testing: 0.908 - 0.922
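The actual manual splits from Yeh (1998) are not published, but a minimal sketch of what a “k-fold style” use of 4 sets could look like (my assumption, on dummy data) is scikit-learn's KFold:

```python
# Assumed illustration only: 4 folds, each used once as the testing set
# while the remaining 3 serve as the training set.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 dummy samples with 2 features
kf = KFold(n_splits=4, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: {len(train_idx)} training / {len(test_idx)} testing samples")
```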

Dataset exploration and preprocessing

Basic dataset properties

Let us load the dataset (xls file) using pandas and have a look at the data:

I cleaned the header a bit to make it more accessible.

import pandas as pd

filepath_input_data = './data/Concrete_Data_clean_header.xls'
input_data = pd.read_excel(filepath_input_data)
display(input_data.head(5))
Cement Blast Furnace Slag Fly Ash Water Superplasticizer Coarse Aggregate Fine Aggregate Age Concrete compressive strength
0 540.0 0.0 0.0 162.0 2.5 1040.0 676.0 28 79.986111
1 540.0 0.0 0.0 162.0 2.5 1055.0 676.0 28 61.887366
2 332.5 142.5 0.0 228.0 0.0 932.0 594.0 270 40.269535
3 332.5 142.5 0.0 228.0 0.0 932.0 594.0 365 41.052780
4 198.6 132.4 0.0 192.0 0.0 978.4 825.5 360 44.296075


display(input_data.tail(5))
Cement Blast Furnace Slag Fly Ash Water Superplasticizer Coarse Aggregate Fine Aggregate Age Concrete compressive strength
1025 276.4 116.0 90.3 179.6 8.9 870.1 768.3 28 44.284354
1026 322.2 0.0 115.6 196.0 10.4 817.9 813.4 28 31.178794
1027 148.5 139.4 108.6 192.7 6.1 892.4 780.0 28 23.696601
1028 159.1 186.7 0.0 175.6 11.3 989.6 788.9 28 32.768036
1029 260.9 100.5 78.3 200.6 8.6 864.5 761.5 28 32.401235


input_data.describe()
Cement Blast Furnace Slag Fly Ash Water Superplasticizer Coarse Aggregate Fine Aggregate Age Concrete compressive strength
count 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000 1030.000000
mean 281.165631 73.895485 54.187136 181.566359 6.203112 972.918592 773.578883 45.662136 35.817836
std 104.507142 86.279104 63.996469 21.355567 5.973492 77.753818 80.175427 63.169912 16.705679
min 102.000000 0.000000 0.000000 121.750000 0.000000 801.000000 594.000000 1.000000 2.331808
25% 192.375000 0.000000 0.000000 164.900000 0.000000 932.000000 730.950000 7.000000 23.707115
50% 272.900000 22.000000 0.000000 185.000000 6.350000 968.000000 779.510000 28.000000 34.442774
75% 350.000000 142.950000 118.270000 192.000000 10.160000 1029.400000 824.000000 56.000000 46.136287
max 540.000000 359.400000 200.100000 247.000000 32.200000 1145.000000 992.600000 365.000000 82.599225


A scatterplot matrix helps us understand the distribution of each feature and provides us with the (Pearson) correlation coefficient.

import numpy as np
import matplotlib.pyplot as plt

scatterplot_matrix = pd.plotting.scatter_matrix(input_data, alpha=0.2, figsize=(17, 17), diagonal='kde')
corr = input_data.corr().to_numpy()  # .as_matrix() was removed in recent pandas versions
for i, j in zip(*np.triu_indices_from(scatterplot_matrix, k=1)):
    scatterplot_matrix[i, j].annotate('%.4f' % corr[i, j], (0.5, 0.5), xycoords='axes fraction', ha='center', va='center')
plt.tight_layout()
plt.show()

The scatterplot matrix shows us that some features, such as ‘Age’, are less evenly distributed than others. Boxplots tell us a bit more about our data:


Since many machine learning algorithms assume normalized input data, we can apply MaxAbsScaler to scale the features.
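MaxAbsScaler simply divides each column by its maximum absolute value, so all features end up in [-1, 1] (here [0, 1], since all our quantities are non-negative), and the original physical values remain recoverable. A minimal sketch on two made-up columns:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# dummy values standing in for e.g. Cement [kg/m^3] and Age [d]
X = np.array([[540.0,  28.0],
              [332.5, 270.0],
              [198.6, 365.0]])
scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.max(axis=0))                   # each column maxes out at 1.0
print(scaler.inverse_transform(X_scaled)[0])  # original values are recoverable
```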

Feature engineering

A key factor in concrete engineering is the water-cement ratio.

Yeh (1998) used

  • Water to cement
  • Water to binder
  • Superplasticizer to binder
  • Fly ash to binder
  • Slag to binder
  • Fly ash + slag to binder

This raises a few questions. Mainly: what exactly is binder?

With this dataset we cannot reproduce any of the ratio values that Yeh (1998) used. It is also not clear which features he finally used. Did he use only a subset of them? Only the ratios and age? This is only documented properly for the (manually designed) linear regression models but not for the neural networks.

Water to cement ratio

Water to cement seems to be the simplest:

\[\text{w/c} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right]}\]
# pure water-cement ratio feature
input_features_unscaled_with_wcratio = input_data.copy()
input_features_unscaled_with_wcratio.insert(input_features_unscaled_with_wcratio.shape[-1]-1,'water-cement-ratio', input_features_unscaled_with_wcratio['Water ']/input_features_unscaled_with_wcratio['Cement'])
input_features_unscaled_with_wcratio.drop(['Water ', 'Cement'], axis=1, inplace=True)

# scale it!
scaler = MaxAbsScaler()
input_features_unscaled_with_wcratio_scaled = input_features_unscaled_with_wcratio.copy()
input_features_unscaled_with_wcratio_sc = scaler.fit_transform(input_features_unscaled_with_wcratio)
input_features_unscaled_with_wcratio_scaled.loc[:,:] = input_features_unscaled_with_wcratio_sc

# making the scaling function more accessible
extract_scaling_function = np.ones((1,input_features_unscaled_with_wcratio.shape[1]))
extract_scaling_function = scaler.inverse_transform(extract_scaling_function)

# split data into X and y
y_df = input_features_unscaled_with_wcratio_scaled['Concrete compressive strength'].copy()
X_df = input_features_unscaled_with_wcratio_scaled.copy()
X_df.drop('Concrete compressive strength', axis=1, inplace=True)

# add it to the dict of datasets for fast and iterative testing
X_train, X_test, y_train, y_test = train_test_split(X_df.values, y_df.values,test_size=0.2, random_state=42, shuffle=True)
comment = 'pure water-cement-ratio; 7 inputs, 1 output'
datasets[dataset_id] = {'X_train': X_train, 'X_test' : X_test, 'y_train': y_train, 'y_test' : y_test, 'scaler' : scaler, 'scaler_array' : extract_scaling_function, 'comment' : comment, 'dataset' : dataset_id}
dataset_id += 1

input_features_unscaled_with_wcratio['water-cement-ratio'].head()
0    0.300000
1    0.300000
2    0.685714
3    0.685714
4    0.966767
Name: water-cement-ratio, dtype: float64
input_features_unscaled_with_wcratio['water-cement-ratio'].describe()
count    1030.000000
mean        0.748269
std         0.314005
min         0.266893
25%         0.533333
50%         0.675349
75%         0.935014
max         1.882353
Name: water-cement-ratio, dtype: float64

Roughly 300 datapoints were added to the dataset after the original publication. There, the water-cement ratio was described as:

min  0.24
mean 0.97
max  2.73

This raises a question: what has happened? Different min and mean values are explainable, but a lower max value is not.


Water to binder ratio

Depending on the definition, we could count fly ash and slag as binder as well. Usually they are weighted with k-values, which we do not have. Hence, we have to try without them:

\[\text{w/b} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right] + \text{Fly ash} \left[\frac{kg}{m^3}\right] + \text{Blast furnace slag} \left[\frac{kg}{m^3}\right]}\]
input_features_unscaled_with_wcratio_full = input_data.copy()
input_features_unscaled_with_wcratio_full.insert(input_features_unscaled_with_wcratio_full.shape[-1]-1,'water-cement-ratio_with_slag_ash',input_features_unscaled_with_wcratio_full['Water ']/(input_features_unscaled_with_wcratio_full['Cement'] + input_features_unscaled_with_wcratio_full['Fly Ash'] + input_features_unscaled_with_wcratio_full['Blast Furnace Slag ']))
input_features_unscaled_with_wcratio_full.drop(['Water ', 'Cement', 'Fly Ash', 'Blast Furnace Slag '], axis=1, inplace=True)

scaler = MaxAbsScaler()
input_features_unscaled_with_wcratio_full_scaled = input_features_unscaled_with_wcratio_full.copy()
input_features_unscaled_with_wcratio_full_sc = scaler.fit_transform(input_features_unscaled_with_wcratio_full)
input_features_unscaled_with_wcratio_full_scaled.loc[:,:] = input_features_unscaled_with_wcratio_full_sc
extract_scaling_function = np.ones((1,input_features_unscaled_with_wcratio_full.shape[1]))
extract_scaling_function = scaler.inverse_transform(extract_scaling_function)

# split data into X and y
y_df = input_features_unscaled_with_wcratio_full_scaled['Concrete compressive strength'].copy()
X_df = input_features_unscaled_with_wcratio_full_scaled.copy()
X_df.drop('Concrete compressive strength', axis=1, inplace=True)


X_train, X_test, y_train, y_test = train_test_split(X_df.values, y_df.values,test_size=0.2, random_state=42, shuffle=True)
comment = 'water-cement-ratio (binder?); 5 inputs, 1 output'
datasets[dataset_id] = {'X_train': X_train, 'X_test' : X_test, 'y_train': y_train, 'y_test' : y_test, 'scaler' : scaler, 'scaler_array' : extract_scaling_function, 'comment' : comment, 'dataset' : dataset_id}
dataset_id += 1

input_features_unscaled_with_wcratio_full['water-cement-ratio_with_slag_ash'].head()
0    0.30000
1    0.30000
2    0.48000
3    0.48000
4    0.58006
Name: water-cement-ratio_with_slag_ash, dtype: float64
input_features_unscaled_with_wcratio_full['water-cement-ratio_with_slag_ash'].describe()
count    1030.000000
mean        0.469236
std         0.127127
min         0.235073
25%         0.383916
50%         0.472118
75%         0.561244
max         0.900000
Name: water-cement-ratio_with_slag_ash, dtype: float64

Applying classical machine learning algorithms

When applying classical machine learning algorithms to relatively small datasets (few features, not too many data points), we may try a brute-force approach to find suitable models. In this case we can use GridSearchCV from scikit-learn to train and test various hyperparameters for the following algorithms (4-fold cross-validation to keep validation and test set sizes similar: both 0.2 * dataset size):

  • Linear Regression
  • Decision Tree Regression
  • SVM Regression
  • Random Forest Regression
  • AdaBoost Regression
  • XGBoost Regression
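The fold arithmetic behind the 4-fold choice can be checked quickly:

```python
# With an 80/20 train/test split and 4-fold CV on the training part,
# each validation fold covers 0.8 / 4 = 0.2 of the full dataset --
# the same share as the held-out test set.
n_total = 1030                 # samples in the concrete dataset
n_train = int(n_total * 0.8)   # 824 training samples
n_val_fold = n_train // 4      # 206 samples per validation fold
n_test = n_total - n_train     # 206 held-out test samples
print(n_train, n_val_fold, n_test)
```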

In this case, using the mean squared error as a scorer is appropriate. However, for later interpretation we also have to re-calculate our metrics on the unscaled data, since the target represents a physical value. Therefore, our regression looks like this:

def xgboost_regression(X_train, X_test, y_train, y_test,scorer,dataset_id):
    x_gradient_boosting_regression = xgboost.XGBRegressor(random_state=42)
    grid_parameters_x_gradient_boosting_regression = {'n_estimators' : [3,5,10,15,18,20,25,50,60,80,100,120,150,200,300],
                                                    'max_depth' : [1,2, 3,5,7,9,10,11,15],
                                                    'learning_rate' :[ 0.0001, 0.001, 0.01, 0.1, 0.15, 0.2, 0.8, 1.0],
                                                     }
    start_time = time.time()
    grid_obj = GridSearchCV(x_gradient_boosting_regression, param_grid=grid_parameters_x_gradient_boosting_regression, cv=kfold_vs_size, n_jobs=-1, scoring=scorer, verbose=0)
    grid_fit = grid_obj.fit(X_train, y_train)
    training_time = time.time() - start_time
    best_x_gradient_boosting_regression = grid_fit.best_estimator_
    prediction = best_x_gradient_boosting_regression.predict(X_test)
    r2 = r2_score(y_test, prediction)
    mse = mean_squared_error(y_test, prediction)
    mae = mean_absolute_error(y_true=y_test, y_pred=prediction)
    
    # metrics for true values
    # r2 remains unchanged; mse and mae will change and cannot be scaled
    # because there is some physical meaning behind it
    prediction_true_scale = prediction * datasets[dataset_id]['scaler_array'][:,-1]
    y_test_true_scale = y_test * datasets[dataset_id]['scaler_array'][:,-1]
    mae_true_scale = mean_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    medae_true_scale = median_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    mse_true_scale = mean_squared_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    
    return {'Regression type' : 'eXtreme Gradient Boosting Regression', 'model' : grid_fit, 'Predictions' : prediction, 'R2' : r2, 'MSE' : mse, 'MAE' : mae, 'MSE_true_scale' : mse_true_scale, 'RMSE_true_scale' : np.sqrt(mse_true_scale), 'MAE_true_scale' : mae_true_scale, 'MedAE_true_scale' : medae_true_scale ,'Training time' : training_time, 'dataset' : dataset_id}

This leads to the following results:

Dataset 0 (no features added or dropped)

Regression type R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale
Linear Regression 0.628008 0.014050 0.093831 95.855234 9.790569 7.750363 6.208384
Decision Tree Regression 0.809726 0.007186 0.056071 49.029968 7.002140 4.631395 3.137805
SVM Regression 0.796093 0.007701 0.068799 52.542968 7.248653 5.682717 4.554011
Random Forest Regression 0.875810 0.004690 0.048152 32.001287 5.656968 3.977335 2.662481
AdaBoost Regression 0.775230 0.008489 0.076112 57.918903 7.610447 6.286776 6.055077
eXtreme Gradient Boosting Regression 0.921480 0.002966 0.037675 20.233235 4.498137 3.111918 1.982341


Dataset 1 (water-cement ratio added, ‘Water’ and ‘Cement’ dropped)

Regression type R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale
Linear Regression 0.583423 0.015734 0.098371 107.344074 10.360699 8.125363 6.605434
Decision Tree Regression 0.884075 0.004378 0.046782 29.871788 5.465509 3.864138 2.758249
SVM Regression 0.792823 0.007825 0.068682 53.385716 7.306553 5.673112 4.536345
Random Forest Regression 0.896781 0.003898 0.044800 26.597586 5.157285 3.700447 2.627287
AdaBoost Regression 0.761645 0.009002 0.076656 61.419547 7.837062 6.331761 5.468500
eXtreme Gradient Boosting Regression 0.942883 0.002157 0.033113 14.718070 3.836414 2.735069 1.980554



Dataset 2 (water-binder ratio added, ‘Water’, ‘Cement’, ‘Slag’ and ‘Fly ash’ dropped)

Regression type R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale
Linear Regression 0.616510 0.014484 0.098235 98.818192 9.940734 8.114129 7.272473
Decision Tree Regression 0.809843 0.007182 0.056120 48.999786 6.999985 4.635475 2.871331
SVM Regression 0.780538 0.008289 0.069751 56.551185 7.520052 5.761395 4.419597
Random Forest Regression 0.888769 0.004201 0.044238 28.662208 5.353710 3.653984 2.216723
AdaBoost Regression 0.759691 0.009076 0.076208 61.923092 7.869123 6.294741 5.834458
eXtreme Gradient Boosting Regression 0.915372 0.003196 0.038303 21.807015 4.669798 3.163834 2.303538


First, XGBoost on dataset 1 outperforms all neural networks presented in the original paper by Yeh (1998). Moreover, XGBoost reaches performance similar to the neural networks on all three datasets. We should not be surprised that dataset 2 does not perform as well: as mentioned earlier, the water-binder ratio is a wild guess at what they might have used.

Further, Random Forest regression performs in the range of the neural network if the neural network was used on handpicked training and testing sets.

Applying Deep Neural Networks

Do we really need neural networks? From a time-investment perspective this is questionable, but not from the perspective of having fun ;). We can use Keras to build our DNNs; since this is a small and simple dataset, we do not need to optimize much for speed.

Neural Networks similar to Yeh (1998)

Yeh writes in his 1998 paper that he used one hidden layer with 8 units (one per input feature?) and one output layer. Further, he used a learning rate of 1.0, a “momentum factor” of 0.5, and 3000 iterations. It is unclear whether any cross-validation was used. Unfortunately, that is all that was published on the neural network. I guess this is as close as we can get to it:

# simple model close to Yeh (1998)

def build_baseline_model(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(1))
    sgd = optimizers.SGD(lr=1.0, momentum=0.5)
    model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['mae'])
    return model

It is difficult to guess a useful batch size. Perhaps full-batch training is appropriate since the original work was carried out in 1998. Let us start with batch_size = 16:

# example of how the training process looks like for all neural networks
def run_baseline_model(X_train, X_test, y_train, y_test, dataset_id, epochs=3000,validation_split=0.2, batch_size=16):
    model = build_baseline_model(datasets[dataset_id]['X_train'].shape[1])
    
    checkpoint = callbacks.ModelCheckpoint('keras_models/model_baseline_model_'+str(dataset_id)+'_best.hdf5', monitor='val_loss', save_best_only=True, save_weights_only=True)
    start_time = time.time()
    model.fit(X_train, y_train,callbacks=[checkpoint], batch_size=batch_size, epochs=epochs, validation_split=validation_split)
    training_time = time.time() - start_time
    
    # load best model
    model.load_weights('keras_models/model_baseline_model_'+str(dataset_id)+'_best.hdf5')
    prediction = model.predict(X_test)
    r2 = r2_score(y_test, prediction)
    mse = mean_squared_error(y_test, prediction)
    mae = mean_absolute_error(y_true=y_test, y_pred=prediction)
    
    # metrics for true values
    # r2 remains unchanged, mse, mea will change and cannot be scaled
    # because there is some physical meaning behind it
    prediction_true_scale = prediction * datasets[dataset_id]['scaler_array'][:,-1]
    y_test_true_scale = y_test * datasets[dataset_id]['scaler_array'][:,-1]
    mae_true_scale = mean_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    medae_true_scale = median_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    mse_true_scale = mean_squared_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    
    
    return {'Regression type' : 'Baseline NN', 'model' : model, 'Predictions' : prediction, 'R2' : r2, 'MSE' : mse, 'MAE' : mae, 'MSE_true_scale' : mse_true_scale, 'RMSE_true_scale' : np.sqrt(mse_true_scale), 'MAE_true_scale' : mae_true_scale, 'MedAE_true_scale' : medae_true_scale ,'Training time' : training_time, 'dataset' : dataset_id}

This model yields total chaos:

R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale Dataset
-0.001312 0.037818 0.158211 258.019144 16.062974 13.068101 12.093447 0
0.594574 0.015312 0.101047 104.470664 10.221089 8.346391 7.861097 1
-0.001457 0.037824 0.158224 258.056630 16.064141 13.069158 12.090982 2


Changing to a single batch leads to surprisingly small changes. It seems that the SGD optimizer with a learning rate of 1 and momentum of 0.5 causes the trouble. Given these really bad results and the poor documentation of which features were used with which model architecture, we should try some more modern approaches with a similar setup:

# simple model similar to Yeh (1998)
# optimizer changed

def build_baseline_model_modified_optimizer(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.848254 0.005731 0.059346 39.102025 6.253161 4.901952 4.055229 0
0.820670 0.006773 0.066122 46.209889 6.797786 5.461594 4.458681 1
0.841751 0.005977 0.059619 40.777837 6.385753 4.924500 3.821284 2


# simple model similar to Yeh (1998)
# optimizer and activation function changed

def build_baseline_model_modified_optimizer_activation_function(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.816829 0.006918 0.065828 47.199752 6.870208 5.437363 4.505559 0
0.826317 0.006560 0.062368 44.754891 6.689910 5.151518 3.635845 1
0.792628 0.007832 0.070945 53.435914 7.309987 5.860026 4.813152 2


If we use Adam as the optimizer, we achieve much better results than with stochastic gradient descent (with settings close to the original paper). And we have our first surprise: ReLU performs worse than sigmoid as an activation function in such a shallow neural network. Nevertheless, we still perform worse than the classical ensemble methods above.

More DNN experimentation

Let us try some more models. All four models use a batch_size of 128 and 1500 iterations.

def build_model_1(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.849771 0.005674 0.057560 38.711114 6.221826 4.754404 3.278198 0
0.827993 0.006496 0.063274 44.323052 6.657556 5.226355 4.137564 1
0.814577 0.007003 0.064154 47.779991 6.912307 5.299110 4.138828 2


def build_model_2(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.858410 0.005348 0.052534 36.485129 6.040292 4.339288 3.415696 0
0.813830 0.007031 0.063669 47.972606 6.926226 5.259008 4.033573 1
0.843381 0.005915 0.057839 40.357724 6.352773 4.777441 3.771590 2


def build_model_3(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(30, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(50, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.791940 0.007858 0.061559 53.613003 7.322090 5.084692 3.382265 0
0.813332 0.007050 0.056767 48.100938 6.935484 4.688899 3.234740 1
0.825277 0.006599 0.061221 45.022793 6.709903 5.056845 3.742250 2


def build_model_4(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='elu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(40, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(30, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale dataset
0.830969 0.006384 0.053768 43.556202 6.599712 4.441234 2.965220 0
0.864979 0.005100 0.049786 34.792282 5.898498 4.112321 2.718826 1
0.849393 0.005688 0.057379 38.808450 6.229643 4.739484 3.987607 2


On spending more time on building DNNs

Well, we missed the baseline performance from the publication as well as the performance of XGBoost. Would it make sense to spend a few more hours training and testing various neural network architectures?
We are dealing with natural materials, a very small dataset, and, more importantly, an almost undocumented dataset. Would it change anything in application?

Discussion of results

Let us assess the final model results. The best performance is reached with XGBoost on dataset 1 (pure water-cement ratio):




Besides the averaged error metrics, we can have a look at the two error types directly:



The absolute errors are more important for understanding the consequences of our model. If we take the median absolute error of ~2 MPa, the unknowns in the dataset, and the nature of concrete mixing and the construction process into consideration, then we can say that this is sufficient.

Applying the best model on the complete data set

For the bean counters in data science and machine learning: no model was trained on this data. It is a final evaluation aimed at a deeper physical understanding of the model.

Another helpful tool with such small datasets that involve some variability is to have a look at the whole dataset. It simply helps us understand whether there are extreme outliers in the training data.
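A self-contained sketch of this “look at the whole dataset” step (with a RandomForest stand-in on synthetic data, since the trained XGBoost model and the `datasets` dict from above are not reproduced here): predict on all rows, then inspect the residuals for extreme outliers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 5))                              # dummy features
y = X @ rng.uniform(size=5) + 0.05 * rng.normal(size=200)   # dummy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# residuals over the COMPLETE dataset (train + test) -- for inspection only,
# not for reporting test performance
residuals = y - model.predict(X)
print(residuals.min(), residuals.max())
```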


What is still unclear (and some of this is mentioned in the original publication as well):

  • What exactly is defined as coarse and fine aggregate? Yes, there are boundaries: all aggregates that pass a 4.25 mm sieve are considered fine. However, to derive any meaningful mechanical properties from aggregates, we should know the d/D value, which describes the mesh diameters of the fine (d) and coarse (D) sieves.
  • What kind of aggregates were used?
  • What was the ambient temperature during hardening (btw, there are different standard temperatures for room temperature)? Was the temperature constant across all samples as well as for each single sample?
  • What type of cement was used? We have to remember that the concept of water-cement ratios is 100 years old.
  • What kind of superplasticizer was used during lab testing? Was it from the same batch all the time? Superplasticizers act differently depending on their chemistry.
  • What kind of fly ash was used? This is important because the dataset is about high performance concrete (HPC). The selection of fly ash can depend on requirements.
  • What is the porosity of the concrete?

Therefore, it is almost impossible to extract more useful information from this dataset.

Nevertheless, I enjoyed exploring it!

References

[1] Yeh, I.-C. (1998): Modeling of strength of high performance concrete using artificial neural networks. In: Cement and Concrete Research 28 (12), 1797-1808. doi:10.1016/S0008-8846(98)00165-3

[2] Dua, D.; Taniskidou, K.E. (2018). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.

Acknowledgements

I would like to thank I-Cheng Yeh for making the dataset available.