Condition Based Maintenance of Naval Propulsion Plants Dataset by Altosole et al. (2009) and Coraddu et al. (2014) [1, 2].
The dataset is available from the UCI Machine Learning Repository. The aim of applying machine learning to it is to predict the decay state coefficients of the gas turbine (GT) compressor and the GT turbine.

Let’s have a look at the dataset:

# Standard library
import pickle
import time
import warnings

# Numerics and plotting
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display  # implicit in Jupyter; needed when run as a script

# Models, preprocessing and metrics
import sklearn.metrics
import xgboost
from skgarden import MondrianForestRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.metrics import (make_scorer, mean_absolute_error, mean_squared_error,
                             mean_squared_log_error, median_absolute_error, r2_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import LinearSVR
from sklearn.tree import DecisionTreeRegressor

# Silence library warnings in the notebook output
warnings.simplefilter('ignore')

featureList = ["Lever position (lp)",
               "Ship speed (v) [knots]",
               "Gas Turbine shaft torque (GTT) [kN m]",
               "Gas Turbine rate of revolutions (GTn) [rpm]",
               "Gas Generator rate of revolutions (GGn) [rpm]",
               "Starboard Propeller Torque (Ts) [kN]",
               "Port Propeller Torque (Tp) [kN]",
               "HP Turbine exit temperature (T48) [C]",
               "GT Compressor inlet air temperature (T1) [C]",
               "GT Compressor outlet air temperature (T2) [C]",
               "HP Turbine exit pressure (P48) [bar]",
               "GT Compressor inlet air pressure (P1) [bar]",
               "GT Compressor outlet air pressure (P2) [bar]",
               "Gas Turbine exhaust gas pressure (Pexh) [bar]",
               "Turbine Injecton Control (TIC) [%]",
               "Fuel flow (mf) [kg/s]",
               "GT Compressor decay state coefficient",
               "GT Turbine decay state coefficient"]

input_data = np.loadtxt("./data/data.txt")
input_data = pd.DataFrame(input_data)
input_data.columns = featureList
display(input_data.sample(10))
display(input_data.describe())

          lp     v        GTT       GTn       GGn       Ts       Tp       T48     T1       T2    P48     P1      P2   Pexh     TIC     mf  Comp. decay  Turb. decay
 4164  7.148  21.0  38984.125  2678.083  9134.088  332.196  332.196   822.672  288.0  691.172  2.963  0.998  15.395  1.035  43.655  0.864        0.967        0.995
11275  8.206  24.0  50988.694  3087.250  9292.350  437.905  437.905   916.089  288.0  726.146  3.608  0.998  18.714  1.042  60.182  1.191        0.998        0.979
11787  7.148  21.0  39013.870  2677.969  9112.900  332.565  332.565   812.310  288.0  683.069  2.992  0.998  15.690  1.036  43.382  0.859        1.000        0.984
  574  8.206  24.0  50997.437  3087.622  9321.328  438.131  438.131   939.372  288.0  739.327  3.568  0.998  18.404  1.041  61.297  1.214        0.952        0.986
 8488  2.088   6.0   4944.888  1394.855  6778.351   31.602   31.602   552.041  288.0  565.941  1.278  0.998   6.913  1.020   0.000  0.188        0.986        0.982
 2581  8.206  24.0  50994.648  3087.478  9311.778  437.974  437.974   944.707  288.0  738.216  3.584  0.998  18.669  1.041  62.011  1.228        0.961        0.975
11159  9.300  27.0  72768.661  3560.398  9739.194  644.792  644.792  1048.321  288.0  769.827  4.536  0.998  22.657  1.052  87.115  1.724        0.997        0.992
 8846  9.300  27.0  72774.460  3560.396  9752.740  644.870  644.870  1056.432  288.0  772.829  4.520  0.998  22.512  1.051  87.564  1.733        0.987        0.995
 2268  1.138   3.0   4769.838  1313.513  6685.315    5.591    5.591   604.319  288.0  567.452  1.245  0.998   6.717  1.019  32.167  0.268        0.959        0.993
 7409  3.144   9.0   8377.115  1386.746  7084.867   60.329   60.329   578.171  288.0  578.343  1.390  0.998   7.464  1.020  11.991  0.237        0.981        0.992

                 lp             v           GTT           GTn           GGn            Ts            Tp           T48       T1            T2           P48            P1            P2          Pexh           TIC            mf  Comp. decay  Turb. decay
count  11934.000000  11934.000000  11934.000000  11934.000000  11934.000000  11934.000000  11934.000000  11934.000000  11934.0  11934.000000  11934.000000  1.193400e+04  11934.000000  11934.000000  11934.000000  11934.000000  11934.00000   11934.0000
mean       5.166667     15.000000  27247.498685   2136.289256   8200.947312    227.335768    227.335768    735.495446    288.0    646.215331      2.352963  9.980000e-01     12.297123      1.029474     33.641261      0.662440      0.97500       0.9875
std        2.626388      7.746291  22148.613155    774.083881   1091.315507    200.495889    200.495889    173.680552      0.0     72.675882      1.084770  2.533635e-13      5.337448      0.010390     25.841363      0.507132      0.01472       0.0075
min        1.138000      3.000000    253.547000   1307.675000   6589.002000      5.304000      5.304000    442.364000    288.0    540.442000      1.093000  9.980000e-01      5.828000      1.019000      0.000000      0.068000      0.95000       0.9750
25%        3.144000      9.000000   8375.883750   1386.758000   7058.324000     60.317000     60.317000    589.872750    288.0    578.092250      1.389000  9.980000e-01      7.447250      1.020000     13.677500      0.246000      0.96200       0.9810
50%        5.140000     15.000000  21630.659000   1924.326000   8482.081500    175.268000    175.268000    706.038000    288.0    637.141500      2.083000  9.980000e-01     11.092000      1.026000     25.276500      0.496000      0.97500       0.9875
75%        7.148000     21.000000  39001.426750   2678.079000   9132.606000    332.364750    332.364750    834.066250    288.0    693.924500      2.981000  9.980000e-01     15.658000      1.036000     44.552500      0.882000      0.98800       0.9940
max        9.300000     27.000000  72784.872000   3560.741000   9797.103000    645.249000    645.249000   1115.797000    288.0    789.094000      4.560000  9.980000e-01     23.140000      1.052000     92.556000      1.832000      1.00000       1.0000
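
The summary statistics hint at redundant columns: T1 has a standard deviation of 0, P1 of roughly 2.5e-13, and Ts and Tp report identical statistics. A minimal sketch of how such columns could be detected and dropped follows. The small `demo` frame (a few values copied from the sample rows) and the tolerance `tol` are assumptions for illustration; the analysis in this post keeps all 16 features.

```python
import pandas as pd

# Hypothetical stand-in rows so the sketch runs on its own;
# in the post this would be `input_data`.
demo = pd.DataFrame({
    "Ts": [332.196, 437.905, 31.602, 644.792],
    "Tp": [332.196, 437.905, 31.602, 644.792],
    "T48": [822.672, 916.089, 552.041, 1048.321],
    "T1": [288.0, 288.0, 288.0, 288.0],
    "P1": [0.998, 0.998, 0.998, 0.998],
})

tol = 1e-9  # assumed tolerance below which a column counts as constant

# A column is treated as constant when its value range is numerically zero.
value_range = demo.max() - demo.min()
constant_cols = value_range[value_range <= tol].index.tolist()

# A column is a duplicate when it repeats an earlier column value for value.
is_duplicate = demo.T.duplicated()
duplicate_cols = is_duplicate[is_duplicate].index.tolist()

reduced = demo.drop(columns=constant_cols + duplicate_cols)
print("constant: ", constant_cols)
print("duplicate:", duplicate_cols)
print("kept:     ", list(reduced.columns))
```

Constant columns carry no information for any of the regressors, and an exact duplicate mainly affects the linear models, where it makes the coefficients of the two copies ambiguous.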

Let’s have a look at the target variables:
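
One way to inspect the two targets numerically is to count their distinct values. The sketch below uses a stand-in grid rather than the real file. The assumption, consistent with the summary statistics (minima of 0.95 and 0.975, maxima of 1.0, means equal to the medians), is that both decay coefficients are sampled in steps of 0.001. With nine speed settings, 51 x 26 x 9 combinations would give the 11,934 rows reported by `describe()`.

```python
import numpy as np
import pandas as pd

targets = ["GT Compressor decay state coefficient",
           "GT Turbine decay state coefficient"]

# Assumed stand-in for the two target columns: every combination of a
# compressor decay in [0.95, 1.0] and a turbine decay in [0.975, 1.0],
# both in steps of 0.001. In the post this would be input_data[targets].
compressor_levels = np.linspace(0.95, 1.0, 51).round(3)
turbine_levels = np.linspace(0.975, 1.0, 26).round(3)
demo = pd.MultiIndex.from_product(
    [compressor_levels, turbine_levels], names=targets
).to_frame(index=False)

# Number of distinct values and value range per target.
summary = pd.DataFrame({
    "n_unique": demo[targets].nunique(),
    "min": demo[targets].min(),
    "max": demo[targets].max(),
})
print(summary)

# How often each compressor level occurs; a flat count indicates a uniform grid.
counts = demo[targets[0]].value_counts().sort_index()
print(counts.head())
```

If the real columns show the same pattern, the targets are discrete levels rather than continuous measurements, which is worth keeping in mind when reading regression metrics.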

It looks like most features are distributed normally:
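
Histograms, backed by a simple skewness figure, are a quick way to check such a claim. The following sketch assumes matplotlib is installed and uses a synthetic `demo` frame with hypothetical column names in place of `input_data`; it illustrates the mechanics only and says nothing about the real feature distributions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic columns with deliberately different shapes (illustration only).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "normal_like": rng.normal(loc=0.0, scale=1.0, size=1000),
    "uniform_like": rng.uniform(low=0.0, high=1.0, size=1000),
    "skewed": rng.exponential(scale=1.0, size=1000),
    "discrete_levels": rng.choice([3.0, 9.0, 15.0, 21.0, 27.0], size=1000),
})

# One histogram per column, arranged on a grid of subplots.
axes = demo.hist(bins=30, figsize=(8, 6))
plt.tight_layout()

# Skewness near 0 is necessary, but not sufficient, for a normal shape.
skewness = demo.skew()
print(skewness.round(2))
```

Replacing `demo` with `input_data` gives one panel per feature; in a notebook the figure is rendered inline, otherwise `plt.show()` displays it.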

Let’s split the dataset, run some algorithms on it and see what happens.

# Separate the two targets from the features:
# y1 is the compressor decay coefficient, y2 the turbine decay coefficient.
target_columns = ['GT Compressor decay state coefficient',
                  'GT Turbine decay state coefficient']
y1 = input_data[target_columns[0]].copy(deep=True)
y2 = input_data[target_columns[1]].copy(deep=True)
X = input_data.drop(columns=target_columns)

# Scale each feature by its maximum absolute value.
# Note: the scaler is fitted on the full dataset, i.e. before the train/test split.
scaler = MaxAbsScaler()
X.loc[:, :] = scaler.fit_transform(X)

# Build one train/test split per target: dataset 0 uses y1, dataset 1 uses y2.
datasets = {}
for dataset_id, y in enumerate([y1, y2]):
    X_train, X_test, y_train, y_test = train_test_split(X,
                                                        y,
                                                        test_size=0.25,
                                                        random_state=42,
                                                        shuffle=True)
    comment = 'original dataset feature {}; scaled;'.format(dataset_id + 1)
    datasets[dataset_id] = {'X_train': X_train,
                            'X_test': X_test,
                            'y_train': y_train,
                            'y_test': y_test,
                            'scaler': scaler,
                            'comment': comment,
                            'dataset': dataset_id}
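
The models in the results table are GridSearchCV objects with cv=5, reported with R2, MSE, MAE, median absolute error and training time. A loop along the following lines could produce such a table. This is a sketch under stated assumptions, not the code used for the post: the synthetic data, the parameter grids and the reduced model list (scikit-learn estimators only) are all hypothetical.

```python
import time

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data so the sketch runs on its own; in the post the
# splits would come from the `datasets` dictionary instead.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0.0, 1.0, size=(400, 4))
y_demo = 0.95 + 0.05 * X_demo[:, 0] * X_demo[:, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, shuffle=True)

# Hypothetical estimators and parameter grids, chosen only for illustration.
candidates = {
    "Linear Regression": (LinearRegression(),
                          {"fit_intercept": [True, False]}),
    "Decision Tree Regression": (DecisionTreeRegressor(random_state=42),
                                 {"max_depth": [4, 8, None]}),
    "KNN Regression": (KNeighborsRegressor(),
                       {"n_neighbors": [3, 5, 9]}),
    "Random Forest Regression": (RandomForestRegressor(n_estimators=50,
                                                       random_state=42),
                                 {"max_depth": [8, None]}),
}

rows = []
for name, (estimator, param_grid) in candidates.items():
    search = GridSearchCV(estimator, param_grid, cv=5, scoring="r2")
    start = time.perf_counter()
    search.fit(X_train, y_train)            # 5-fold search, then refit on all training data
    training_time = time.perf_counter() - start
    predictions = search.predict(X_test)    # uses the best estimator found
    mse = mean_squared_error(y_test, predictions)
    rows.append({
        "Regression type": name,
        "R2": r2_score(y_test, predictions),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": mean_absolute_error(y_test, predictions),
        "MedAE": median_absolute_error(y_test, predictions),
        "Training time": training_time,
    })

results = pd.DataFrame(rows)
print(results)
```

All metrics are computed on the held-out 25 % test split, while the cross-validation inside GridSearchCV only sees the training split.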

The results are:

    Regression type             model                                              Predictions                                              R2           MSE       MAE  MSE_true_scale  RMSE_true_scale  MAE_true_scale  MedAE_true_scale  Training time  dataset
 0  Linear Regression           GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9927071019876545, 1.0032517959723932, 0.994...  0.837370  3.485335e-05  0.004581    3.485335e-05         0.005904        0.004581      3.740254e-03       1.511132        0
 1  Bayesian Ridge Regression   GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9926826171743577, 1.003261027758303, 0.9947...  0.837400  3.484687e-05  0.004582    3.484687e-05         0.005903        0.004582      3.739712e-03       0.904656        0
 2  Decision Tree Regression    GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9940000000000001, 0.995, 0.9879999999999999...  0.988392  2.487712e-06  0.001018    2.487712e-06         0.001577        0.001018      1.000000e-03       0.713793        0
 3  KNN Regression              GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9923181201922205, 0.9939762546665988, 0.989...  0.990497  2.036609e-06  0.000847    2.036609e-06         0.001427        0.000847      5.148486e-04       9.354552        0
 4  Linear SVM Regression       GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9958396071879695, 1.0132462116531427, 1.001...  0.568974  9.237327e-05  0.008154    9.237327e-05         0.009611        0.008154      7.817270e-03      46.088759        0
 5  Random Forest Regression    GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9935000000000002, 0.9942721099887767, 0.988...  0.995724  9.163111e-07  0.000564    9.163111e-07         0.000957        0.000564      3.771605e-04       8.621884        0
 6  Mondrian Forest Regression  GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9928282350301743, 0.993985915184021, 0.9890...  0.990219  2.096138e-06  0.000666    2.096138e-06         0.001448        0.000666      2.531537e-04      17.799224        0
 7  Linear Regression           GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9874315009395771, 0.9725626525921139, 0.981...  0.910156  5.090217e-06  0.001678    5.090217e-06         0.002256        0.001678      1.242007e-03       0.458355        1
 8  Bayesian Ridge Regression   GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9874242495485817, 0.9725682429960683, 0.981...  0.910154  5.090319e-06  0.001678    5.090319e-06         0.002256        0.001678      1.241740e-03       1.092284        1
 9  Decision Tree Regression    GridSearchCV(cv=5, error_score='raise-deprecat...  [0.987, 0.9749999999999999, 0.9829999999999999...  0.982127  1.012606e-06  0.000520    1.012606e-06         0.001006        0.000520      1.110223e-16       0.940086        1
10  KNN Regression              GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9869882389423871, 0.9770486989251895, 0.981...  0.974889  1.422696e-06  0.000571    1.422696e-06         0.001193        0.000571      2.810538e-04       9.072129        1
11  Linear SVM Regression       GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9893787274240833, 0.975145240807732, 0.9834...  0.779601  1.248696e-05  0.002970    1.248696e-05         0.003534        0.002970      2.876610e-03      31.787359        1
12  Random Forest Regression    GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9869444444444443, 0.9766666666666668, 0.982...  0.990637  5.304915e-07  0.000380    5.304915e-07         0.000728        0.000380      2.207224e-04       8.732420        1
13  Mondrian Forest Regression  GridSearchCV(cv=5, error_score='raise-deprecat...  [0.9870455884933471, 0.9770771062374115, 0.981...  0.975283  1.400383e-06  0.000509    1.400383e-06         0.001183        0.000509      1.587563e-04      19.048963        1

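Predicting the turbine decay (dataset 1) seems to be more challenging for the best-performing models: the top R2, reached by the Random Forest, drops from 0.996 to 0.991, and KNN and Mondrian Forest fall from about 0.990 to 0.975. The linear models, by contrast, score higher on the turbine target (R2 of 0.91 versus 0.84). Despite rather good metrics, we can see different “categories” quite clearly. This suggests that all models generalize rather poorly.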

References

[1] A. Coraddu, L. Oneto, A. Ghio, S. Savio, D. Anguita, M. Figari (2014): Machine Learning Approaches for Improving Condition-Based Maintenance of Naval Propulsion Plants, Journal of Engineering for the Maritime Environment. DOI:10.1177/1475090214540874

[2] M. Altosole, G. Benvenuto, M. Figari, U. Campora (2009): Real-time simulation of a COGAG naval ship propulsion system, Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment 223 (1), 47-62. DOI:10.1243/14750902JEME121