In this post, we'll work with the Condition Based Maintenance (CBM) of Naval Propulsion Plants dataset by Altosole et al. (2009) [2] and Coraddu et al. (2014) [1].
It is available on the UCI Machine Learning Repository. The aim of applying machine learning to this dataset is to predict the decay state coefficients of the gas turbine (GT) compressor and of the GT turbine.
Let’s have a look at the dataset:
```python
import time
import pickle
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms
import xgboost
from IPython.display import display  # built into Jupyter; import it explicitly in a plain script

from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.svm import LinearSVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import (r2_score, mean_squared_error, mean_absolute_error,
                             median_absolute_error, mean_squared_log_error, make_scorer)
from skgarden import MondrianForestRegressor  # scikit-garden; may need an older scikit-learn

# Silence library warnings (e.g. deprecation warnings) in the notebook output
warnings.simplefilter('ignore')
```
```python
# Names for the 18 whitespace-separated columns of data.txt
featureList = ["Lever position (lp)",
               "Ship speed (v) [knots]",
               "Gas Turbine shaft torque (GTT) [kN m]",
               "Gas Turbine rate of revolutions (GTn) [rpm]",
               "Gas Generator rate of revolutions (GGn) [rpm]",
               "Starboard Propeller Torque (Ts) [kN]",
               "Port Propeller Torque (Tp) [kN]",
               "HP Turbine exit temperature (T48) [C]",
               "GT Compressor inlet air temperature (T1) [C]",
               "GT Compressor outlet air temperature (T2) [C]",
               "HP Turbine exit pressure (P48) [bar]",
               "GT Compressor inlet air pressure (P1) [bar]",
               "GT Compressor outlet air pressure (P2) [bar]",
               "Gas Turbine exhaust gas pressure (Pexh) [bar]",
               "Turbine Injecton Control (TIC) [%]",
               "Fuel flow (mf) [kg/s]",
               "GT Compressor decay state coefficient",
               "GT Turbine decay state coefficient"]

# Load the raw matrix and attach the column names
input_data = pd.DataFrame(np.loadtxt("./data/data.txt"), columns=featureList)

# display() renders DataFrames as tables in Jupyter
display(input_data.sample(10))
display(input_data.describe())
```
| | Lever position (lp) | Ship speed (v) [knots] | Gas Turbine shaft torque (GTT) [kN m] | Gas Turbine rate of revolutions (GTn) [rpm] | Gas Generator rate of revolutions (GGn) [rpm] | Starboard Propeller Torque (Ts) [kN] | Port Propeller Torque (Tp) [kN] | HP Turbine exit temperature (T48) [C] | GT Compressor inlet air temperature (T1) [C] | GT Compressor outlet air temperature (T2) [C] | HP Turbine exit pressure (P48) [bar] | GT Compressor inlet air pressure (P1) [bar] | GT Compressor outlet air pressure (P2) [bar] | Gas Turbine exhaust gas pressure (Pexh) [bar] | Turbine Injecton Control (TIC) [%] | Fuel flow (mf) [kg/s] | GT Compressor decay state coefficient | GT Turbine decay state coefficient |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4164 | 7.148 | 21.0 | 38984.125 | 2678.083 | 9134.088 | 332.196 | 332.196 | 822.672 | 288.0 | 691.172 | 2.963 | 0.998 | 15.395 | 1.035 | 43.655 | 0.864 | 0.967 | 0.995 |
| 11275 | 8.206 | 24.0 | 50988.694 | 3087.250 | 9292.350 | 437.905 | 437.905 | 916.089 | 288.0 | 726.146 | 3.608 | 0.998 | 18.714 | 1.042 | 60.182 | 1.191 | 0.998 | 0.979 |
| 11787 | 7.148 | 21.0 | 39013.870 | 2677.969 | 9112.900 | 332.565 | 332.565 | 812.310 | 288.0 | 683.069 | 2.992 | 0.998 | 15.690 | 1.036 | 43.382 | 0.859 | 1.000 | 0.984 |
| 574 | 8.206 | 24.0 | 50997.437 | 3087.622 | 9321.328 | 438.131 | 438.131 | 939.372 | 288.0 | 739.327 | 3.568 | 0.998 | 18.404 | 1.041 | 61.297 | 1.214 | 0.952 | 0.986 |
| 8488 | 2.088 | 6.0 | 4944.888 | 1394.855 | 6778.351 | 31.602 | 31.602 | 552.041 | 288.0 | 565.941 | 1.278 | 0.998 | 6.913 | 1.020 | 0.000 | 0.188 | 0.986 | 0.982 |
| 2581 | 8.206 | 24.0 | 50994.648 | 3087.478 | 9311.778 | 437.974 | 437.974 | 944.707 | 288.0 | 738.216 | 3.584 | 0.998 | 18.669 | 1.041 | 62.011 | 1.228 | 0.961 | 0.975 |
| 11159 | 9.300 | 27.0 | 72768.661 | 3560.398 | 9739.194 | 644.792 | 644.792 | 1048.321 | 288.0 | 769.827 | 4.536 | 0.998 | 22.657 | 1.052 | 87.115 | 1.724 | 0.997 | 0.992 |
| 8846 | 9.300 | 27.0 | 72774.460 | 3560.396 | 9752.740 | 644.870 | 644.870 | 1056.432 | 288.0 | 772.829 | 4.520 | 0.998 | 22.512 | 1.051 | 87.564 | 1.733 | 0.987 | 0.995 |
| 2268 | 1.138 | 3.0 | 4769.838 | 1313.513 | 6685.315 | 5.591 | 5.591 | 604.319 | 288.0 | 567.452 | 1.245 | 0.998 | 6.717 | 1.019 | 32.167 | 0.268 | 0.959 | 0.993 |
| 7409 | 3.144 | 9.0 | 8377.115 | 1386.746 | 7084.867 | 60.329 | 60.329 | 578.171 | 288.0 | 578.343 | 1.390 | 0.998 | 7.464 | 1.020 | 11.991 | 0.237 | 0.981 | 0.992 |
| | Lever position (lp) | Ship speed (v) [knots] | Gas Turbine shaft torque (GTT) [kN m] | Gas Turbine rate of revolutions (GTn) [rpm] | Gas Generator rate of revolutions (GGn) [rpm] | Starboard Propeller Torque (Ts) [kN] | Port Propeller Torque (Tp) [kN] | HP Turbine exit temperature (T48) [C] | GT Compressor inlet air temperature (T1) [C] | GT Compressor outlet air temperature (T2) [C] | HP Turbine exit pressure (P48) [bar] | GT Compressor inlet air pressure (P1) [bar] | GT Compressor outlet air pressure (P2) [bar] | Gas Turbine exhaust gas pressure (Pexh) [bar] | Turbine Injecton Control (TIC) [%] | Fuel flow (mf) [kg/s] | GT Compressor decay state coefficient | GT Turbine decay state coefficient |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 11934.000000 | 11934.000000 | 11934.000000 | 11934.000000 | 11934.000000 | 11934.000000 | 11934.000000 | 11934.000000 | 11934.0 | 11934.000000 | 11934.000000 | 1.193400e+04 | 11934.000000 | 11934.000000 | 11934.000000 | 11934.000000 | 11934.00000 | 11934.0000 |
| mean | 5.166667 | 15.000000 | 27247.498685 | 2136.289256 | 8200.947312 | 227.335768 | 227.335768 | 735.495446 | 288.0 | 646.215331 | 2.352963 | 9.980000e-01 | 12.297123 | 1.029474 | 33.641261 | 0.662440 | 0.97500 | 0.9875 |
| std | 2.626388 | 7.746291 | 22148.613155 | 774.083881 | 1091.315507 | 200.495889 | 200.495889 | 173.680552 | 0.0 | 72.675882 | 1.084770 | 2.533635e-13 | 5.337448 | 0.010390 | 25.841363 | 0.507132 | 0.01472 | 0.0075 |
| min | 1.138000 | 3.000000 | 253.547000 | 1307.675000 | 6589.002000 | 5.304000 | 5.304000 | 442.364000 | 288.0 | 540.442000 | 1.093000 | 9.980000e-01 | 5.828000 | 1.019000 | 0.000000 | 0.068000 | 0.95000 | 0.9750 |
| 25% | 3.144000 | 9.000000 | 8375.883750 | 1386.758000 | 7058.324000 | 60.317000 | 60.317000 | 589.872750 | 288.0 | 578.092250 | 1.389000 | 9.980000e-01 | 7.447250 | 1.020000 | 13.677500 | 0.246000 | 0.96200 | 0.9810 |
| 50% | 5.140000 | 15.000000 | 21630.659000 | 1924.326000 | 8482.081500 | 175.268000 | 175.268000 | 706.038000 | 288.0 | 637.141500 | 2.083000 | 9.980000e-01 | 11.092000 | 1.026000 | 25.276500 | 0.496000 | 0.97500 | 0.9875 |
| 75% | 7.148000 | 21.000000 | 39001.426750 | 2678.079000 | 9132.606000 | 332.364750 | 332.364750 | 834.066250 | 288.0 | 693.924500 | 2.981000 | 9.980000e-01 | 15.658000 | 1.036000 | 44.552500 | 0.882000 | 0.98800 | 0.9940 |
| max | 9.300000 | 27.000000 | 72784.872000 | 3560.741000 | 9797.103000 | 645.249000 | 645.249000 | 1115.797000 | 288.0 | 789.094000 | 4.560000 | 9.980000e-01 | 23.140000 | 1.052000 | 92.556000 | 1.832000 | 1.00000 | 1.0000 |
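The summary statistics already point to redundant inputs: T1 has a standard deviation of 0 and P1 one of about 1e-13 (floating-point noise), and the starboard and port propeller torques (Ts, Tp) have identical statistics. A small helper can check this directly. This is a sketch of our own, not part of the original analysis; the tolerance `tol` and the abbreviated column names (`GTT`, `Ts`, `Tp`, `T1`, `P1`) are assumptions, and the three demo rows are copied from the sample table above.

```python
import pandas as pd


def redundant_columns(df, tol=1e-9):
    """Return (constant columns, pairs of identical columns) for a numeric DataFrame."""
    # A column whose values span (numerically) no range cannot help a model
    constant = [c for c in df.columns if df[c].max() - df[c].min() <= tol]
    # Compare every pair of columns element-wise
    cols = list(df.columns)
    duplicates = [(a, b)
                  for i, a in enumerate(cols)
                  for b in cols[i + 1:]
                  if (df[a] - df[b]).abs().max() <= tol]
    return constant, duplicates


# Three rows copied from the sample table above; columns abbreviated to their symbols
sample = pd.DataFrame({
    'GTT': [38984.125, 50988.694, 4944.888],
    'Ts':  [332.196, 437.905, 31.602],
    'Tp':  [332.196, 437.905, 31.602],
    'T1':  [288.0, 288.0, 288.0],
    'P1':  [0.998, 0.998, 0.998],
})
constant, duplicates = redundant_columns(sample)
print(constant)    # ['T1', 'P1']
print(duplicates)  # [('Ts', 'Tp')]
```

Three rows prove nothing on their own; running `redundant_columns(input_data)` on the full frame is what would confirm whether T1, P1 and one of the two torques can be dropped before modeling.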
Let’s have a look at the target variables:
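The `describe()` output hints that both targets lie on a regular grid. A uniform grid of n values with step h has standard deviation h·sqrt((n² − 1)/12), so a step of 0.001 over [0.950, 1.000] (51 values) and over [0.975, 1.000] (26 values) should reproduce the reported means and standard deviations. The sketch below checks this, assuming a balanced full-factorial design in which every grid value occurs equally often; that assumption is ours, although 9 × 51 × 26 = 11934 matches the row count if the nine ship speeds are crossed with both decay grids.

```python
import numpy as np

# Assumption: each decay coefficient takes equally spaced values with step 0.001,
# and every value occurs equally often (a balanced simulation grid).
N_ROWS = 11934  # row count reported by describe()
grids = {
    'compressor': np.linspace(0.950, 1.000, 51),  # 0.950, 0.951, ..., 1.000
    'turbine': np.linspace(0.975, 1.000, 26),     # 0.975, 0.976, ..., 1.000
}

stats = {}
for name, grid in grids.items():
    values = np.repeat(grid, N_ROWS // grid.size)  # balanced design: equal counts per level
    # ddof=1 matches the sample standard deviation reported by pandas' describe()
    stats[name] = (grid.size, values.mean(), values.std(ddof=1))
    print(f"{name}: {grid.size} levels, mean={stats[name][1]:.4f}, std={stats[name][2]:.5f}")
```

The printed values (mean 0.9750 and std 0.01472 for the compressor, mean 0.9875 and std 0.00750 for the turbine) agree with the summary table, which supports treating the targets as discrete levels rather than continuous quantities.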
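It looks like most features are roughly normally distributed. The exceptions are already visible in the summary statistics above: T1 and P1 are constant, and lever position and ship speed take only a few discrete values.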
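A quick numeric complement to the histograms is to count distinct values and compute the skewness of each column. The helper below is a hypothetical sketch; the `max_levels=20` threshold for calling a column discrete is an arbitrary assumption, and the demo uses synthetic columns (a nine-level variable mimicking ship speed, a normal one and an exponential one) rather than the real data.

```python
import numpy as np
import pandas as pd


def distribution_summary(df, max_levels=20):
    """Per-column distinct-value count and skewness, flagging discrete columns."""
    summary = pd.DataFrame({
        'n_unique': df.nunique(),
        'skew': df.skew(),  # close to 0 for symmetric distributions such as the normal
    })
    # Columns with only a few distinct values behave more like categorical levels
    summary['discrete'] = summary['n_unique'] <= max_levels
    return summary


rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'speed_like': rng.choice(np.arange(3, 28, 3), size=1000),  # 9 levels: 3, 6, ..., 27
    'normal_like': rng.normal(size=1000),
    'skewed_like': rng.exponential(size=1000),
})
summary = distribution_summary(demo)
print(summary)
```

On the real data, `distribution_summary(input_data)` would show which features are genuinely continuous and which ones merely take a handful of operating-point values.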
Let’s split the dataset, run some algorithms on it and see what happens.
```python
# Split the data into features X and the two targets: y1 (compressor) and y2 (turbine)
target_cols = ['GT Compressor decay state coefficient', 'GT Turbine decay state coefficient']
y1 = input_data[target_cols[0]].copy(deep=True)
y2 = input_data[target_cols[1]].copy(deep=True)
X = input_data.drop(columns=target_cols)

# Scale every feature into [-1, 1] by dividing by its maximum absolute value.
# Note: the scaler is fit on all rows, i.e. before the train/test split.
scaler = MaxAbsScaler()
X.loc[:, :] = scaler.fit_transform(X)

# One train/test split per target; the same random_state yields the same row split for both
datasets = {}
for dataset_id, (y, comment) in enumerate([(y1, 'original dataset feature 1; scaled;'),
                                           (y2, 'original dataset feature 2; scaled;')]):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, shuffle=True)
    datasets[dataset_id] = {'X_train': X_train, 'X_test': X_test,
                            'y_train': y_train, 'y_test': y_test,
                            'scaler': scaler, 'comment': comment,
                            'dataset': dataset_id}
```
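The training and evaluation loop itself is not shown. The following is a hypothetical sketch of such a loop for one entry of `datasets`, not the author's code: the model subset, the parameter grids and the `evaluate` function are assumptions (Bayesian ridge, linear SVR and the Mondrian forest are left out for brevity), and `make_regression` data stands in for the real features. One deliberate difference is that `MaxAbsScaler` sits inside a scikit-learn `Pipeline`, so it is refit on the training folds only instead of on all rows.

```python
import time

import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical models and grids; make_pipeline names each step after its lowercased class
MODELS = {
    'Linear Regression': (LinearRegression(),
                          {'linearregression__fit_intercept': [True]}),
    'Decision Tree Regression': (DecisionTreeRegressor(random_state=0),
                                 {'decisiontreeregressor__max_depth': [None, 10]}),
    'KNN Regression': (KNeighborsRegressor(),
                       {'kneighborsregressor__n_neighbors': [3, 5]}),
    'Random Forest Regression': (RandomForestRegressor(n_estimators=20, random_state=0),
                                 {'randomforestregressor__max_features': [0.5, 1.0]}),
}


def evaluate(X_train, X_test, y_train, y_test, dataset_id, cv=5):
    """Grid-search each model and collect test-set metrics, one row per model."""
    rows = []
    for name, (estimator, grid) in MODELS.items():
        # The scaler is refit inside every CV fold, so held-out rows never influence it
        search = GridSearchCV(make_pipeline(MaxAbsScaler(), estimator), grid, cv=cv)
        start = time.time()
        search.fit(X_train, y_train)
        train_time = time.time() - start  # seconds, including the grid search
        pred = search.predict(X_test)
        mse = mean_squared_error(y_test, pred)
        rows.append({'Regression type': name,
                     'R2': r2_score(y_test, pred),
                     'MSE': mse,
                     'RMSE': np.sqrt(mse),
                     'MAE': mean_absolute_error(y_test, pred),
                     'MedAE': median_absolute_error(y_test, pred),
                     'Training time': train_time,
                     'dataset': dataset_id})
    return pd.DataFrame(rows)


# Synthetic stand-in for one entry of the `datasets` dict
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)
results = evaluate(X_tr, X_te, y_tr, y_te, dataset_id=0)
print(results[['Regression type', 'R2', 'RMSE', 'Training time']])
```

With the real data, the call would look like `evaluate(**{k: datasets[0][k] for k in ['X_train', 'X_test', 'y_train', 'y_test']}, dataset_id=0)`; note that those features are already scaled, so the pipeline's scaler would then be nearly a no-op.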
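The results are shown below. Dataset 0 is the GT compressor decay target and dataset 1 the GT turbine decay target. Because only the features were scaled, the MSE and MAE columns are identical to their `_true_scale` counterparts: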
| | Regression type | model | Predictions | R2 | MSE | MAE | MSE_true_scale | RMSE_true_scale | MAE_true_scale | MedAE_true_scale | Training time | dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Linear Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9927071019876545, 1.0032517959723932, 0.994... | 0.837370 | 3.485335e-05 | 0.004581 | 3.485335e-05 | 0.005904 | 0.004581 | 3.740254e-03 | 1.511132 | 0 |
| 1 | Bayesian Ridge Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9926826171743577, 1.003261027758303, 0.9947... | 0.837400 | 3.484687e-05 | 0.004582 | 3.484687e-05 | 0.005903 | 0.004582 | 3.739712e-03 | 0.904656 | 0 |
| 2 | Decision Tree Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9940000000000001, 0.995, 0.9879999999999999... | 0.988392 | 2.487712e-06 | 0.001018 | 2.487712e-06 | 0.001577 | 0.001018 | 1.000000e-03 | 0.713793 | 0 |
| 3 | KNN Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9923181201922205, 0.9939762546665988, 0.989... | 0.990497 | 2.036609e-06 | 0.000847 | 2.036609e-06 | 0.001427 | 0.000847 | 5.148486e-04 | 9.354552 | 0 |
| 4 | Linear SVM Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9958396071879695, 1.0132462116531427, 1.001... | 0.568974 | 9.237327e-05 | 0.008154 | 9.237327e-05 | 0.009611 | 0.008154 | 7.817270e-03 | 46.088759 | 0 |
| 5 | Random Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9935000000000002, 0.9942721099887767, 0.988... | 0.995724 | 9.163111e-07 | 0.000564 | 9.163111e-07 | 0.000957 | 0.000564 | 3.771605e-04 | 8.621884 | 0 |
| 6 | Mondrian Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9928282350301743, 0.993985915184021, 0.9890... | 0.990219 | 2.096138e-06 | 0.000666 | 2.096138e-06 | 0.001448 | 0.000666 | 2.531537e-04 | 17.799224 | 0 |
| 7 | Linear Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9874315009395771, 0.9725626525921139, 0.981... | 0.910156 | 5.090217e-06 | 0.001678 | 5.090217e-06 | 0.002256 | 0.001678 | 1.242007e-03 | 0.458355 | 1 |
| 8 | Bayesian Ridge Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9874242495485817, 0.9725682429960683, 0.981... | 0.910154 | 5.090319e-06 | 0.001678 | 5.090319e-06 | 0.002256 | 0.001678 | 1.241740e-03 | 1.092284 | 1 |
| 9 | Decision Tree Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.987, 0.9749999999999999, 0.9829999999999999... | 0.982127 | 1.012606e-06 | 0.000520 | 1.012606e-06 | 0.001006 | 0.000520 | 1.110223e-16 | 0.940086 | 1 |
| 10 | KNN Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9869882389423871, 0.9770486989251895, 0.981... | 0.974889 | 1.422696e-06 | 0.000571 | 1.422696e-06 | 0.001193 | 0.000571 | 2.810538e-04 | 9.072129 | 1 |
| 11 | Linear SVM Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9893787274240833, 0.975145240807732, 0.9834... | 0.779601 | 1.248696e-05 | 0.002970 | 1.248696e-05 | 0.003534 | 0.002970 | 2.876610e-03 | 31.787359 | 1 |
| 12 | Random Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9869444444444443, 0.9766666666666668, 0.982... | 0.990637 | 5.304915e-07 | 0.000380 | 5.304915e-07 | 0.000728 | 0.000380 | 2.207224e-04 | 8.732420 | 1 |
| 13 | Mondrian Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.9870455884933471, 0.9770771062374115, 0.981... | 0.975283 | 1.400383e-06 | 0.000509 | 1.400383e-06 | 0.001183 | 0.000509 | 1.587563e-04 | 19.048963 | 1 |
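MSE values cannot be compared directly across the two targets, because the turbine coefficient spans a narrower range. R², as computed by scikit-learn's `r2_score`, normalizes the squared error by the (population) variance $\sigma_y^2$ of the test targets:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = 1 - \frac{\mathrm{MSE}}{\sigma_y^2}
$$

Approximating $\sigma_y$ by the full-data standard deviations from `describe()` (0.01472 for the compressor, 0.0075 for the turbine) and plugging in the random forest MSEs reproduces the table:

$$
R^2_{\text{comp}} \approx 1 - \frac{9.163\times10^{-7}}{(0.01472)^2} = 1 - \frac{9.163\times10^{-7}}{2.167\times10^{-4}} \approx 0.996,
\qquad
R^2_{\text{turb}} \approx 1 - \frac{5.305\times10^{-7}}{(0.0075)^2} = 1 - \frac{5.305\times10^{-7}}{5.625\times10^{-5}} \approx 0.991
$$

So the turbine model has the smaller MSE but the lower R²: its errors are larger relative to the spread of its target.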
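Predicting the turbine decay seems to be more challenging: for the decision tree, KNN, random forest and Mondrian forest models, R² is lower on dataset 1 than on dataset 0, even though the absolute errors are smaller because the turbine coefficient varies over a narrower range. Despite rather good metrics, we can clearly see distinct “categories” in the predictions. Hence, we conclude that all models generalize rather poorly.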
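One way to probe these “categories” further is to break the test error down by a grouping variable, such as ship speed (nine operating points) or the true decay value. The helper below is a hypothetical sketch, not part of the original analysis, and the demo values are toy numbers. With the post's data one might call it with `datasets[1]['y_test']`, a model's test predictions, and `datasets[1]['X_test']['Ship speed (v) [knots]']`; the scaled speeds still form distinct levels, because MaxAbsScaler only divides by a constant.

```python
import numpy as np
import pandas as pd


def error_by_group(y_true, y_pred, groups):
    """Mean absolute error, mean signed error (bias) and count per group level."""
    frame = pd.DataFrame({'true': np.asarray(y_true),
                          'pred': np.asarray(y_pred),
                          'group': np.asarray(groups)})
    frame['err'] = frame['pred'] - frame['true']  # positive = over-prediction
    frame['abs_err'] = frame['err'].abs()
    return frame.groupby('group').agg(mae=('abs_err', 'mean'),
                                      bias=('err', 'mean'),
                                      n=('true', 'size'))


# Toy example with two groups of two predictions each
toy = error_by_group([1.0, 0.99, 0.98, 0.97],
                     [0.99, 0.99, 0.985, 0.96],
                     ['a', 'a', 'b', 'b'])
print(toy)
```

If the bias or MAE differs strongly between operating points, the models are likely fitting each regime separately rather than learning a smooth decay relationship.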
References
[1] A. Coraddu, L. Oneto, A. Ghio, S. Savio, D. Anguita, M. Figari (2014): Machine Learning Approaches for Improving Condition-Based Maintenance of Naval Propulsion Plants, Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment, DOI: 10.1177/1475090214540874
[2] M. Altosole, G. Benvenuto, M. Figari, U. Campora (2009): Real-time simulation of a COGAG naval ship propulsion system, Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment 223 (1), 47-62, DOI: 10.1243/14750902JEME121