Comparing different models
Linear Regression
from sklearn.linear_model import LinearRegression

# Creating and training model
lm = LinearRegression()
lm.fit(X_train, y_train)

# Model making a prediction on test data
y_pred = lm.predict(X_test)
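The helper `Reg_Models_Evaluation_Metrics` used throughout this page is defined elsewhere in the notebook. Judging from the score-table columns, it returns the test R2, adjusted R2, cross-validated R2, and RMSE. A plausible reconstruction is sketched below on synthetic data; the function body and the `make_regression` setup are assumptions, not the author's exact code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score

def reg_models_evaluation_metrics(model, X_train, y_train, X_test, y_test, y_pred):
    """Sketch of the notebook's helper: [R2, adjusted R2, CV R2, RMSE]."""
    r2 = r2_score(y_test, y_pred)
    n, p = X_test.shape
    # Adjusted R2 penalises R2 for the number of predictors p
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    # 5-fold cross-validated R2 on the training split
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring='r2').mean()
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    return [r2, adj_r2, cv_r2, rmse]

# Synthetic stand-in for the Avocado/Boston data
X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
lm = LinearRegression().fit(X_train, y_train)
metrics = reg_models_evaluation_metrics(lm, X_train, y_train,
                                        X_test, y_test, lm.predict(X_test))
```

Wrapping the returned list in a one-row `pd.DataFrame`, as the cells below do, then gives the score tables shown on this page.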
Linear Regression performance for Avocado dataset
ndf = [Reg_Models_Evaluation_Metrics(lm, X_train, y_train, X_test, y_test, y_pred)]
lm_score = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
lm_score.insert(0, 'Model', 'Linear Regression')
lm_score

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Linear Regression | 0.598793 | 0.593598 | 0.604281 | 0.255931 |
plt.figure(figsize = (10,5))
sns.regplot(x=y_test,y=y_pred)
plt.title('Linear regression for Avocado dataset', fontsize = 20)

Linear Regression performance for Boston dataset
lm.fit(X_train2, y_train2)

# Model making a prediction on test data
y_pred = lm.predict(X_test2)

ndf = [Reg_Models_Evaluation_Metrics(lm, X_train2, y_train2, X_test2, y_test2, y_pred)]
lm_score2 = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
lm_score2.insert(0, 'Model', 'Linear Regression')
lm_score2

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Linear Regression | 0.679168 | 0.648945 | 0.687535 | 4.889394 |
Random Forest
from sklearn.ensemble import RandomForestRegressor

# Creating and training model
RandomForest_reg = RandomForestRegressor(n_estimators=10, random_state=0)
Random Forest performance for Avocado dataset
RandomForest_reg.fit(X_train, y_train)

# Model making a prediction on test data
y_pred = RandomForest_reg.predict(X_test)

ndf = [Reg_Models_Evaluation_Metrics(RandomForest_reg, X_train, y_train, X_test, y_test, y_pred)]
rf_score = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
rf_score.insert(0, 'Model', 'Random Forest')
rf_score

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Random Forest | 0.78712 | 0.784363 | 0.876525 | 0.186426 |
Random Forest performance for Boston dataset
RandomForest_reg.fit(X_train2, y_train2)

# Model making a prediction on test data
y_pred = RandomForest_reg.predict(X_test2)

ndf = [Reg_Models_Evaluation_Metrics(RandomForest_reg, X_train2, y_train2, X_test2, y_test2, y_pred)]
rf_score2 = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
rf_score2.insert(0, 'Model', 'Random Forest')
rf_score2

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Random Forest | 0.838576 | 0.823369 | 0.817514 | 3.468169 |
Ridge Regression
from sklearn.linear_model import Ridge

# Creating and training model
ridge_reg = Ridge(alpha=3, solver="cholesky")
Ridge Regression performance for Avocado dataset
ridge_reg.fit(X_train, y_train)

# Model making a prediction on test data
y_pred = ridge_reg.predict(X_test)

ndf = [Reg_Models_Evaluation_Metrics(ridge_reg, X_train, y_train, X_test, y_test, y_pred)]
rr_score = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
rr_score.insert(0, 'Model', 'Ridge Regression')
rr_score

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Ridge Regression | 0.598733 | 0.593537 | 0.604317 | 0.25595 |
Ridge Regression performance for Boston dataset
ridge_reg.fit(X_train2, y_train2)

# Model making a prediction on test data
y_pred = ridge_reg.predict(X_test2)

ndf = [Reg_Models_Evaluation_Metrics(ridge_reg, X_train2, y_train2, X_test2, y_test2, y_pred)]
rr_score2 = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
rr_score2.insert(0, 'Model', 'Ridge Regression')
rr_score2

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Ridge Regression | 0.678696 | 0.648428 | 0.689293 | 4.892991 |
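The cells above fix the regularisation strength at `alpha=3`; in practice one would usually tune it by cross-validation. A minimal sketch using scikit-learn's `RidgeCV` on synthetic data (the alpha grid and data setup are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Try regularisation strengths from 1e-3 to 1e3 and keep the best by CV
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13))
ridge_cv.fit(X_train, y_train)

best_alpha = ridge_cv.alpha_  # the alpha that scored best in cross-validation
```

The fitted `ridge_cv` can then be evaluated on the test split exactly like `ridge_reg` above.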
XGBoost
from xgboost import XGBRegressor

# Create an XGBoost regression model
XGBR = XGBRegressor(n_estimators=1000, max_depth=7, eta=0.1, subsample=0.8, colsample_bytree=0.8)
XGBoost performance for Avocado dataset
XGBR.fit(X_train, y_train)

# Model making a prediction on test data
y_pred = XGBR.predict(X_test)

ndf = [Reg_Models_Evaluation_Metrics(XGBR, X_train, y_train, X_test, y_test, y_pred)]
XGBR_score = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
XGBR_score.insert(0, 'Model', 'XGBoost')
XGBR_score

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | XGBoost | 0.798641 | 0.796034 | 0.911125 | 0.181311 |
XGBoost performance for Boston dataset
XGBR.fit(X_train2, y_train2)

# Model making a prediction on test data
y_pred = XGBR.predict(X_test2)

ndf = [Reg_Models_Evaluation_Metrics(XGBR, X_train2, y_train2, X_test2, y_test2, y_pred)]
XGBR_score2 = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
XGBR_score2.insert(0, 'Model', 'XGBoost')
XGBR_score2

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | XGBoost | 0.901889 | 0.892646 | 0.845593 | 2.70381 |
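With one score DataFrame per model, the comparison is easiest to read as a single table. A sketch of how the per-model frames built above could be combined with `pd.concat` (the inlined R2 values here are just the two Avocado scores from the tables above, used as stand-ins for the full frames):

```python
import pandas as pd

# Stand-ins for lm_score, rf_score, ... produced by the cells above
lm_score = pd.DataFrame([{'Model': 'Linear Regression', 'R2 Score': 0.598793}])
rf_score = pd.DataFrame([{'Model': 'Random Forest', 'R2 Score': 0.787120}])

# Stack the one-row frames into a single comparison table
results = pd.concat([lm_score, rf_score], ignore_index=True)

# Pick out the best-scoring model by test R2
best = results.loc[results['R2 Score'].idxmax(), 'Model']  # 'Random Forest'
```

Sorting `results` by any of the four metric columns gives a quick ranking across all models and both datasets.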
Recursive Feature Elimination (RFE)
RFE is a wrapper-type feature selection algorithm. This means that a separate machine learning algorithm sits at the core of the method: RFE wraps it and uses it to rank features, repeatedly discarding the least important ones until the desired number remains.
Random Forest usually performs well when combined with RFE.
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
# create pipeline
rfe = RFE(estimator=RandomForestRegressor(), n_features_to_select=60)
model = RandomForestRegressor()
rf_pipeline = Pipeline(steps=[('s',rfe),('m',model)])
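Once the pipeline is fitted, the RFE step records which features survived the elimination. A minimal self-contained sketch on synthetic data (feature counts and estimator sizes are illustrative, not the notebook's settings):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

# Synthetic data: 10 features, of which only 3 are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

rfe = RFE(estimator=RandomForestRegressor(n_estimators=20, random_state=0),
          n_features_to_select=3)
pipe = Pipeline(steps=[('s', rfe),
                       ('m', RandomForestRegressor(n_estimators=20, random_state=0))])
pipe.fit(X, y)

# Boolean mask over the input columns: True = kept by RFE
kept = pipe.named_steps['s'].support_
```

The companion attribute `ranking_` gives each feature's elimination rank (1 for every kept feature), which is useful when deciding whether `n_features_to_select` was set too aggressively.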
Random Forest RFE performance for Avocado dataset
rf_pipeline.fit(X_train, y_train)

# Model making a prediction on test data
y_pred = rf_pipeline.predict(X_test)

ndf = [Reg_Models_Evaluation_Metrics(rf_pipeline, X_train, y_train, X_test, y_test, y_pred)]
rfe_score = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
rfe_score.insert(0, 'Model', 'Random Forest with RFE')
rfe_score

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Random Forest with RFE | 0.800169 | 0.797581 | 0.889159 | 0.180622 |
Random Forest RFE performance for Boston dataset
# create pipeline
rfe = RFE(estimator=RandomForestRegressor(), n_features_to_select=8)
model = RandomForestRegressor()
rf_pipeline = Pipeline(steps=[('s',rfe),('m',model)])
rf_pipeline.fit(X_train2, y_train2)
# Model making a prediction on test data
y_pred = rf_pipeline.predict(X_test2)
ndf = [Reg_Models_Evaluation_Metrics(rf_pipeline, X_train2, y_train2, X_test2, y_test2, y_pred)]
rfe_score2 = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
rfe_score2.insert(0, 'Model', 'Random Forest with RFE')
rfe_score2

| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Random Forest with RFE | 0.839377 | 0.824246 | 0.82114 | 3.45955 |