Evaluating Regression Models: Metrics and Loss Functions

Final Model Evaluation

Avocado dataset

predictions = pd.concat([rfe_score, XGBR_score, rr_score, rf_score, lm_score], ignore_index=True, sort=False)
predictions

Comparative Performance of Regression Models

	Model	R2 Score	Adjusted R2 Score	Cross Validated R2 Score	RMSE
0	Random Forest with RFE	0.800169	0.797581	0.889159	0.180622
1	XGBoost	0.798641	0.796034	0.911125	0.181311
2	Ridge Regression	0.598733	0.593537	0.604317	0.255950
3	Random Forest	0.787120	0.784363	0.876525	0.186426
4	Linear Regression	0.598793	0.593598	0.604281	0.255931
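The score frames concatenated above (`lm_score`, `rf_score`, and so on) are built elsewhere in the notebook, so how each column was computed is not shown here. As a rough sketch of the standard definitions, the R2, adjusted R2, and RMSE columns could be derived from a model's predictions as below. The function name `regression_scores` and the toy data are hypothetical, and plain Python is used only to keep the formulas visible; in practice `sklearn.metrics.r2_score` and `sklearn.metrics.mean_squared_error` would normally do this work. The cross-validated R2 column is not covered by this sketch.

```python
import math


def regression_scores(y_true, y_pred, n_features):
    """Return R2, adjusted R2, and RMSE for one set of predictions.

    Hypothetical helper: the notebook's own score frames may be built differently.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n

    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)

    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalises R2 for the number of predictors used
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    # RMSE is expressed in the units of the target variable
    rmse = math.sqrt(ss_res / n)

    return {"R2 Score": r2, "Adjusted R2 Score": adj_r2, "RMSE": rmse}


# Toy data, invented purely for illustration
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 9.5, 10.5]
scores = regression_scores(y_true, y_pred, n_features=1)
print(scores)
```

Because RMSE carries the units of the target, it is comparable between models fitted to the same dataset but not between the Avocado and Boston tables.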

Boston dataset

predictions2 = pd.concat([rfe_score2, XGBR_score2, rr_score2, rf_score2, lm_score2], ignore_index=True, sort=False)
predictions2

Regression Model Performance Metrics

	Model	R2 Score	Adjusted R2 Score	Cross Validated R2 Score	RMSE
0	Random Forest with RFE	0.839377	0.824246	0.821140	3.459550
1	XGBoost	0.901889	0.892646	0.845593	2.703810
2	Ridge Regression	0.678696	0.648428	0.689293	4.892991
3	Random Forest	0.838576	0.823369	0.817514	3.468169
4	Linear Regression	0.679168	0.648945	0.687535	4.889394

Visualizing Model Performance

# One wide axis for the horizontal bar chart
f, axe = plt.subplots(1, 1, figsize=(18, 6))

# Rank models from best to worst cross-validated R2 (sorts the frame in place)
predictions.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

sns.barplot(x='Cross Validated R2 Score', y='Model', data=predictions, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)  # fix the R2 axis to [0, 1]

axe.set(title='Model Performance for Avocado dataset')

plt.show()

Horizontal bar chart of cross-validated R2 scores for the 5 models on the Avocado dataset. XGBoost scores highest (0.911); Linear Regression scores lowest (0.604), marginally below Ridge Regression.

# One wide axis for the horizontal bar chart
f, axe = plt.subplots(1, 1, figsize=(18, 6))

# Rank models from best to worst cross-validated R2 (sorts the frame in place)
predictions2.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

sns.barplot(x='Cross Validated R2 Score', y='Model', data=predictions2, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)  # fix the R2 axis to [0, 1]

axe.set(title='Model Performance for Boston dataset')

plt.show()

Horizontal bar chart of cross-validated R2 scores for the 5 models on the Boston dataset. XGBoost performs best (0.846); Linear Regression performs worst (0.688).