Final Model Evaluation

Avocado dataset

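import pandas as pd

# Stack the one-row score tables of the five models into a single comparison table
predictions = pd.concat([rfe_score, XGBR_score, rr_score, rf_score, lm_score], ignore_index=True, sort=False)
predictions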
Comparative Performance of Regression Models
   Model                    R2 Score   Adjusted R2 Score   Cross Validated R2 Score   RMSE
0  Random Forest with RFE   0.800169   0.797581            0.889159                   0.180622
1  XGBoost                  0.798641   0.796034            0.911125                   0.181311
2  Ridge Regression         0.598733   0.593537            0.604317                   0.255950
3  Random Forest            0.787120   0.784363            0.876525                   0.186426
4  Linear Regression        0.598793   0.593598            0.604281                   0.255931

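The table reports four metrics per model. The notebook does not show how they were computed, so the following is a minimal sketch of their conventional definitions in plain Python. The function names `r2_score`, `adjusted_r2` and `rmse` are illustrative, and the toy numbers are made up, not taken from either dataset. Adjusted R2 assumes `n_samples` observations and `n_features` predictors. RMSE is in the units of each dataset's target, so it is only comparable between models on the same dataset.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R2: 1 - (1 - R2) * (n - 1) / (n - p - 1); penalises extra predictors."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, expressed in the units of the target variable."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Toy example with made-up values (hypothetical, not from the Avocado or Boston data)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
r2 = r2_score(y_true, y_pred)
print(r2, adjusted_r2(r2, n_samples=len(y_true), n_features=1), rmse(y_true, y_pred))
```

Because the penalty factor (n - 1) / (n - p - 1) is always greater than 1, adjusted R2 sits slightly below plain R2 for every model in the table.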
Boston dataset

predictions2 = pd.concat([rfe_score2, XGBR_score2, rr_score2, rf_score2, lm_score2], ignore_index=True, sort=False)
predictions2
Regression Model Performance Metrics
   Model                    R2 Score   Adjusted R2 Score   Cross Validated R2 Score   RMSE
0  Random Forest with RFE   0.839377   0.824246            0.821140                   3.459550
1  XGBoost                  0.901889   0.892646            0.845593                   2.703810
2  Ridge Regression         0.678696   0.648428            0.689293                   4.892991
3  Random Forest            0.838576   0.823369            0.817514                   3.468169
4  Linear Regression        0.679168   0.648945            0.687535                   4.889394

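The "Cross Validated R2 Score" column is an average of R2 over held-out folds, but the notebook does not show how it was produced. A common route is scikit-learn's `cross_val_score(model, X, y, cv=k, scoring='r2')`. The following is a dependency-free sketch of the same idea, assuming a single-feature least-squares line as a stand-in model. The helper names `fit_line`, `fold_r2` and `cross_validated_r2` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fold_r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R2 on one held-out fold: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs: List[float], ys: List[float], k: int = 5, seed: int = 0) -> float:
    """Mean out-of-fold R2 over k shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are random but reproducible
    folds = [idx[i::k] for i in range(k)]  # k disjoint, roughly equal folds
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit only on the training indices, then score on the unseen fold
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in fold]
        scores.append(fold_r2([ys[i] for i in fold], preds))
    return sum(scores) / k


xs = [float(x) for x in range(20)]
ys = [2.0 * x + (1.0 if int(x) % 2 else -1.0) for x in xs]  # line plus alternating noise
print(cross_validated_r2(xs, ys, k=5))
```

Scoring only on folds the model never saw is why a cross-validated score can differ from the single-split R2 in either direction, as it does in both tables.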
Visualizing Model Performance

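import matplotlib.pyplot as plt
import seaborn as sns

# One horizontal bar per model, ranked by cross-validated R2 (Avocado results)
f, axe = plt.subplots(1, 1, figsize=(18, 6))

predictions.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

sns.barplot(x='Cross Validated R2 Score', y='Model', data=predictions, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)  # same 0 to 1 scale as the Boston chart, so the two are comparable

axe.set(title='Model Performance for Avocado dataset')

plt.show()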
Horizontal bar chart of cross-validated R2 scores for the five models on the Avocado dataset: XGBoost scores highest and Linear Regression lowest, only marginally below Ridge Regression.

# Same chart for the Boston results
f, axe = plt.subplots(1, 1, figsize=(18, 6))

predictions2.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

sns.barplot(x='Cross Validated R2 Score', y='Model', data=predictions2, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)

axe.set(title='Model Performance for Boston dataset')
plt.show()
Horizontal bar chart of cross-validated R2 scores for the five models on the Boston dataset: XGBoost performs best and Linear Regression performs worst.
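The two charts rank the models with `sort_values(..., ascending=False)` before plotting. The following is a plain-Python sketch of the same ranking, using the cross-validated R2 values copied from the two result tables. It also reports how far the last-placed model trails the one above it. The names `CV_R2` and `rank_models` are hypothetical.

```python
# Cross-validated R2 scores copied from the Avocado and Boston result tables
CV_R2 = {
    "Avocado": {
        "Random Forest with RFE": 0.889159,
        "XGBoost": 0.911125,
        "Ridge Regression": 0.604317,
        "Random Forest": 0.876525,
        "Linear Regression": 0.604281,
    },
    "Boston": {
        "Random Forest with RFE": 0.821140,
        "XGBoost": 0.845593,
        "Ridge Regression": 0.689293,
        "Random Forest": 0.817514,
        "Linear Regression": 0.687535,
    },
}


def rank_models(scores: dict) -> list:
    """Return (model, score) pairs sorted from best to worst cross-validated R2."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for dataset, scores in CV_R2.items():
    ranking = rank_models(scores)
    (best, best_score), (worst, worst_score) = ranking[0], ranking[-1]
    gap = ranking[-2][1] - worst_score  # margin between the last two models
    print(f"{dataset}: best={best} ({best_score:.3f}), "
          f"worst={worst} ({worst_score:.3f}), gap to next={gap:.6f}")
```

On both datasets the last two places are a near-tie: Linear Regression trails Ridge Regression by 0.000036 (Avocado) and 0.001758 (Boston). The "worst" label therefore says little about the difference between these two linear models.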