Final performance comparison

Avocado dataset

result = pd.concat([clf_score, predictions], ignore_index=True, sort=False)
result
Regression Model Performance Comparison

   Model                   R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Tuned Ridge Regression  0.736622  0.733212           0.739008                  0.210438
1  XGBoost                 0.798641  0.796034           0.911125                  0.181311
2  Random Forest with RFE  0.800169  0.797581           0.889159                  0.180622
3  Random Forest           0.787120  0.784363           0.876525                  0.186426
4  Ridge Regression        0.598733  0.593537           0.604317                  0.255950
5  Linear Regression       0.598793  0.593598           0.604281                  0.255931

f, axe = plt.subplots(1, 1, figsize=(18, 6))

# Rank the models so the highest cross-validated R2 is plotted first.
result.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

# Horizontal bars: one per model, bar length = cross-validated R2.
sns.barplot(x='Cross Validated R2 Score', y='Model', data=result, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)
axe.set(title='Model Performance for Avocado dataset')

plt.show()

Model performance on the avocado dataset, ranked by cross-validated R2 score. XGBoost (0.911) and Random Forest with RFE (0.889) score highest.

Boston dataset

result = pd.concat([clf_score2, predictions2], ignore_index=True, sort=False)
result
Regression Model Performance Metrics (Final Comparison)

   Model                   R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Tuned Ridge Regression  0.793267  0.773792           0.844628                  3.965999
1  XGBoost                 0.901889  0.892646           0.845593                  2.703810
2  Random Forest with RFE  0.839377  0.824246           0.821140                  3.459550
3  Random Forest           0.838576  0.823369           0.817514                  3.468169
4  Ridge Regression        0.678696  0.648428           0.689293                  4.892991
5  Linear Regression       0.679168  0.648945           0.687535                  4.889394

f, axe = plt.subplots(1, 1, figsize=(18, 6))

# Rank the models so the highest cross-validated R2 is plotted first.
result.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

# Horizontal bars: one per model, bar length = cross-validated R2.
sns.barplot(x='Cross Validated R2 Score', y='Model', data=result, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)
axe.set(title='Model Performance for Boston dataset')

plt.show()

Model performance on the Boston dataset. The horizontal bar chart ranks models by cross-validated R2 score: XGBoost scores highest (0.846) and Linear Regression lowest (0.688).
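
The rankings described in the two captions can be re-derived without the plots. The following is a minimal sketch, not part of the original notebook: it copies the cross-validated R2 scores from the two result tables into plain dictionaries and sorts them. The names AVOCADO_CV_R2, BOSTON_CV_R2 and rank_models are hypothetical, introduced here only for illustration.

```python
# Cross-validated R2 scores copied from the two result tables above.
# The dictionary and function names are illustrative, not from the notebook.
AVOCADO_CV_R2 = {
    "Tuned Ridge Regression": 0.739008,
    "XGBoost": 0.911125,
    "Random Forest with RFE": 0.889159,
    "Random Forest": 0.876525,
    "Ridge Regression": 0.604317,
    "Linear Regression": 0.604281,
}
BOSTON_CV_R2 = {
    "Tuned Ridge Regression": 0.844628,
    "XGBoost": 0.845593,
    "Random Forest with RFE": 0.821140,
    "Random Forest": 0.817514,
    "Ridge Regression": 0.689293,
    "Linear Regression": 0.687535,
}


def rank_models(scores):
    """Return model names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


avocado_ranking = rank_models(AVOCADO_CV_R2)
boston_ranking = rank_models(BOSTON_CV_R2)

# Print one ranked line per dataset, best model first.
for name, ranking in [("Avocado", avocado_ranking), ("Boston", boston_ranking)]:
    print(name, "->", ", ".join(ranking))
```

Sorting by a single column mirrors what result.sort_values does in the plotting cells, so the order printed here should match the top-to-bottom order of the bars in each chart.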