Evaluating Regression Models: Metrics and Loss Functions

Final performance comparison

Avocado data set

result = pd.concat([clf_score, predictions], ignore_index=True, sort=False)
result

Regression Model Performance Comparison

	Model	R2 Score	Adjusted R2 Score	Cross Validated R2 Score	RMSE
0	Tuned Ridge Regression	0.736622	0.733212	0.739008	0.210438
1	XGBoost	0.798641	0.796034	0.911125	0.181311
2	Random Forest with RFE	0.800169	0.797581	0.889159	0.180622
3	Random Forest	0.787120	0.784363	0.876525	0.186426
4	Ridge Regression	0.598733	0.593537	0.604317	0.255950
5	Linear Regression	0.598793	0.593598	0.604281	0.255931
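The R2, adjusted R2, and RMSE columns above can all be derived from a model's predictions. The following is a minimal sketch in plain Python, not the notebook's own code: the function names and the toy values are illustrative assumptions, and the adjusted R2 uses the standard definition 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of features.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n_samples, n_features):
    """Penalise R2 for the number of features used by the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the units of the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Toy values (illustrative only, not taken from either data set).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

r2 = r2_score(y_true, y_pred)
adj = adjusted_r2(r2, n_samples=len(y_true), n_features=1)
err = rmse(y_true, y_pred)
print(f"R2={r2:.4f}  adjusted R2={adj:.4f}  RMSE={err:.4f}")
```

With at least one feature, the adjusted R2 can never exceed the plain R2, which is consistent with every row of the table. RMSE is reported in the units of the target, so RMSE values are only comparable between models fitted to the same target.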

f, axe = plt.subplots(1, 1, figsize=(18, 6))

# Rank models by cross-validated R2 so the best model is drawn at the top.
result.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

# Horizontal bar chart: one bar per model, bar length is the cross-validated R2.
sns.barplot(x='Cross Validated R2 Score', y='Model', data=result, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)
axe.set(title='Model Performance for Avocado dataset')

plt.show()

Model performance on the avocado data set. Models are ranked by cross-validated R2 score; XGBoost (0.911) and Random Forest with RFE (0.889) rank highest.
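The ranking relies on a cross-validated R2, that is, an R2 averaged over several held-out folds rather than measured on a single test split. The sketch below illustrates the idea in plain Python under stated assumptions: the one-feature least-squares fit, the interleaved fold assignment, and the synthetic data are all hypothetical stand-ins, not the procedure or data used to produce the table.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cross_validated_r2(xs, ys, k=5):
    """Mean R2 over k folds; fold f holds out samples f, f + k, f + 2k, ..."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        x_train = [xs[i] for i in range(n) if i not in test_idx]
        y_train = [ys[i] for i in range(n) if i not in test_idx]
        x_test = [xs[i] for i in sorted(test_idx)]
        y_test = [ys[i] for i in sorted(test_idx)]
        # Fit on the training folds only, then score on the held-out fold.
        slope, intercept = fit_line(x_train, y_train)
        y_pred = [slope * x + intercept for x in x_test]
        fold_scores.append(r2_score(y_test, y_pred))
    return sum(fold_scores) / k

# Synthetic data (an assumption): a straight line plus small deterministic noise.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + 0.1 * ((7 * i) % 5 - 2) for i, x in enumerate(xs)]

cv_r2 = cross_validated_r2(xs, ys, k=5)
print(f"cross-validated R2 = {cv_r2:.4f}")
```

In practice a library routine such as scikit-learn's `cross_val_score(estimator, X, y, cv=5, scoring='r2')` is the usual choice; whether the notebook used it is not shown in this section.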

Boston data set

result = pd.concat([clf_score2, predictions2], ignore_index=True, sort=False)
result

Regression Model Performance Metrics (Final Comparison)

	Model	R2 Score	Adjusted R2 Score	Cross Validated R2 Score	RMSE
0	Tuned Ridge Regression	0.793267	0.773792	0.844628	3.965999
1	XGBoost	0.901889	0.892646	0.845593	2.703810
2	Random Forest with RFE	0.839377	0.824246	0.821140	3.459550
3	Random Forest	0.838576	0.823369	0.817514	3.468169
4	Ridge Regression	0.678696	0.648428	0.689293	4.892991
5	Linear Regression	0.679168	0.648945	0.687535	4.889394

f, axe = plt.subplots(1, 1, figsize=(18, 6))

# Rank models by cross-validated R2 so the best model is drawn at the top.
result.sort_values(by=['Cross Validated R2 Score'], ascending=False, inplace=True)

# Horizontal bar chart: one bar per model, bar length is the cross-validated R2.
sns.barplot(x='Cross Validated R2 Score', y='Model', data=result, ax=axe)
axe.set_xlabel('Cross Validated R2 Score', size=16)
axe.set_ylabel('Model')
axe.set_xlim(0, 1.0)
axe.set(title='Model Performance for Boston dataset')

plt.show()

Model performance on the Boston data set. The horizontal bar chart ranks models by cross-validated R2 score: XGBoost scores highest (0.846) and Linear Regression lowest (0.688).