Comparing different models

Linear Regression

from sklearn.linear_model import LinearRegression

# Creating and training model
lm = LinearRegression()
lm.fit(X_train, y_train)

# Model making a prediction on test data
y_pred = lm.predict(X_test)
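The scores reported below come from a helper, Reg_Models_Evaluation_Metrics, defined earlier in the notebook. A minimal sketch of what such a helper might compute (R2, adjusted R2, cross-validated R2, and RMSE) is given here for reference; the notebook's actual implementation may differ in details such as the number of folds.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import cross_val_score

def evaluation_metrics_sketch(model, X_train, y_train, X_test, y_test, y_pred):
    # R2 on the held-out test set
    r2 = r2_score(y_test, y_pred)
    # Adjusted R2 penalizes R2 for the number of predictors p:
    # adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
    n, p = X_test.shape
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    # Mean R2 across cross-validation folds on the training data
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring='r2').mean()
    # Root mean squared error on the test set
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    return [r2, adj_r2, cv_r2, rmse]
```

Each call returns one row of the score tables below: [R2, adjusted R2, cross-validated R2, RMSE].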

 

Linear Regression performance for Avocado dataset

ndf = [Reg_Models_Evaluation_Metrics(lm,X_train,y_train,X_test,y_test,y_pred)]

lm_score = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
lm_score.insert(0, 'Model', 'Linear Regression')
lm_score
Linear Regression Model Performance Metrics
   Model              R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Linear Regression  0.598793  0.593598           0.604281                  0.255931


plt.figure(figsize = (10,5))
sns.regplot(x=y_test,y=y_pred)
plt.title('Linear regression for Avocado dataset', fontsize = 20)

Linear Regression performance for Boston dataset

lm.fit(X_train2, y_train2)
y_pred = lm.predict(X_test2)
ndf = [Reg_Models_Evaluation_Metrics(lm,X_train2,y_train2,X_test2,y_test2,y_pred)]

lm_score2 = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
lm_score2.insert(0, 'Model', 'Linear Regression')
lm_score2
Linear Regression Model Performance Metrics
   Model              R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Linear Regression  0.679168  0.648945           0.687535                  4.889394


Random Forest

from sklearn.ensemble import RandomForestRegressor

# Creating model (trained on each dataset below)
RandomForest_reg = RandomForestRegressor(n_estimators = 10, random_state = 0)
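Beyond its scores, a fitted random forest exposes per-feature importances via feature_importances_, which can help interpret the model. A small self-contained sketch on synthetic data (the variable names here are illustrative, not from the notebook):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
# Target depends strongly on column 0, weakly on column 1, not on columns 2-3
y_demo = 5 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

rf_demo = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)
# Importances sum to 1; column 0 should dominate
print(rf_demo.feature_importances_)
```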

Random Forest performance for Avocado dataset

RandomForest_reg.fit(X_train, y_train)
# Model making a prediction on test data
y_pred = RandomForest_reg.predict(X_test)
ndf = [Reg_Models_Evaluation_Metrics(RandomForest_reg,X_train,y_train,X_test,y_test,y_pred)]

rf_score = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
rf_score.insert(0, 'Model', 'Random Forest')
rf_score
Performance metrics for Random Forest model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model          R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Random Forest  0.78712   0.784363           0.876525                  0.186426

 

Random Forest performance for Boston dataset

RandomForest_reg.fit(X_train2, y_train2)
# Model making a prediction on test data
y_pred = RandomForest_reg.predict(X_test2)
ndf = [Reg_Models_Evaluation_Metrics(RandomForest_reg,X_train2,y_train2,X_test2,y_test2,y_pred)]

rf_score2 = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
rf_score2.insert(0, 'Model', 'Random Forest')
rf_score2
Performance metrics for Random Forest model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model          R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Random Forest  0.838576  0.823369           0.817514                  3.468169

 

Ridge Regression

from sklearn.linear_model import Ridge

# Creating model (trained on each dataset below)
ridge_reg = Ridge(alpha=3, solver="cholesky")
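Ridge adds an L2 penalty, alpha times the squared norm of the coefficients, to ordinary least squares, so larger alpha values shrink the coefficients toward zero. A quick illustration on synthetic data (an assumed setup, not the notebook's):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

# Larger alpha -> stronger shrinkage -> smaller coefficient norm
for alpha in (0.01, 3, 100):
    w = Ridge(alpha=alpha, solver="cholesky").fit(X_demo, y_demo).coef_
    print(alpha, np.linalg.norm(w))
```

The "cholesky" solver computes the closed-form ridge solution directly via a Cholesky decomposition.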

 

Ridge Regression performance for Avocado dataset

ridge_reg.fit(X_train, y_train)
# Model making a prediction on test data
y_pred = ridge_reg.predict(X_test)
ndf = [Reg_Models_Evaluation_Metrics(ridge_reg,X_train,y_train,X_test,y_test,y_pred)]

rr_score = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
rr_score.insert(0, 'Model', 'Ridge Regression')
rr_score
Performance metrics for Ridge Regression model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model             R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Ridge Regression  0.598733  0.593537           0.604317                  0.25595

 

Ridge Regression performance for Boston dataset

ridge_reg.fit(X_train2, y_train2)
# Model making a prediction on test data
y_pred = ridge_reg.predict(X_test2)
ndf = [Reg_Models_Evaluation_Metrics(ridge_reg,X_train2,y_train2,X_test2,y_test2,y_pred)]

rr_score2 = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
rr_score2.insert(0, 'Model', 'Ridge Regression')
rr_score2
Performance metrics for Ridge Regression model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model             R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Ridge Regression  0.678696  0.648428           0.689293                  4.892991

 

XGBoost

from xgboost import XGBRegressor
# create an xgboost regression model
XGBR = XGBRegressor(n_estimators=1000, max_depth=7, eta=0.1, subsample=0.8, colsample_bytree=0.8)

 

XGBoost performance for Avocado dataset

XGBR.fit(X_train, y_train)
# Model making a prediction on test data
y_pred = XGBR.predict(X_test)
ndf = [Reg_Models_Evaluation_Metrics(XGBR,X_train,y_train,X_test,y_test,y_pred)]

XGBR_score = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
XGBR_score.insert(0, 'Model', 'XGBoost')
XGBR_score
Performance metrics for XGBoost model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model    R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  XGBoost  0.798641  0.796034           0.911125                  0.181311

 

XGBoost performance for Boston dataset

XGBR.fit(X_train2, y_train2)
# Model making a prediction on test data
y_pred = XGBR.predict(X_test2)
ndf = [Reg_Models_Evaluation_Metrics(XGBR,X_train2,y_train2,X_test2,y_test2,y_pred)]

XGBR_score2 = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
XGBR_score2.insert(0, 'Model', 'XGBoost')
XGBR_score2
Performance metrics for XGBoost model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model    R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  XGBoost  0.901889  0.892646           0.845593                  2.70381

 

Recursive Feature Elimination (RFE)

RFE is a wrapper-type feature selection algorithm: a separate machine learning estimator sits at the core of the method, wrapped by RFE, which uses it to rank features and recursively prunes the least important ones until the desired number remains.

Random Forest usually performs well when combined with RFE.

from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline

# create pipeline
rfe = RFE(estimator=RandomForestRegressor(), n_features_to_select=60)
model = RandomForestRegressor()
rf_pipeline = Pipeline(steps=[('s',rfe),('m',model)])
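After fitting, an RFE object also reports which features it kept, through its support_ (a boolean mask) and ranking_ (selected features have rank 1) attributes. A small standalone sketch on synthetic data (the names here are illustrative, not from the notebook):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# 10 features, of which only 5 carry signal
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 n_informative=5, random_state=0)

rfe_demo = RFE(estimator=RandomForestRegressor(random_state=0),
               n_features_to_select=5)
rfe_demo.fit(X_demo, y_demo)
# Boolean mask of kept features, and the elimination ranking (1 = kept)
print(rfe_demo.support_)
print(rfe_demo.ranking_)
```

Inspecting support_ this way can show whether RFE is discarding the features one would expect to be uninformative.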

 

Random Forest RFE performance for Avocado dataset

rf_pipeline.fit(X_train, y_train)
# Model making a prediction on test data
y_pred = rf_pipeline.predict(X_test)
ndf = [Reg_Models_Evaluation_Metrics(rf_pipeline,X_train,y_train,X_test,y_test,y_pred)]

rfe_score = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
rfe_score.insert(0, 'Model', 'Random Forest with RFE')
rfe_score
Performance metrics for Random Forest with RFE model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model                   R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Random Forest with RFE  0.800169  0.797581           0.889159                  0.180622

 

Random Forest RFE performance for Boston dataset

# create pipeline
rfe = RFE(estimator=RandomForestRegressor(), n_features_to_select=8)
model = RandomForestRegressor()
rf_pipeline = Pipeline(steps=[('s',rfe),('m',model)])

rf_pipeline.fit(X_train2, y_train2)
# Model making a prediction on test data
y_pred = rf_pipeline.predict(X_test2)
ndf = [Reg_Models_Evaluation_Metrics(rf_pipeline,X_train2,y_train2,X_test2,y_test2,y_pred)]

rfe_score2 = pd.DataFrame(data = ndf, columns=['R2 Score','Adjusted R2 Score','Cross Validated R2 Score','RMSE'])
rfe_score2.insert(0, 'Model', 'Random Forest with RFE')
rfe_score2
Performance metrics for Random Forest with RFE model including R2, adjusted R2, cross-validated R2, and RMSE.
   Model                   R2 Score  Adjusted R2 Score  Cross Validated R2 Score  RMSE
0  Random Forest with RFE  0.839377  0.824246           0.82114                   3.45955