Bonus: Hyperparameter Tuning Using GridSearchCV
Hyperparameter tuning is the process of choosing values for the parameters we set before training a machine learning model. These parameters are defined by us; the learning algorithm never learns them from the data. They can be tuned in a separate step.
GridSearchCV is a technique for finding the optimal hyperparameter values from a given grid of candidates. It is essentially a cross-validation technique: you pass in the model along with the parameter grid, it evaluates every combination, and predictions are then made with the best parameter values.
The "best" parameters that GridSearchCV identifies are only the best among the values you included in your parameter grid.
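As a minimal self-contained sketch of this workflow (the synthetic data and the alpha grid here are illustrative assumptions, not the dataset used below):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, purely for illustration
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = X @ np.array([1.5, -2.0, 1.0]) + 0.1 * rng.randn(40)

# Every alpha in the grid is evaluated with cross-validation
grid = {'alpha': [0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

The winner is always a member of the grid you supplied; values between or outside the candidates are never tried.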
Tuned Ridge Regression
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
# Polynomial features are those features created by raising existing features to an exponent.
# For example, if a dataset had one input feature X,
# then a polynomial feature would be the addition of a new feature (column) where values were calculated by squaring the values in X, e.g. X^2.
steps = [
('poly', PolynomialFeatures(degree=2)),
('model', Ridge(alpha=3.8, fit_intercept=True))
]
ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train, y_train)
# Model making a prediction on test data
y_pred = ridge_pipe.predict(X_test)
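The PolynomialFeatures step used in the pipeline above can be illustrated on a tiny array (the input values here are made up for the demo):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])
# degree=2 adds a bias column, the original feature x, and x^2
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))
# columns: [1, x, x^2] -> [[1. 2. 4.]
#                          [1. 3. 9.]]
```

Inside the pipeline this expansion is applied before Ridge sees the data, so the linear model can fit curved relationships.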
from sklearn.model_selection import GridSearchCV
alpha_params = {'model__alpha': list(range(1, 15))}
clf = GridSearchCV(ridge_pipe, alpha_params, cv = 10)
Tuned Ridge Regression performance for Avocado dataset
# Fit and tune model
clf.fit(X_train, y_train)
# Model making a prediction on test data with the tuned estimator
y_pred = clf.predict(X_test)
# The hyperparameter combination that gives the best performance of our estimator
print(clf.best_params_)
{'model__alpha': 1}
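With the default refit=True, GridSearchCV retrains the winning configuration on the full training data and exposes it as best_estimator_, which is what predictions above should come from. A self-contained sketch mirroring the pipeline used here (the synthetic data is an illustrative assumption):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(60, 2)
y = X[:, 0] ** 2 - X[:, 1] + 0.05 * rng.randn(60)

pipe = Pipeline([('poly', PolynomialFeatures(degree=2)),
                 ('model', Ridge())])
# 'model__alpha' targets the alpha of the step named 'model' in the pipeline
search = GridSearchCV(pipe, {'model__alpha': [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

# best_estimator_ is a refitted Pipeline carrying the winning alpha
best_pipe = search.best_estimator_
print(search.best_params_)
print(best_pipe.named_steps['model'].alpha)
```

The double-underscore syntax (`step__parameter`) is how pipeline step parameters are addressed in the grid.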
ndf = [Reg_Models_Evaluation_Metrics(clf, X_train, y_train, X_test, y_test, y_pred)]
clf_score = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
clf_score.insert(0, 'Model', 'Tuned Ridge Regression')
clf_score
| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Tuned Ridge Regression | 0.736622 | 0.733212 | 0.739008 | 0.210438 |
Tuned Ridge Regression performance for Boston dataset
steps = [
('poly', PolynomialFeatures(degree=2)),
('model', Ridge(alpha=3.8, fit_intercept=True))
]
ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train2, y_train2)
# Model making a prediction on test data
y_pred = ridge_pipe.predict(X_test2)
alpha_params = {'model__alpha': list(range(1, 15))}
clf = GridSearchCV(ridge_pipe, alpha_params, cv = 10)
# Fit and tune model
clf.fit(X_train2, y_train2)
# Model making a prediction on test data with the tuned estimator
y_pred = clf.predict(X_test2)
# The hyperparameter combination that gives the best performance of our estimator
print(clf.best_params_)
{'model__alpha': 12}
ndf = [Reg_Models_Evaluation_Metrics(clf, X_train2, y_train2, X_test2, y_test2, y_pred)]
clf_score2 = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
clf_score2.insert(0, 'Model', 'Tuned Ridge Regression')
clf_score2
| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Tuned Ridge Regression | 0.793267 | 0.773792 | 0.844628 | 3.965999 |