Bonus: Hyperparameter Tuning Using GridSearchCV
Hyperparameter tuning is the process of choosing values for the parameters we set before training a machine learning model. These parameters are defined by us; the learning algorithm never learns them from the data. They can be tuned in a separate step.
GridSearchCV is a technique for finding the optimal hyperparameter values from a given grid of candidates. It is essentially a cross-validation technique: you pass in the model along with the parameter grid, it evaluates every combination, and predictions are then made with the best parameter values.
The "best" parameters that GridSearchCV identifies are only the best among the values you included in your parameter grid.
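As a minimal self-contained sketch of this workflow (the synthetic data and the alpha grid here are illustrative assumptions, not the dataset used below):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, purely for illustration
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = X @ np.array([1.5, -2.0, 1.0]) + 0.1 * rng.randn(40)

# Every alpha in the grid is evaluated with cross-validation
grid = {'alpha': [0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

The winner is always a member of the grid you supplied; values between or outside the candidates are never tried.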
Tuned Ridge Regression
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
# Polynomial features are those features created by raising existing features to an exponent.
# For example, if a dataset had one input feature X,
# then a polynomial feature would be the addition of a new feature (column) where values were calculated by squaring the values in X, e.g. X^2.
steps = [
('poly', PolynomialFeatures(degree=2)),
('model', Ridge(alpha=3.8, fit_intercept=True))
]
ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train, y_train)
# Model making a prediction on test data
y_pred = ridge_pipe.predict(X_test)
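The PolynomialFeatures step used in the pipeline above can be illustrated on a tiny array (the input values here are made up for the demo):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])
# degree=2 adds a bias column, the original feature x, and x^2
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))
# columns: [1, x, x^2] -> [[1. 2. 4.]
#                          [1. 3. 9.]]
```

Inside the pipeline this expansion is applied before Ridge sees the data, so the linear model can fit curved relationships.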
from sklearn.model_selection import GridSearchCV
alpha_params = {'model__alpha': list(range(1, 15))}
clf = GridSearchCV(ridge_pipe, alpha_params, cv = 10)
Tuned Ridge Regression performance for Avocado dataset
# Fit and tune model
clf.fit(X_train, y_train)
# Model making a prediction on test data with the tuned estimator
y_pred = clf.predict(X_test)
# The hyperparameter combination that gives the best performance of our estimator
print(clf.best_params_)
{'model__alpha': 1}
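With the default refit=True, GridSearchCV retrains the winning configuration on the full training data and exposes it as best_estimator_, which is what predictions above should come from. A self-contained sketch mirroring the pipeline used here (the synthetic data is an illustrative assumption):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(60, 2)
y = X[:, 0] ** 2 - X[:, 1] + 0.05 * rng.randn(60)

pipe = Pipeline([('poly', PolynomialFeatures(degree=2)),
                 ('model', Ridge())])
# 'model__alpha' targets the alpha of the step named 'model' in the pipeline
search = GridSearchCV(pipe, {'model__alpha': [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

# best_estimator_ is a refitted Pipeline carrying the winning alpha
best_pipe = search.best_estimator_
print(search.best_params_)
print(best_pipe.named_steps['model'].alpha)
```

The double-underscore syntax (`step__parameter`) is how pipeline step parameters are addressed in the grid.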
ndf = [Reg_Models_Evaluation_Metrics(clf, X_train, y_train, X_test, y_test, y_pred)]
clf_score = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
clf_score.insert(0, 'Model', 'Tuned Ridge Regression')
clf_score
| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Tuned Ridge Regression | 0.736622 | 0.733212 | 0.739008 | 0.210438 |
Tuned Ridge Regression performance for Boston dataset
steps = [
('poly', PolynomialFeatures(degree=2)),
('model', Ridge(alpha=3.8, fit_intercept=True))
]
ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train2, y_train2)
# Model making a prediction on test data
y_pred = ridge_pipe.predict(X_test2)
alpha_params = {'model__alpha': list(range(1, 15))}
clf = GridSearchCV(ridge_pipe, alpha_params, cv = 10)
# Fit and tune model
clf.fit(X_train2, y_train2)
# Model making a prediction on test data with the tuned estimator
y_pred = clf.predict(X_test2)
# The hyperparameter combination that gives the best performance of our estimator
print(clf.best_params_)
{'model__alpha': 12}
ndf = [Reg_Models_Evaluation_Metrics(clf, X_train2, y_train2, X_test2, y_test2, y_pred)]
clf_score2 = pd.DataFrame(data=ndf, columns=['R2 Score', 'Adjusted R2 Score', 'Cross Validated R2 Score', 'RMSE'])
clf_score2.insert(0, 'Model', 'Tuned Ridge Regression')
clf_score2
| | Model | R2 Score | Adjusted R2 Score | Cross Validated R2 Score | RMSE |
|---|---|---|---|---|---|
| 0 | Tuned Ridge Regression | 0.793267 | 0.773792 | 0.844628 | 3.965999 |