Regression Evaluation Metrics
MAE, MSE, and RMSE are loss functions: lower values indicate a better fit, so we want to minimize them. R² and its variants work in the opposite direction: higher values indicate a better fit.
Mean Absolute Error (MAE)
is the mean of the absolute value of the errors:
\(\frac 1n\sum_{i=1}^n|y_i-\hat{y}_i|\)
Mean Absolute Error represents the average of the absolute differences between the actual and predicted values in the dataset, i.e. the average magnitude of the residuals.
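As a minimal sketch (NumPy-based, with hypothetical example values not from the original text), MAE can be computed directly from arrays of actual and predicted values:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the residuals."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

# hypothetical example values
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mae(y_true, y_pred))  # 0.75
```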
Mean Squared Error (MSE)
is the mean of the squared errors:
\(\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2\)
Mean Squared Error represents the average of the squared differences between the actual and predicted values in the dataset. When the residuals average to zero, it equals the variance of the residuals.
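A corresponding sketch for MSE (same hypothetical data as above):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mse(y_true, y_pred))  # 0.875
```

Because the errors are squared, MSE penalizes large residuals more heavily than MAE does.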
Root Mean Squared Error (RMSE)
is the square root of the mean of the squared errors:
\(\sqrt{\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2}\)
RMSE measures the standard deviation of the residuals (again assuming they average to zero) and is expressed in the same units as the dependent variable.
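A sketch for RMSE, which is simply the square root of the MSE above:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(rmse(y_true, y_pred))  # ~0.935
```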
R-squared (Coefficient of determination)
- SST (or TSS)
The Total Sum of Squares (SST or TSS) is the sum of the squared differences between the observed dependent variable and its mean: \(SST = \sum_{i=1}^n(y_i-\bar{y})^2\)
- SSR (or ESS)
The Sum of Squares Regression (SSR), also called the Explained Sum of Squares (ESS), is the sum of the squared differences between the predicted values and the mean of the dependent variable: \(SSR = \sum_{i=1}^n(\hat{y}_i-\bar{y})^2\)
- SSE (or RSS)
The Sum of Squares Error (SSE), also called the Residual Sum of Squares (RSS), is the sum of the squared differences between the observed and predicted values: \(SSE = \sum_{i=1}^n(y_i-\hat{y}_i)^2\)

The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is explained by the linear regression model. A high R² means the regression captures much of the variation in the observed dependent variable, which is why a high R² indicates a well-performing model.
\(R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}\)
It is a scale-free score: irrespective of whether the values are small or large, R² will be at most one. One misconception about regression analysis is that a low R-squared value is always a bad thing. Some data sets or fields of study have an inherently greater amount of unexplained variation, and in those cases R-squared values are naturally lower. Investigators can still draw useful conclusions from the data even with a low R-squared value.
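The sketch below (hypothetical data, NumPy only) computes SST, SSR, and SSE for a simple least-squares line and shows that the two expressions for R² agree; the identity \(SST = SSR + SSE\) holds for least-squares linear regression fitted with an intercept:

```python
import numpy as np

# hypothetical example: simple linear regression fitted by least squares
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

slope, intercept = np.polyfit(x, y, 1)   # least-squares fit with intercept
y_pred = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
ssr = np.sum((y_pred - y.mean()) ** 2)   # regression (explained) sum of squares
sse = np.sum((y - y_pred) ** 2)          # error (residual) sum of squares

# SST = SSR + SSE for this kind of fit, so both R^2 expressions match
print(1 - sse / sst)   # R^2 via the error term
print(ssr / sst)       # R^2 via the explained term
```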
\(R^2 = 1\)
All the variation in the y values is accounted for by the x values.

\(R^2=0.83\)
83% of the variation in the y values is accounted for by the x values.

\(R^2=0\)
None of the variation in the y values is accounted for by the x values.

Adjusted R squared
\(R^2_{adj.} = 1 - (1-R^2)\cdot\frac{n-1}{n-p-1}\)
Adjusted R squared is a modified version of R squared that is adjusted for the number of independent variables in the model; it is always less than or equal to R². In the formula above, n is the number of observations in the data and p is the number of independent variables.
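A direct translation of the formula into code (the example R², n, and p values are hypothetical):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# hypothetical example: R^2 = 0.83 from 50 observations and 4 predictors
print(adjusted_r2(0.83, n=50, p=4))  # ~0.815
```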
Cross-validated R2
Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. It is a popular method because it is simple to understand and because it generally results in a less biased or less optimistic estimate of the model skill than other methods, such as a simple train/test split.
The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation. When a specific value is chosen, it replaces k in the name of the procedure, e.g. k=10 becomes 10-fold cross-validation.
Cross-validated R² is a median value of R² taken across the folds of the cross-validation procedure.
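As a sketch using scikit-learn (the synthetic dataset and the choice of LinearRegression are assumptions for illustration), a 10-fold cross-validated R² can be obtained like this:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# hypothetical synthetic data; any regression dataset works the same way
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 10-fold cross-validation, scoring each held-out fold with R^2
scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")

print(np.median(scores))  # cross-validated R^2 as the median across folds
```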

Regression Evaluation Metrics - Conclusion
Both RMSE and R-squared quantify how well a linear regression model fits a dataset. When assessing model fit, it is useful to calculate both the RMSE and the R² value because each metric tells us something different:
- RMSE tells us the typical distance between the values predicted by the regression model and the actual values.
- R² tells us how well the predictor variables can explain the variation in the response variable.
Adding more independent variables or predictors to a regression model tends to increase the R² value, which tempts model builders to add even more variables. Adjusted R² is used to determine how reliable the fit is and how much of the apparent improvement comes simply from adding independent variables. It is always less than or equal to R².