Coefficient of Determination (R²)
When evaluating the performance of a regression model, the coefficient of determination (R², R-squared)
is one of the key metrics, along with Mean Squared Error (MSE) and Mean Absolute Error (MAE).
The coefficient of determination indicates how well a model explains the actual data: it measures the proportion of the variance in the actual values that the model accounts for.
The coefficient of determination uses somewhat complex equations. If you find it difficult to grasp the formula, it’s okay to just focus on the main idea.
How to Calculate the Coefficient of Determination (R²)
The coefficient of determination is calculated using the following formula:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Each term in this formula is defined as follows:
- $SS_{res}$: Residual Sum of Squares (RSS) = the sum of squared differences between the actual values and the predicted values.
- $SS_{tot}$: Total Sum of Squares (TSS) = the sum of squared differences between the actual values and the mean of the actual values.
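The Residual Sum of Squares and Total Sum of Squares are defined as:
$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
Where $y_i$ is the actual value, $\hat{y}_i$ is the value predicted by the model, $\bar{y}$ is the mean of the actual values, and $n$ is the number of data points.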
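The definitions above translate fairly directly into code. The following is a minimal sketch in plain Python; the function name `r_squared` and its error handling are illustrative choices, not part of the lesson.

```python
def r_squared(actual, predicted):
    """Return R² = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Mean of the actual values
    mean_actual = sum(actual) / len(actual)

    # Residual Sum of Squares: actual vs. predicted
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))

    # Total Sum of Squares: actual vs. mean of actual
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)

    # Assumption of this sketch: treat constant actual values as an error,
    # because the ratio SS_res / SS_tot would divide by zero.
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")

    return 1 - ss_res / ss_tot
```

Calling `r_squared(y_true, y_pred)` with two lists of numbers returns a single float that is at most 1.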
Here's an example of how to calculate the coefficient of determination:
Actual Values: [10, 20, 30]
Predicted Values: [15, 25, 35]
SS_{res} = (10-15)² + (20-25)² + (30-35)²
= 25 + 25 + 25
= 75
Mean Value: 20
SS_{tot} = (10-20)² + (20-20)² + (30-20)²
= 100 + 0 + 100
= 200
R² = 1 - (75 / 200)
= 1 - 0.375
= 0.625
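To check the arithmetic, the same steps can be repeated in a few lines of Python. This is only a sketch; the variable names are chosen here for illustration.

```python
actual = [10, 20, 30]
predicted = [15, 25, 35]

# Mean of the actual values
mean_actual = sum(actual) / len(actual)

# Residual Sum of Squares: (10-15)² + (20-25)² + (30-35)²
ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))

# Total Sum of Squares: (10-20)² + (20-20)² + (30-20)²
ss_tot = sum((y - mean_actual) ** 2 for y in actual)

r2 = 1 - ss_res / ss_tot

print("Mean:", mean_actual)
print("SS_res:", ss_res)
print("SS_tot:", ss_tot)
print("R²:", r2)
```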
How Should R² Be Interpreted?
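The coefficient of determination (R²) is at most 1. It usually falls between 0 and 1, but it can be negative when a model fits the data worse than simply predicting the mean.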
- R² = 1: The model explains the data perfectly.
- R² = 0: The model has no explanatory power, equivalent to always predicting the mean of the actual values.
- R² < 0: The model is worse than always predicting the mean, so it describes the data incorrectly.
Thus, an R² value closer to 1 indicates that the model better explains the data.
Conversely, if the R² value is near 0, the model describes the data little better than a constant prediction of the mean would, and a negative value means that the model performs worse than that mean-only baseline.
Model A's R² = 0.85 → The model explains 85% of the variance in the actual values.
Model B's R² = 0.45 → The model explains only 45% of the variance in the actual values.
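The boundary cases of R² can be reproduced with a small experiment. The sketch below uses made-up data and a hypothetical helper `r_squared` written for this illustration: a perfect prediction, a prediction that always returns the mean, and a prediction that runs opposite to the data.

```python
def r_squared(actual, predicted):
    """R² = 1 - SS_res / SS_tot (illustrative helper, not a library function)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

actual = [10, 20, 30]

# Perfect predictions: SS_res is 0, so R² is 1
r2_perfect = r_squared(actual, [10, 20, 30])

# Always predicting the mean (20): SS_res equals SS_tot, so R² is 0
r2_mean = r_squared(actual, [20, 20, 20])

# Predictions that run against the data: SS_res exceeds SS_tot, so R² is negative
r2_bad = r_squared(actual, [30, 20, 10])

print(r2_perfect, r2_mean, r2_bad)
```

The mean-only case is the baseline that R² compares against: a model scores above 0 only if its squared errors are smaller than those of that constant prediction.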
In the next lesson, we will tackle a simple quiz to review the regression model evaluation metrics covered so far.