SST, SSR, SSE

[stata] The meaning of SST, SSE, SSR, and R-squared in regression output

Total variation (Total SS): the total sum of squares (SST), the sum of squared deviations of the individual y values. Explained variation (Model SS): the explained sum of squares (SSE), the sum of squared deviations of the y values estimated by the regression equation. Unexplained variation (Residual SS): the residual…

(Note that this Stata-oriented source uses the opposite naming convention from the rest of this page: SSE for the explained sum of squares and SSR for the residual sum of squares.)

igija.tistory.com

(1) Intuition for why $SST = SSR + SSE$

When we try to explain the total variation in $Y$ ($SST$) with one explanatory variable $X$, there are exactly two sources of variability: the variability captured by $X$ (the sum of squares regression) and the variability not captured by $X$ (the sum of squares error). Hence $SST = SSR + SSE$, an exact equality.
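To see why the equality is exact, here is the standard algebraic step (a sketch; the cross term vanishes because of the OLS normal equations):

$$SST = \sum_i (Y_i - \bar{Y})^2 = \sum_i \left[(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\right]^2 = \underbrace{\sum_i (Y_i - \hat{Y}_i)^2}_{SSE} + \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{SSR} + 2\sum_i (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$$

For a least-squares fit, the residuals $Y_i - \hat{Y}_i$ sum to zero and are uncorrelated with the fitted values $\hat{Y}_i$, so the cross term is zero and $SST = SSR + SSE$ exactly.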

(2) Geometric intuition

Please see the first few pictures here (especially the third): https://sites.google.com/site/modernprogramevaluation/variance-and-bias

Some of the total variation in the data (the distance from a data point to $\bar{Y}$) is captured by the regression line (the distance from the regression line to $\bar{Y}$), and the rest is error (the distance from the point to the regression line). There is no room left for $SST$ to be greater than $SSE + SSR$.

(3) The problem with your illustration

You can't look at SSE and SSR in a pointwise fashion. For a particular point, the residual may be large, so that there is more error than explanatory power from X. However, for other points, the residual will be small, so that the regression line explains a lot of the variability. They will balance out and ultimately $SST = SSR + SSE$. Of course this is not rigorous, but you can find proofs like the above.

Also notice that a regression line is not defined for a single point: $b_1 = \frac{\sum(X_i -\bar{X})(Y_i-\bar{Y}) }{\sum (X_i -\bar{X})^2}$, and with only one observation $X_1 = \bar{X}$, so the denominator is zero and the estimate is undefined.
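As a small illustration (a sketch with made-up numbers; the helper ols_slope is hypothetical), the slope formula can be evaluated directly, and a single observation leaves the denominator at zero:

```python
import numpy as np

def ols_slope(x, y):
    """b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.sum((x - x.mean()) ** 2)
    if sxx == 0:
        raise ZeroDivisionError("slope undefined: no variation in x")
    return np.sum((x - x.mean()) * (y - y.mean())) / sxx

print(ols_slope([1, 2, 3], [2, 4, 5]))  # 1.5
# ols_slope([4], [7]) raises: a single point has x == mean(x), so sxx == 0
```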

Hope this helps.

--Ryan M.


Linear regression is used to find a line that best “fits” a dataset.

We often use three different sum of squares values to measure how well the regression line actually fits the data:

1. Sum of Squares Total (SST) – The sum of squared differences between the individual data points (yi) and the mean of the response variable (ȳ).

  • SST = Σ(yi – ȳ)²

2. Sum of Squares Regression (SSR) – The sum of squared differences between the predicted data points (ŷi) and the mean of the response variable (ȳ).

  • SSR = Σ(ŷi – ȳ)²

3. Sum of Squares Error (SSE) – The sum of squared differences between the predicted data points (ŷi) and the observed data points (yi).

  • SSE = Σ(yi – ŷi)²

The following relationship exists between these three measures:

SST = SSR + SSE

Thus, if we know two of these measures then we can use some simple algebra to calculate the third.
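To illustrate the relationship numerically, here is a minimal sketch in Python (NumPy, with made-up data); any least-squares fit satisfies the identity up to floating-point error:

```python
import numpy as np

# Made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, 1)  # least-squares line
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

print(np.isclose(sst, ssr + sse))  # True
print(sst - ssr)                   # equals SSE: knowing two gives the third
```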

SSR, SST & R-Squared

R-squared, sometimes referred to as the coefficient of determination, is a measure of how well a linear regression model fits a dataset. It represents the proportion of the variance in the response variable that can be explained by the predictor variable.

The value for R-squared can range from 0 to 1. A value of 0 indicates that the response variable cannot be explained by the predictor variable at all. A value of 1 indicates that the response variable can be perfectly explained without error by the predictor variable.

Using SSR and SST, we can calculate R-squared as:

R-squared = SSR / SST

For example, if the SSR for a given regression model is 137.5 and SST is 156 then we would calculate R-squared as:

R-squared = 137.5 / 156 = 0.8814

This tells us that 88.14% of the variation in the response variable can be explained by the predictor variable.
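As a quick sanity check of the arithmetic in Python:

```python
ssr, sst = 137.5, 156
print(round(ssr / sst, 4))  # 0.8814
```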

Calculate SST, SSR, SSE: Step-by-Step Example

Suppose we have the following dataset that shows the number of hours studied by six different students along with their final exam scores:

Hours Studied | Exam Score
1             | 68
2             | 77
2             | 81
3             | 82
4             | 88
5             | 90

Using some statistical software (like R, Excel, Python) or even by hand, we can find that the line of best fit is:

Score = 66.615 + 5.0769*(Hours)

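The fit can be reproduced with NumPy (a minimal sketch; np.polyfit with degree 1 performs the least-squares fit on the data shown above):

```python
import numpy as np

# Hours studied and exam scores for the six students
hours  = np.array([1, 2, 2, 3, 4, 5])
scores = np.array([68, 77, 81, 82, 88, 90])

# np.polyfit returns coefficients highest degree first: (slope, intercept)
slope, intercept = np.polyfit(hours, scores, 1)
print(round(intercept, 3), round(slope, 4))  # 66.615 5.0769
```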

Once we know the line of best fit equation, we can use the following steps to calculate SST, SSR, and SSE:

Step 1: Calculate the mean of the response variable.

The mean of the response variable (ȳ) turns out to be 81.

ȳ = (68 + 77 + 81 + 82 + 88 + 90) / 6 = 486 / 6 = 81

Step 2: Calculate the predicted value for each observation.

Next, we can use the line of best fit equation to calculate the predicted exam score (ŷi) for each student.

For example, the predicted exam score for the student who studied one hour is:

Score = 66.615 + 5.0769*(1) = 71.69.

We can use the same approach to find the predicted score for each student:

Hours | Score | Predicted Score
1     | 68    | 71.69
2     | 77    | 76.77
2     | 81    | 76.77
3     | 82    | 81.85
4     | 88    | 86.92
5     | 90    | 92.00
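Continuing the NumPy sketch from above, the predicted scores can be computed in one step:

```python
# Predicted scores from the fitted line (assumes the arrays and
# coefficients defined in the earlier snippet)
y_hat = intercept + slope * hours
print(np.round(y_hat, 2))  # [71.69 76.77 76.77 81.85 86.92 92.  ]
```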

Step 3: Calculate the sum of squares total (SST).

Next, we can calculate the sum of squares total.

For example, the sum of squares total for the first student is:

(yi – ȳ)² = (68 – 81)² = 169.

We can use the same approach to find the sum of squares total for each student:

Score | (yi – ȳ)²
68    | 169
77    | 16
81    | 0
82    | 1
88    | 49
90    | 81

The sum of squares total turns out to be 316.
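In code, continuing from the snippets above:

```python
# Sum of squares total: squared deviations of observed scores from their mean
sst = np.sum((scores - scores.mean()) ** 2)
print(sst)  # 316.0
```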

Step 4: Calculate the sum of squares regression (SSR).

Next, we can calculate the sum of squares regression.

For example, the sum of squares regression for the first student is:

(ŷi – ȳ)² = (71.6923 – 81)² = 86.63 (using the unrounded predicted value).

We can use the same approach to find the sum of squares regression for each student:

Predicted Score | (ŷi – ȳ)²
71.69           | 86.63
76.77           | 17.90
76.77           | 17.90
81.85           | 0.72
86.92           | 35.08
92.00           | 121.00

The sum of squares regression turns out to be 279.23.
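In code, continuing from the snippets above:

```python
# Sum of squares regression: squared deviations of predictions from the mean
ssr = np.sum((y_hat - scores.mean()) ** 2)
print(round(ssr, 2))  # 279.23
```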

Step 5: Calculate the sum of squares error (SSE).

Next, we can calculate the sum of squares error.

For example, the sum of squares error for the first student is:

(yi – ŷi)² = (68 – 71.6923)² = 13.63 (using the unrounded predicted value).

We can use the same approach to find the sum of squares error for each student:

Score | Predicted Score | (yi – ŷi)²
68    | 71.69           | 13.63
77    | 76.77           | 0.05
81    | 76.77           | 17.90
82    | 81.85           | 0.02
88    | 86.92           | 1.16
90    | 92.00           | 4.00

We can verify that SST = SSR + SSE

  • SST = SSR + SSE
  • 316 = 279.23 + 36.77
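Continuing the sketch, both SSE and the identity can be checked numerically:

```python
# Sum of squares error, plus a numerical check of SST = SSR + SSE
sse = np.sum((scores - y_hat) ** 2)
print(round(sse, 2))               # 36.77
print(np.isclose(sst, ssr + sse))  # True
```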

We can also calculate the R-squared of the regression model by using the following equation:

  • R-squared = SSR / SST
  • R-squared = 279.23 / 316
  • R-squared = 0.8836

This tells us that 88.36% of the variation in exam scores can be explained by the number of hours studied.
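The same value, continuing the NumPy sketch:

```python
r_squared = ssr / sst
print(round(r_squared, 4))  # 0.8836
```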

Additional Resources

You can use the following calculators to automatically calculate SST, SSR, and SSE for any simple linear regression line:

SST Calculator
SSR Calculator
SSE Calculator

What is SSR SST and SSE in regression?

These are the sum of squares total (SST), the sum of squares due to regression (SSR), and the sum of squares of errors (SSE); R-squared is the proportion of explained variability (SSR) in total variability (SST).

What is SST SSR SSE in statistics?

There are three terms we must define: the sum of squares total (SST), the sum of squares regression (SSR), and the sum of squares error (SSE). Their definitions and formulas are given above.

How do you find SSE with SSR and SST?

SSR = Σ(ŷi – ȳ)² = SST – SSE. The regression sum of squares is interpreted as the amount of total variation that is explained by the model. It follows that r² = 1 – SSE/SST = (SST – SSE)/SST = SSR/SST, the ratio of explained variation to total variation.

How do you find SST r2 and SSE?

  • R² = 1 – SSE/SST, in the usual ANOVA notation.
  • Adjusted R²: R²adj = 1 – MSE/MST, which emphasizes its natural relationship to the coefficient of determination.
  • R² = SS(Between Groups)/SS(Total); the Greek symbol eta-squared (η²) is sometimes used to denote this quantity.
  • R² = 1 – SS(Error)/SS(Total).