Linear Regression Calculator — Line of Best Fit, Slope & R²
Calculate linear regression with least squares method. Find slope, y-intercept, and R² value. Interpret your line of best fit with worked examples and formulas.
Linear regression finds the straight line that best fits a set of data points — minimizing the total squared distance from every point to the line. It's the workhorse of predictive statistics, from predicting house prices to estimating crop yields. Run yours with the CalcHub Linear Regression Calculator.
The Least Squares Formulas
Given pairs (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ), the best-fit line is ŷ = mx + b where:
Slope: $$m = \frac{n\Sigma(x_i y_i) - \Sigma x_i \cdot \Sigma y_i}{n\Sigma x_i^2 - (\Sigma x_i)^2}$$

Intercept: $$b = \bar{y} - m\bar{x}$$

where x̄ and ȳ are the means of x and y respectively.
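These two formulas translate directly into code. Here is a minimal Python sketch (the function name `least_squares` is my own choice, not part of any library):

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the closed-form formula above
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b = ȳ − m·x̄
    b = sum_y / n - m * (sum_x / n)
    return m, b
```

Production code would typically reach for `numpy.polyfit` or `scipy.stats.linregress` instead, but the closed form is easy to verify by hand.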
Worked Example
A study tracks hours studied vs. exam score for 5 students:
| Hours (x) | Score (y) | x² | xy |
|---|---|---|---|
| 1 | 50 | 1 | 50 |
| 2 | 60 | 4 | 120 |
| 3 | 65 | 9 | 195 |
| 4 | 75 | 16 | 300 |
| 5 | 85 | 25 | 425 |
| Σ = 15 | Σ = 335 | Σ = 55 | Σ = 1090 |
Plugging the sums into the formulas: m = (5·1090 − 15·335) / (5·55 − 15²) = 425/50 = 8.5, and b = ȳ − m·x̄ = 67 − 8.5(3) = 41.5, so the best-fit line is ŷ = 8.5x + 41.5.
Prediction: a student studying 6 hours → ŷ = 8.5(6) + 41.5 = 92.5
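The arithmetic from the table can be checked in a few lines of Python, plugging the column sums straight into the slope and intercept formulas:

```python
# Column sums from the hours-vs-score table (n = 5 students)
n, sum_x, sum_y, sum_x2, sum_xy = 5, 15, 335, 55, 1090

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 425 / 50 = 8.5
b = sum_y / n - m * (sum_x / n)                               # 67 - 25.5 = 41.5
prediction = m * 6 + b  # predicted score for 6 hours of study
print(m, b, prediction)  # 8.5 41.5 92.5
```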
Interpreting R² (Coefficient of Determination)
R² measures how much of the variation in y is explained by the linear relationship with x.
Formula: $$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where SS_res = Σ(yᵢ − ŷᵢ)² and SS_tot = Σ(yᵢ − ȳ)²
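A short Python sketch of this formula, applied to the worked example's data and fitted line (the function name `r_squared` is my own):

```python
def r_squared(ys, y_hats):
    """R² = 1 − SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))   # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                 # total sum of squares
    return 1 - ss_res / ss_tot

hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 75, 85]
predicted = [8.5 * x + 41.5 for x in hours]  # line from the worked example
print(round(r_squared(scores, predicted), 4))  # → 0.9897
```

Here SS_res = 7.5 and SS_tot = 730, so the study-hours line explains about 99% of the variation in exam scores.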
| R² Value | Interpretation |
|---|---|
| 0.00 – 0.19 | Very weak fit |
| 0.20 – 0.39 | Weak fit |
| 0.40 – 0.59 | Moderate fit |
| 0.60 – 0.79 | Strong fit |
| 0.80 – 1.00 | Very strong fit |
Residuals: The Gaps Between Reality and the Line
A residual is the difference between an observed y value and the predicted ŷ value:
eᵢ = yᵢ − ŷᵢ

For the student who studied 3 hours with a score of 65:
Predicted: ŷ = 8.5(3) + 41.5 = 67
Residual: 65 − 67 = −2
Plotting residuals helps spot patterns — if they're random around zero, the linear model is appropriate. If they curve, you might need polynomial regression.
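Computing all five residuals for the worked example makes the "random around zero" check concrete:

```python
hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 75, 85]

# Residual eᵢ = yᵢ − ŷᵢ for the fitted line ŷ = 8.5x + 41.5
residuals = [y - (8.5 * x + 41.5) for x, y in zip(hours, scores)]
print(residuals)        # [0.0, 1.5, -2.0, -0.5, 1.0]
print(sum(residuals))   # 0.0 — least squares with an intercept makes residuals sum to zero
```

The residuals are small and show no obvious curve, which supports using a linear model here.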
Assumptions and Limitations
Linear regression assumes:
- A roughly linear relationship between x and y
- Residuals are normally distributed with constant variance
- Observations are independent
It breaks down when the true relationship is curved, when outliers are extreme, or when you're extrapolating far beyond your data range.
What does a negative slope mean?
A negative slope (m < 0) means y decreases as x increases — they're inversely related. For example, more hours of TV watched might correlate with fewer hours studying.
Is linear regression the same as correlation?
Related but different. Correlation (Pearson r) measures the strength and direction of the linear relationship. Regression goes further — it gives you an actual predictive equation. In simple linear regression (one predictor), R² equals r² (Pearson r squared), which is why a strong correlation gives a high R².
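You can verify the R² = r² identity on the worked example's data by computing Pearson r from its definition (sample covariance over the product of standard deviations):

```python
import math

hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 75, 85]
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Pearson r = Σ(xᵢ−x̄)(yᵢ−ȳ) / sqrt(Σ(xᵢ−x̄)² · Σ(yᵢ−ȳ)²)
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in scores)
r = cov / math.sqrt(sxx * syy)

print(round(r ** 2, 4))  # equals R² = 1 − SS_res/SS_tot for the same data
```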
Can I use linear regression with one variable to predict the future?
Yes, with caution. Extrapolating beyond your data range is risky — the linear trend might not hold. Always be skeptical of predictions far outside the range of x values used to build the model.