March 28, 2026 · 4 min read

Correlation Calculator — Pearson r Coefficient & Interpretation

Calculate Pearson correlation coefficient (r) for any two variables. Understand the -1 to +1 scale, interpret strength and direction, and avoid the causation trap.

Correlation answers a specific question: do two variables tend to move together? When ice cream sales go up, drowning incidents also rise. Before you conclude ice cream kills people, the correlation coefficient gives you a number to work with — and the context to interpret it. Use the CalcHub Correlation Calculator to calculate Pearson r for your data.

The Pearson r Formula

$$r = \frac{n\Sigma x_i y_i - \Sigma x_i \cdot \Sigma y_i}{\sqrt{[n\Sigma x_i^2 - (\Sigma x_i)^2][n\Sigma y_i^2 - (\Sigma y_i)^2]}}$$

r always falls between -1 and +1.

Equivalent form using standard deviations: $$r = \frac{\Sigma(x_i - \bar{x})(y_i - \bar{y})}{(n-1) \cdot s_x \cdot s_y}$$
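Both forms of the formula can be sketched in Python to confirm they give the same number. This is a minimal illustration, not the calculator's actual implementation; the function names and the sample data are made up for the example.

```python
from math import sqrt

def pearson_raw_sums(xs, ys):
    """Pearson r from the raw-sums form of the formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def pearson_std_form(xs, ys):
    """Pearson r from deviations and sample standard deviations (n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

# Hypothetical data, chosen only to exercise both functions.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_raw = pearson_raw_sums(xs, ys)
r_std = pearson_std_form(xs, ys)
print(round(r_raw, 4), round(r_std, 4))  # both print 0.7746
```

The raw-sums form needs only running totals, which suits hand calculation. The deviation form is usually the safer choice in code when values are large, because it avoids subtracting two big, nearly equal numbers.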

Worked Example

Is daily temperature (°C) correlated with café sales (units/day)?

| Temp (x) | Sales (y) |
|----------|-----------|
| 20       | 40        |
| 25       | 55        |
| 30       | 70        |
| 15       | 30        |
| 35       | 80        |

Calculating with n = 5:
  • Σx = 125, Σy = 275
  • Σx² = 3375, Σy² = 16825
  • Σxy = 7525
r = (5×7525 − 125×275) / √[(5×3375 − 125²)(5×16825 − 275²)]
  = (37625 − 34375) / √[(16875 − 15625)(84125 − 75625)]
  = 3250 / √[1250 × 8500]
  = 3250 / √10,625,000
  = 3250 / 3259.6
  ≈ 0.997

Very strong positive correlation: warmer days, more sales.
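Hand arithmetic on sums like these is easy to get wrong, so it can be worth recomputing each intermediate total in code. A minimal sketch using the five temperature and sales pairs from the table above (the variable names are illustrative):

```python
from math import sqrt

# The five (temperature, sales) pairs from the worked example.
temps = [20, 25, 30, 15, 35]
sales = [40, 55, 70, 30, 80]

n = len(temps)
sum_x = sum(temps)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(temps, sales))

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("sum x =", sum_x, " sum y =", sum_y)
print("sum x^2 =", sum_x2, " sum y^2 =", sum_y2, " sum xy =", sum_xy)
print("r =", round(r, 3))
```

A quick sanity check on any hand result: if the final ratio comes out above 1 or below -1, one of the sums is wrong.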

Interpreting the r Value

| r Range        | Interpretation       |
|----------------|----------------------|
| 0.90 to 1.00   | Very strong positive |
| 0.70 to 0.89   | Strong positive      |
| 0.50 to 0.69   | Moderate positive    |
| 0.30 to 0.49   | Weak positive        |
| 0.10 to 0.29   | Very weak positive   |
| −0.10 to 0.10  | Essentially none     |
| −0.29 to −0.10 | Very weak negative   |
| −0.49 to −0.30 | Weak negative        |
| −0.69 to −0.50 | Moderate negative    |
| −0.89 to −0.70 | Strong negative      |
| −1.00 to −0.90 | Very strong negative |

Note: these thresholds aren't universal laws — in psychology, r = 0.3 might be noteworthy; in physics, r = 0.95 might be disappointing.
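If you want to label many coefficients at once, the table can be turned into a small lookup. The sketch below is one possible encoding, with a hypothetical function name. It assumes each band starts at its lower cutoff (so 0.895 counts as "Strong") and that exactly ±0.10 counts as "Very weak", two boundary cases the table leaves open.

```python
def describe_r(r):
    """Map a correlation coefficient to the labels in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.10:
        return "Essentially none"
    # Cutoffs checked from strongest to weakest; the first match wins.
    bands = [
        (0.90, "Very strong"),
        (0.70, "Strong"),
        (0.50, "Moderate"),
        (0.30, "Weak"),
        (0.10, "Very weak"),
    ]
    direction = "positive" if r > 0 else "negative"
    for cutoff, strength in bands:
        if size >= cutoff:
            return f"{strength} {direction}"

print(describe_r(0.97))   # Very strong positive
print(describe_r(-0.55))  # Moderate negative
print(describe_r(0.05))   # Essentially none
```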

Correlation ≠ Causation

This cannot be overstated. Spurious correlations are everywhere:

  • Per capita cheese consumption correlates with deaths by bedsheet tangling (r ≈ 0.95)
  • Nicolas Cage film releases per year correlate with drownings in swimming pools
  • Ice cream sales and drowning both rise in summer — the hidden variable is temperature
The correlation coefficient only measures the linear relationship between two variables. It says nothing about whether one causes the other, whether both are caused by a third variable, or whether the pattern is just coincidence.
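The hidden-variable case is easy to reproduce with synthetic data. In the sketch below, every number is invented for illustration: ice cream sales and a drowning index are each generated from temperature plus independent noise, and neither is computed from the other. They still correlate strongly overall, and the correlation largely disappears once temperature is held roughly constant.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(42)  # fixed seed so the run is repeatable

# Synthetic days: temperature drives both series independently.
temps = [random.uniform(0, 35) for _ in range(5000)]
ice_cream = [10 * t + random.gauss(0, 30) for t in temps]
drownings = [0.2 * t + random.gauss(0, 1) for t in temps]

r_all = pearson_r(ice_cream, drownings)

# Hold the hidden variable roughly constant: days between 20 and 21 °C only.
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 20 <= t < 21]
r_band = pearson_r([i for i, _ in band], [d for _, d in band])

print("all days:", round(r_all, 2))
print("20-21 °C only:", round(r_band, 2), "from", len(band), "days")
```

Slicing on the suspected third variable is a crude check rather than a formal method, but it makes the point: the overall correlation here comes entirely from temperature.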

Other Types of Correlation

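| Type           | Use Case                                               |
|----------------|--------------------------------------------------------|
| Pearson r      | Continuous, normally distributed data                  |
| Spearman ρ     | Ranked/ordinal data, or monotonic non-linear relationships |
| Kendall τ      | Small samples, ordinal data                            |
| Point-Biserial | One continuous, one binary variable                    |
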
Spearman's is simply Pearson r applied to ranks rather than raw values — it's more robust to outliers.
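That definition translates directly into code: rank each variable, then run Pearson on the ranks. The sketch below uses hypothetical helper names and made-up data, and gives tied values the average of the ranks they span, which is the usual convention.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y = x cubed, perfectly monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]

print(round(pearson_r(xs, ys), 3))     # 0.943
print(round(spearman_rho(xs, ys), 3))  # 1.0
```

Because only the ordering matters, stretching the largest y value further would leave Spearman's ρ at 1.0 while pulling Pearson r lower.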

R² and Explained Variance

Squaring the correlation gives R² — the proportion of variance in y explained by x. If r = 0.7, then R² = 0.49: the x variable explains 49% of the variation in y, leaving 51% unexplained.
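For a straight-line fit with a single predictor, "explained variance" can be checked directly: fit the least-squares line, measure how much variation in y is left over, and compare with r². A minimal sketch using the café temperature and sales data from the worked example (variable names are illustrative):

```python
from math import sqrt

temps = [20, 25, 30, 15, 35]
sales = [40, 55, 70, 30, 80]

n = len(temps)
mean_x = sum(temps) / n
mean_y = sum(sales) / n
sxx = sum((x - mean_x) ** 2 for x in temps)
syy = sum((y - mean_y) ** 2 for y in sales)  # total variation in y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, sales))

# Least-squares line: sales = intercept + slope * temp
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation in y that the line does not account for.
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(temps, sales))

r = sxy / sqrt(sxx * syy)
r_squared_from_fit = 1 - ss_residual / syy

print(round(r ** 2, 4), round(r_squared_from_fit, 4))  # 0.9941 0.9941
```

The two numbers agree only for simple linear regression; with several predictors, R² is no longer the square of any single pairwise correlation.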

What sample size do I need for reliable correlation?

As a rough guide: n ≥ 30 for a stable estimate. With n < 10, a spurious high correlation is quite likely by chance. You can test statistical significance using the t-distribution: t = r√(n-2) / √(1-r²), with n-2 degrees of freedom.
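The test statistic is a one-liner to compute. The sketch below uses a hypothetical function name and an illustrative input of r = 0.7 with n = 30; turning t into a p-value requires a t-distribution table or a statistics library, which is left out here.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 data points")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t = t_statistic(0.7, 30)
print(round(t, 2), "with", 30 - 2, "degrees of freedom")  # 5.19 with 28 degrees of freedom
```

For 28 degrees of freedom, the two-tailed 5% critical value is about 2.05, so a t of 5.19 would count as statistically significant at that level.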

Can correlation be used with categorical data?

Not directly with Pearson r, which requires numeric data. For two categorical variables, use Cramér's V. For one categorical and one numeric variable, use point-biserial correlation when the categorical variable is binary, or ANOVA when it has more than two groups.

Pearson r only detects linear relationships. If your data follows a U-shape or any curved pattern, r could be close to zero while the association is actually very strong. Always plot your data before trusting any correlation figure.
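A tiny made-up example shows how extreme this can be. Below, y is fully determined by x (y = x²), yet Pearson r comes out at exactly zero because the rising and falling halves of the U cancel.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Illustrative U-shaped data: y depends on x perfectly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

A scatter plot of these seven points would reveal the pattern immediately, which is why plotting comes before computing.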
