March 28, 2026 · 4 min read

Correlation Calculator — Pearson r Coefficient & Interpretation

Calculate Pearson correlation coefficient (r) for any two variables. Understand the -1 to +1 scale, interpret strength and direction, and avoid the causation trap.

Tags: correlation, Pearson r, statistics, data analysis, calchub

Correlation answers a specific question: do two variables tend to move together? When ice cream sales go up, drowning incidents also rise. Before you conclude ice cream kills people, the correlation coefficient gives you a number to work with — and the context to interpret it. Use the CalcHub Correlation Calculator to calculate Pearson r for your data.

The Pearson r Formula

$$r = \frac{n\Sigma x_i y_i - \Sigma x_i \cdot \Sigma y_i}{\sqrt{[n\Sigma x_i^2 - (\Sigma x_i)^2][n\Sigma y_i^2 - (\Sigma y_i)^2]}}$$

r always falls between -1 and +1.

Equivalent form using the sample standard deviations s_x and s_y (each computed with an n − 1 denominator): $$r = \frac{\Sigma(x_i - \bar{x})(y_i - \bar{y})}{(n-1) \cdot s_x \cdot s_y}$$
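To see that the two forms agree, here is a minimal Python sketch of both. The function names and the sample data are made up for illustration; the deviation form relies on `statistics.stdev`, which uses the n − 1 denominator.

```python
import math
import statistics


def pearson_r_sums(xs, ys):
    """Pearson r from raw sums (the first formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def pearson_r_deviations(xs, ys):
    """Pearson r from deviations and sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # n - 1 denominator
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical data, chosen only to exercise both functions.
xs = [1.0, 2.0, 4.0, 7.0, 9.0]
ys = [2.0, 3.0, 3.5, 8.0, 8.5]
print(pearson_r_sums(xs, ys), pearson_r_deviations(xs, ys))
```

Both calls should print the same value up to floating-point rounding. Either function fails with a division by zero if one variable is constant, because r is undefined in that case.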

Worked Example

Do café sales (units/day) rise and fall with temperature (°C)?

Temp (x)	Sales (y)
20	40
25	55
30	70
15	30
35	80

Calculating with n=5:

Σx = 125, Σy = 275
Σx² = 3375, Σy² = 16825
Σxy = 7525

r = (5×7525 − 125×275) / √[(5×3375 − 125²)(5×16825 − 275²)] = (37625 − 34375) / √[(16875 − 15625)(84125 − 75625)] = 3250 / √[1250 × 8500] = 3250 / √10,625,000 = 3250 / 3259.6 ≈ 0.997

Very strong positive correlation: warmer days go with higher sales.
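The arithmetic is easy to slip on by hand, so it can help to recompute each sum directly from the table. The sketch below does that in plain Python; the variable names are illustrative.

```python
import math

# Data from the temperature / sales table.
temps = [20, 25, 30, 15, 35]
sales = [40, 55, 70, 30, 80]
n = len(temps)

sum_x = sum(temps)                                # 125
sum_y = sum(sales)                                # 275
sum_x2 = sum(x * x for x in temps)                # 3375
sum_y2 = sum(y * y for y in sales)                # 16825
sum_xy = sum(x * y for x, y in zip(temps, sales)) # 7525

numerator = n * sum_xy - sum_x * sum_y            # 3250
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)                                                 # sqrt(1250 * 8500)
r = numerator / denominator

print(round(r, 3))  # 0.997
```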

Interpreting the r Value

r Range	Interpretation
0.90 to 1.00	Very strong positive
0.70 to 0.89	Strong positive
0.50 to 0.69	Moderate positive
0.30 to 0.49	Weak positive
0.10 to 0.29	Very weak positive
−0.09 to 0.09	Essentially none
−0.29 to −0.10	Very weak negative
−0.49 to −0.30	Weak negative
−0.69 to −0.50	Moderate negative
−0.89 to −0.70	Strong negative
−1.00 to −0.90	Very strong negative

Note: these thresholds aren't universal laws — in psychology, r = 0.3 might be noteworthy; in physics, r = 0.95 might be disappointing.
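If you want to label results automatically, the table can be turned into a small lookup. This is only a sketch: the function name `interpret_r` is made up, and it assumes a value of exactly ±0.10 counts as "very weak" rather than "essentially none".

```python
def interpret_r(r):
    """Map a correlation coefficient to the labels in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    magnitude = abs(r)
    if magnitude < 0.10:
        return "Essentially none"

    # Lower bounds of each band, checked from strongest to weakest.
    bands = [
        (0.90, "Very strong"),
        (0.70, "Strong"),
        (0.50, "Moderate"),
        (0.30, "Weak"),
        (0.10, "Very weak"),
    ]
    strength = next(label for bound, label in bands if magnitude >= bound)
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(interpret_r(0.97))   # Very strong positive
print(interpret_r(-0.45))  # Weak negative
print(interpret_r(0.05))   # Essentially none
```

Because the cut-offs vary by field, treat the `bands` list as something to adjust rather than a fixed standard.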

Correlation ≠ Causation

This cannot be overstated. Spurious correlations are everywhere:

Per capita cheese consumption correlates with deaths by bedsheet tangling (r ≈ 0.95)
Nicolas Cage film releases per year correlate with drownings in swimming pools
Ice cream sales and drowning both rise in summer — the hidden variable is temperature

The correlation coefficient only measures the linear relationship between two variables. It says nothing about whether one causes the other, whether both are caused by a third variable, or whether the pattern is just coincidence.

Other Types of Correlation

Type	Use Case
Pearson r	Continuous data with a roughly linear relationship (approximately normal for significance tests)
Spearman ρ	Ranked/ordinal data, or monotonic non-linear relationships
Kendall τ	Small samples, ordinal data
Point-Biserial	One continuous, one binary variable

Spearman's ρ is simply Pearson r applied to ranks rather than raw values, which makes it more robust to outliers.

R² and Explained Variance

Squaring the correlation gives R², which for a simple linear regression of y on x is the proportion of variance in y explained by x. If r = 0.7, then R² = 0.49: the x variable explains 49% of the variation in y, leaving 51% unexplained.

What sample size do I need for reliable correlation?

As a rough guide, aim for n ≥ 30 for a stable estimate. With n < 10, a high correlation can easily arise by chance. You can test statistical significance using the t-distribution with n − 2 degrees of freedom: $$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
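The effect of sample size is easy to see by plugging numbers into the t formula. This sketch uses a made-up function name and a hypothetical r = 0.5; it computes only the test statistic and degrees of freedom, so you would still compare t against a t-table or a statistics package to get a p-value.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, degrees of freedom) for testing whether r differs from 0."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, n - 2


# Same hypothetical r, two different sample sizes.
for n in (6, 30):
    t, df = correlation_t_statistic(0.5, n)
    print(f"n = {n:2d}: t = {t:.2f}, df = {df}")
# n =  6: t = 1.15, df = 4
# n = 30: t = 3.06, df = 28
```

The same r = 0.5 produces a much larger t with 30 observations than with 6, which is why a moderate correlation in a tiny sample is weak evidence.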

Can correlation be used with categorical data?

Not directly with Pearson r, which requires numeric data. For two categorical variables, use Cramér's V. For one categorical and one numeric variable, use point-biserial correlation if the categorical variable is binary, or ANOVA if it has more than two groups.

Pearson r only detects linear relationships. If your data follows a U-shape or any curved pattern, r could be close to zero while the association is actually very strong. Always plot your data before trusting any correlation figure.