Mathematical Foundations
Correlation
Pearson Correlation
$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} $$
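As a sanity check, the formula above can be computed directly with the standard library (a minimal illustrative sketch, not a production implementation):

```python
from math import sqrt

def pearson_r(x, y):
    # r = centered cross-product divided by the product of the deviation norms
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den

# Perfectly linear data gives r = 1.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # → 1.0
```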
Fisher Z Confidence Interval
$$ z = \tanh^{-1}(r), \quad SE = \frac{1}{\sqrt{n-3}} $$
$$ z_{CI} = z \pm z_{\alpha/2} \cdot SE, \quad CI = \tanh(z_{CI}) $$
Used for all Pearson correlation CIs.
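The interval can be sketched as follows; the hard-coded critical value 1.959964 ≈ $z_{0.025}$ is an illustrative default for a two-sided 95% level:

```python
from math import atanh, tanh, sqrt

def fisher_ci(r, n, z_crit=1.959964):
    # Transform r to z-space, build a symmetric interval, transform back.
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

lo, hi = fisher_ci(0.5, 100)  # interval around r = 0.5 with n = 100
```

Note that the interval is asymmetric around $r$ after the back-transform, which is the point of the Fisher transformation.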
T-Test (Independent)
$$ t = \frac{\bar{x}_1 - \bar{x}_2} {\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
Degrees of freedom follow the Welch–Satterthwaite correction when variances are unequal.
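A stdlib-only sketch of the statistic together with the Welch–Satterthwaite degrees of freedom (illustrative; the sample data are arbitrary):

```python
from statistics import mean, variance
from math import sqrt

def welch_t(x1, x2):
    v1, v2 = variance(x1), variance(x2)
    n1, n2 = len(x1), len(x2)
    se2 = v1 / n1 + v2 / n2
    t = (mean(x1) - mean(x2)) / sqrt(se2)
    # Welch–Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```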
Paired T-Test
$$ t = \frac{\bar{d}}{s_d / \sqrt{n}} $$
where $d$ is the difference vector.
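The paired statistic reduces to a one-sample t-test on the differences, as a short sketch shows (example values are arbitrary):

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(x, y):
    # One-sample t on the element-wise differences.
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

t = paired_t([10, 12, 14], [9, 11, 12])  # → 4.0
```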
ANOVA
$$ F = \frac{MS_{between}}{MS_{within}} $$
Where:
$$ MS = \frac{SS}{df} $$
Post-hoc uses Tukey HSD.
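The F ratio above can be computed directly (a stdlib-only sketch of one-way ANOVA; the Tukey HSD post-hoc step is omitted here):

```python
from statistics import mean

def anova_f(groups):
    # One-way ANOVA: between-group vs. within-group mean squares.
    grand = mean(x for g in groups for x in g)
    k = len(groups)
    n = sum(len(g) for g in groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])  # → 3.0
```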
Mann–Whitney U
Ranks the pooled samples and evaluates the difference in rank sums.
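Equivalently, U counts the pairwise "wins" of one sample over the other, with ties counting one half; this pairwise form is a compact sketch of the rank-sum definition:

```python
def mann_whitney_u(x, y):
    # U = number of (xi, yj) pairs with xi > yj, plus 0.5 per tie.
    return sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)

u = mann_whitney_u([3, 4, 5], [1, 2, 3])  # → 8.5
```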
Kruskal–Wallis
Nonparametric alternative to ANOVA based on ranked data.
Confidence Interval (Mean)
$$ \bar{x} \pm t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} $$
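A sketch of this interval; since the standard library has no t quantile function, the critical value is passed in by the caller (2.776 below is the two-sided 95% value for df = 4, used purely as an example):

```python
from statistics import mean, stdev
from math import sqrt

def mean_ci(x, t_crit):
    # x̄ ± t * s / sqrt(n)
    m, se = mean(x), stdev(x) / sqrt(len(x))
    return m - t_crit * se, m + t_crit * se

lo, hi = mean_ci([1, 2, 3, 4, 5], t_crit=2.776)
```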
Confidence Interval (Proportion)
$$ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
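This is the Wald interval; a minimal sketch with the 95% normal critical value as an illustrative default:

```python
from math import sqrt

def prop_ci(successes, n, z=1.959964):
    # p̂ ± z * sqrt(p̂(1 - p̂)/n)
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = prop_ci(40, 100)  # p̂ = 0.4
```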
Mean Difference CI
$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \cdot SE $$
Regression (OLS)
$$ \hat{\beta} = (X^TX)^{-1}X^Ty $$
Standard errors are derived from the residual variance.
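For the single-predictor special case, the normal equations reduce to closed-form slope and intercept estimates; a stdlib-only sketch (the general matrix form would use a linear-algebra library):

```python
from statistics import mean

def ols_simple(x, y):
    # Slope = S_xy / S_xx; intercept from the means.
    mx, my = mean(x), mean(y)
    beta1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
            sum((a - mx) ** 2 for a in x)
    beta0 = my - beta1 * mx
    return beta0, beta1

b0, b1 = ols_simple([1, 2, 3], [2, 4, 6])  # → (0.0, 2.0)
```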
Mixed Effects
$$ y = X\beta + Z\gamma + \epsilon $$
Where:
- $X\beta$ fixed effects
- $Z\gamma$ random effects
Bootstrap
Resampling with replacement:
$$ \hat{\theta}^* = f(X^*) $$
CI derived from empirical percentiles.
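The percentile method above can be sketched in a few lines (a minimal illustration with a fixed seed for reproducibility; defaults are arbitrary):

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_boot=2000, alpha=0.05, seed=0):
    # Resample with replacement, then take empirical percentiles.
    rng = random.Random(seed)
    boots = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = list(range(1, 21))  # true mean 10.5
lo, hi = bootstrap_ci(data)
```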
Kruskal–Wallis (Effect Size)
The Kruskal–Wallis test evaluates whether rank distributions differ across groups.
When the test is significant, report an effect size to gauge the practical magnitude of the difference.
Epsilon-Squared (ε²)
$$ \varepsilon^2 = \frac{H - k + 1}{n - k} $$
Where:
- $H$ = Kruskal–Wallis statistic
- $k$ = number of groups
- $n$ = total sample size
Interpretation Guidelines
- 0.01 → Small
- 0.06 → Medium
- 0.14 → Large
Eta-Squared for Kruskal–Wallis (η²ₕ)
$$ \eta^2_H = \frac{H - k + 1}{n - 1} $$
Where:
- $H$ = Kruskal–Wallis statistic
- $k$ = number of groups
- $n$ = total sample size
Example Calculation
```python
# Effect sizes for the Kruskal–Wallis example.
H = 170.014   # Kruskal–Wallis statistic
k = 7         # number of groups
n = 216000    # total sample size

epsilon_sq = (H - k + 1) / (n - k)
eta_sq_h = (H - k + 1) / (n - 1)

print("epsilon^2 =", epsilon_sq)
print("eta^2_H =", eta_sq_h)
```
Result:
$\varepsilon^2 \approx 0.00076, \quad \eta^2_H \approx 0.00076$
Practical Interpretation
Although the Kruskal–Wallis test may be statistically significant (p < 0.0001), an ε² ≈ 0.00076 indicates a negligible practical effect.
Large sample sizes can produce statistical significance even when the real-world effect is extremely small.
Next
See Developer API for programmatic usage of the Indexly inference engine.