Mathematical Foundations

This page summarizes the mathematical formulations behind Indexly’s statistical inference engine.

Correlation

Pearson Correlation

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} $$
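As a rough illustration of the formula above, the following sketch computes $r$ with the Python standard library only. The function name `pearson_r` is illustrative and is not part of Indexly's API.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

A constant input has zero variance, so the denominator is zero and the sketch raises `ZeroDivisionError`; a production implementation would need to handle that case explicitly.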

Fisher Z Confidence Interval

$$ z = \tanh^{-1}(r), \qquad SE = \frac{1}{\sqrt{n-3}} $$

$$ z_{CI} = z \pm z_{\alpha/2} \cdot SE, \quad CI = \tanh(z_{CI}) $$

Used for all Pearson correlation CIs.
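The Fisher transformation steps can be sketched as follows. This is a minimal standard-library example; the function name `fisher_ci` and the `confidence` parameter are assumptions made for illustration, not Indexly's actual interface.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a Pearson r via the Fisher z transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                      # z = tanh^-1(r)
    se = 1.0 / math.sqrt(n - 3)            # SE = 1 / sqrt(n - 3)
    # Two-sided standard normal critical value z_{alpha/2}.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Build the interval on the z scale, then map back with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = fisher_ci(0.5, 103)
print(low, high)
```

Because the interval is built on the z scale and then back-transformed, it is asymmetric around $r$ and always stays inside $(-1, 1)$.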

T-Test (Independent)

$$ t = \frac{\bar{x}_1 - \bar{x}_2} {\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

Degrees of freedom follow Welch correction when variances differ.
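A minimal sketch of the statistic together with its degrees of freedom, assuming "Welch correction" refers to the usual Welch–Satterthwaite approximation. The function name `welch_t` is illustrative only.

```python
import math
from statistics import mean, variance

def welch_t(sample1, sample2):
    """Unpooled two-sample t statistic with Welch-Satterthwaite df."""
    n1, n2 = len(sample1), len(sample2)
    # Per-group variance of the mean: s_i^2 / n_i (sample variance, n - 1 denominator).
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed form of the correction).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_stat, df)
```

The resulting `df` is generally not an integer and never exceeds $n_1 + n_2 - 2$.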

Paired T-Test

$$ t = \frac{\bar{d}}{s_d / \sqrt{n}} $$

where $d$ is the difference vector.
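For illustration, the paired statistic can be computed from two matched sequences like this. The names `paired_t`, `before`, and `after` are hypothetical, and the differences are taken as `after - before`.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic and its degrees of freedom (n - 1)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Difference vector d, one entry per matched pair.
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    # t = mean(d) / (s_d / sqrt(n)), with s_d the sample standard deviation.
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1

print(paired_t([10, 12, 9, 11], [12, 15, 10, 14]))
```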

ANOVA

$$ F = \frac{MS_{between}}{MS_{within}} $$

Where:

$$ MS = \frac{SS}{df} $$

Post-hoc uses Tukey HSD.
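The F ratio can be sketched for the one-way case as below, assuming $df_{between} = k - 1$ and $df_{within} = n - k$ for $k$ groups and $n$ total observations. The function name is illustrative, and the Tukey HSD post-hoc step is not shown.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # SS_between: group sizes times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # SS_within: squared distance of each observation from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    # MS = SS / df for each source of variation.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

For these three groups, $SS_{between} = 26$ and $SS_{within} = 6$, so $F$ is about 13 on (2, 6) degrees of freedom.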

Mann–Whitney U

Ranks pooled samples and evaluates difference in rank sums.
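One common way to compute the statistic is $U_1 = R_1 - n_1(n_1+1)/2$, where $R_1$ is the rank sum of the first sample in the pooled ranking. The sketch below follows that textbook form with average ranks for ties; which of $U_1$ or $U_2$ Indexly reports is not stated here, so both are returned. All function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) from the pooled ranking of both samples."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                 # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1                    # U1 + U2 always equals n1 * n2
    return u1, u2

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```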

Kruskal–Wallis

Nonparametric alternative to ANOVA based on ranked data.
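The textbook form of the statistic is $H = \frac{12}{n(n+1)} \sum_j \frac{R_j^2}{n_j} - 3(n+1)$, where $R_j$ is the rank sum of group $j$ in the pooled ranking. The sketch below implements that form without a tie correction; whether Indexly applies one is not stated here. Function names are illustrative.

```python
def average_rank(value, pooled):
    """Average 1-based rank of `value` within `pooled` (ties share a rank)."""
    less = sum(1 for p in pooled if p < value)
    equal = sum(1 for p in pooled if p == value)
    return less + (equal + 1) / 2

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    total = 0.0
    for g in groups:
        # R_j: sum of pooled ranks for the observations in this group.
        rank_sum = sum(average_rank(x, pooled) for x in g)
        total += rank_sum ** 2 / len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The ranking helper is quadratic in the sample size, which keeps the example short but would be too slow for large datasets.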

Confidence Interval (Mean)

$$ \bar{x} \pm t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} $$

Confidence Interval (Proportion)

$$ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
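The interval above is the normal-approximation (Wald) interval. A minimal sketch, with a hypothetical function name and parameters:

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    # Two-sided standard normal critical value z_{alpha/2}.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

print(proportion_ci(40, 100))
```

The approximation is known to behave poorly when $\hat{p}$ is close to 0 or 1 or when $n$ is small, and the bounds returned here are not clipped to $[0, 1]$.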

Mean Difference CI

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \cdot SE $$

Regression (OLS)

$$ \hat{\beta} = (X^TX)^{-1}X^Ty $$

Standard errors derived from residual variance.
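To make the normal equations concrete, the sketch below solves them for a single predictor plus intercept, where $X^TX$ is a 2×2 matrix that can be inverted by hand. It assumes the classical estimate $\hat{\sigma}^2 = RSS / (n - p)$ and $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^TX)^{-1}$ for the standard errors. The function name is illustrative.

```python
import math

def ols_simple(x, y):
    """OLS for y = b0 + b1 * x via the normal equations; returns (betas, ses)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]]; invert the 2x2 matrix explicitly.
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # beta_hat = (X^T X)^-1 X^T y, with X^T y = [sy, sxy].
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    # Residual variance with n - 2 degrees of freedom (two coefficients).
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    # Standard errors: sqrt of the diagonal of sigma2 * (X^T X)^-1.
    ses = (math.sqrt(sigma2 * inv[0][0]), math.sqrt(sigma2 * inv[1][1]))
    return (b0, b1), ses

print(ols_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Explicit inversion is fine for a 2×2 illustration; with many predictors, numerical libraries typically use a QR or similar decomposition instead.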

Mixed Effects

$$ y = X\beta + Z\gamma + \epsilon $$

Where:

$X\beta$ = fixed effects
$Z\gamma$ = random effects
$\epsilon$ = residual error

Bootstrap

Resampling with replacement:

$$ \hat{\theta}^* = f(X^*) $$

CI derived from empirical percentiles.
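A percentile bootstrap can be sketched as follows. The function name, the default of 2000 resamples, and the fixed seed are assumptions chosen for illustration, not Indexly's settings.

```python
import random
from statistics import mean

def bootstrap_percentile_ci(data, stat=mean, n_resamples=2000,
                            confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = len(data)
    # Each resample X* draws n items with replacement; theta* = stat(X*).
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    # Read the interval off the empirical percentiles of the estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]

print(bootstrap_percentile_ci(list(range(1, 21))))
```

Any statistic that accepts a list can be passed as `stat`, for example `statistics.median`.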

Kruskal–Wallis (Effect Size)

The Kruskal–Wallis test evaluates whether rank distributions differ across groups.
When the test is significant, an effect size should be reported to assess practical significance.

Epsilon-Squared (ε²)

$$ \varepsilon^2 = \frac{H - k + 1}{n - k} $$

Where:

$H$ = Kruskal–Wallis statistic
$k$ = number of groups
$n$ = total sample size

Interpretation Guidelines

0.01 → Small
0.06 → Medium
0.14 → Large
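These guidelines can be turned into a small lookup. The sketch assumes each value is the lower bound of its band and that anything below 0.01 is labeled negligible; the function name is hypothetical.

```python
def interpret_effect_size(value):
    """Map an effect size to a label, treating thresholds as lower bounds."""
    if value < 0.01:
        return "negligible"
    if value < 0.06:
        return "small"
    if value < 0.14:
        return "medium"
    return "large"

print(interpret_effect_size(0.00076))
```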

Eta-Squared for Kruskal–Wallis (η²ₕ)

$$ \eta^2_H = \frac{H - k + 1}{n - 1} $$

Where:

$H$ = Kruskal–Wallis statistic
$k$ = number of groups
$n$ = total sample size

Example Calculation

H = 170.014
k = 7
n = 216000

epsilon_sq = (H - k + 1) / (n - k)
eta_sq_h = (H - k + 1) / (n - 1)

print("epsilon^2 =", epsilon_sq)
print("eta^2_H =", eta_sq_h)

Result:

$$ \varepsilon^2 \approx 0.00076, \quad \eta^2_H \approx 0.00076 $$

Practical Interpretation

Although the Kruskal–Wallis test may be statistically significant (p < 0.0001), ε² ≈ 0.00076 falls far below the 0.01 threshold for a small effect, indicating a negligible practical effect.

Large sample sizes can produce statistical significance even when the real-world effect is extremely small.

See Developer API for programmatic usage of the Indexly inference engine.
