Mathematical Foundations

Mathematical formulations behind Indexly’s statistical inference engine.

Correlation

Pearson Correlation

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} $$

Fisher Z Confidence Interval

$$ z = \tanh^{-1}(r), \quad SE = \frac{1}{\sqrt{n-3}} $$

$$ z_{CI} = z \pm z_{\alpha/2} \cdot SE, \quad CI = \tanh(z_{CI}) $$

Used for all Pearson correlation CIs.
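The Fisher z interval above can be sketched with the Python standard library alone. This is a minimal illustration, not Indexly's implementation; the function name `fisher_ci` and its parameters are assumptions made for the example.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, alpha=0.05):
    """Confidence interval for a Pearson r via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("n must exceed 3 so that SE = 1/sqrt(n - 3) is defined")
    z = math.atanh(r)                              # z = tanh^{-1}(r)
    se = 1.0 / math.sqrt(n - 3)                    # standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    # Back-transform both limits to the correlation scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = fisher_ci(0.5, 50)
print(round(lower, 3), round(upper, 3))
```

Because the interval is built on the z scale and then back-transformed, it is asymmetric around $r$ and always stays inside $(-1, 1)$.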


T-Test (Independent)

$$ t = \frac{\bar{x}_1 - \bar{x}_2} {\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

The statistic uses unpooled group variances. Degrees of freedom follow the Welch (Welch–Satterthwaite) correction when variances differ.
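As a rough sketch of the statistic and the Welch degrees of freedom, the following uses only the standard library. The name `welch_t` is hypothetical, and the degrees-of-freedom line is the standard Welch–Satterthwaite approximation, assumed here to be what the Welch correction refers to.

```python
import math
from statistics import mean, variance

def welch_t(x1, x2):
    """Return (t, df) for two independent samples with unpooled variances."""
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1   # s_1^2 / n_1 (sample variance, n - 1 denominator)
    v2 = variance(x2) / n2   # s_2^2 / n_2
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 + v2)
    # Welch–Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 4), round(df, 4))
```

The resulting degrees of freedom are generally not an integer and never exceed $n_1 + n_2 - 2$.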


Paired T-Test

$$ t = \frac{\bar{d}}{s_d / \sqrt{n}} $$

where $d$ is the vector of paired differences, $\bar{d}$ its mean, $s_d$ its sample standard deviation, and $n$ the number of pairs.
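A small sketch of the paired statistic, assuming two equal-length sequences of matched observations; the function name `paired_t` is illustrative only.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic for paired samples: mean difference over its standard error."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    d = [a - b for a, b in zip(after, before)]   # difference vector
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n))   # compare against t with n - 1 df

print(round(paired_t([1, 2, 3, 4], [2, 4, 5, 8]), 4))
```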


ANOVA

$$ F = \frac{MS_{between}}{MS_{within}} $$

Where:

$$ MS = \frac{SS}{df} $$

Post-hoc uses Tukey HSD.
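The F ratio for a one-way layout can be sketched as follows, assuming the usual degrees of freedom $k - 1$ between groups and $n - k$ within groups. The function name `one_way_f` is hypothetical, and the sketch covers only the F statistic, not the Tukey HSD step.

```python
from statistics import mean

def one_way_f(groups):
    """F = MS_between / MS_within for a list of groups (one-way ANOVA)."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)   # MS = SS / df
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

print(one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```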

Welch ANOVA

When group normality is acceptable but variances are unequal, Indexly routes to Welch ANOVA with --auto-route. Welch ANOVA uses group-specific variances and adjusted denominator degrees of freedom rather than assuming a common pooled variance.


Mann–Whitney U

Ranks pooled samples and evaluates difference in rank sums.
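To make the ranking step concrete, here is a minimal sketch that pools both samples, assigns average ranks to ties, and derives the two U statistics from the rank sum of the first sample. The name `mann_whitney_u` is an assumption for this example, and no p-value is computed.

```python
from collections import defaultdict

def mann_whitney_u(x, y):
    """Return (U1, U2) from the rank sum of x in the pooled sample."""
    pooled = sorted(list(x) + list(y))
    positions = defaultdict(list)
    for pos, value in enumerate(pooled, start=1):
        positions[value].append(pos)
    # Tied values share the average of the positions they occupy.
    rank = {v: sum(p) / len(p) for v, p in positions.items()}
    r1 = sum(rank[v] for v in x)            # rank sum of the first sample
    u1 = r1 - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1               # U1 + U2 = n1 * n2
    return u1, u2

print(mann_whitney_u([1, 2], [2, 3]))
```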


Kruskal–Wallis

Nonparametric alternative to ANOVA based on ranked data.
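The $H$ statistic can be sketched from the pooled ranks using the textbook form $H = \frac{12}{n(n+1)} \sum_j R_j^2 / n_j - 3(n+1)$, where $R_j$ is the rank sum of group $j$. The function name `kruskal_wallis_h` is hypothetical, and this sketch omits the tie-correction factor that statistical libraries usually apply.

```python
from collections import defaultdict

def kruskal_wallis_h(groups):
    """Kruskal–Wallis H from pooled average ranks (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    positions = defaultdict(list)
    for pos, value in enumerate(pooled, start=1):
        positions[value].append(pos)
    rank = {v: sum(p) / len(p) for v, p in positions.items()}
    # Sum over groups of (rank sum)^2 / group size.
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 4))
```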


Confidence Interval (Mean)

$$ \bar{x} \pm t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} $$


Confidence Interval (Proportion)

$$ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
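The proportion interval above is the normal-approximation (Wald) form. A minimal standard-library sketch, with the function name `proportion_ci` chosen for illustration:

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_ci(40, 100)
print(round(lower, 3), round(upper, 3))
```

This approximation can produce limits outside $[0, 1]$ when $\hat{p}$ is near 0 or 1 or when $n$ is small.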


Mean Difference CI

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \cdot SE $$


Regression (OLS)

$$ \hat{\beta} = (X^TX)^{-1}X^Ty $$

Standard errors are derived from the residual variance. When residual diagnostics indicate heteroscedasticity or non-normality and auto-routing is enabled, Indexly reports HC3 robust covariance results and recomputes coefficient confidence intervals from the robust model.
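For a single predictor plus intercept, the normal equations $(X^TX)\hat{\beta} = X^Ty$ reduce to a 2×2 system that can be solved in closed form. The sketch below illustrates only that special case in plain Python; the name `ols_simple` is hypothetical, and it covers neither standard errors nor the HC3 covariance.

```python
def ols_simple(x, y):
    """Solve the 2x2 normal equations for (intercept, slope)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x has no variation, so X^T X is singular")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

print(ols_simple([0, 1, 2, 3], [1, 3, 5, 7]))
```

With more predictors, a linear-algebra routine that solves the system without forming an explicit inverse is generally preferred for numerical stability.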


Mixed Effects

$$ y = X\beta + Z\gamma + \epsilon $$

Where:

  • $X\beta$ = fixed effects
  • $Z\gamma$ = random effects
  • $\epsilon$ = residual error

Bootstrap

Resampling with replacement:

$$ \hat{\theta}^* = f(X^*) $$

CI derived from empirical percentiles.
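A percentile bootstrap can be sketched in a few lines. The function name, the default of 2000 resamples, and the fixed seed are assumptions for this illustration, not Indexly settings.

```python
import random
from statistics import mean

def bootstrap_percentile_ci(data, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for stat(data) from resamples drawn with replacement."""
    rng = random.Random(seed)          # seeded for reproducibility
    n = len(data)
    # Each resample X* has the same size as the data; theta* = stat(X*).
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]

sample = [float(v) for v in range(1, 21)]
print(bootstrap_percentile_ci(sample))
```

Any statistic that accepts a list can be passed as `stat`, for example `statistics.median`.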


Bayesian Independent T-Test

Indexly reports BF10, the Bayes factor for the alternative over the null, using the JZS Cauchy-prior t-test formulation.

Interpretation:

  • BF10 < 1 → evidence favors the null
  • 1 ≤ BF10 < 3 → anecdotal evidence for the alternative
  • 3 ≤ BF10 < 10 → moderate evidence for the alternative
  • BF10 ≥ 10 → strong evidence for the alternative
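The thresholds above translate directly into a lookup. The helper below is a hypothetical convenience function, not part of Indexly's API, and it does not compute the Bayes factor itself.

```python
def interpret_bf10(bf10):
    """Map a BF10 value to the evidence labels listed above."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 < 1:
        return "evidence favors the null"
    if bf10 < 3:
        return "anecdotal evidence for the alternative"
    if bf10 < 10:
        return "moderate evidence for the alternative"
    return "strong evidence for the alternative"

for value in (0.4, 2.0, 5.0, 25.0):
    print(value, "->", interpret_bf10(value))
```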

Statistical Power

OLS Regression

Indexly uses Cohen’s $f^2$:

$$ f^2 = \frac{R^2}{1 - R^2} $$

Power is computed for the model F-test with numerator degrees of freedom equal to the number of model predictors and denominator degrees of freedom:

$$ df_2 = n - k - 1 $$
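The inputs to the power calculation can be sketched as below. Both function names are illustrative assumptions; the power computation itself (a noncentral F calculation) is not shown.

```python
def cohens_f2(r_squared):
    """Cohen's f^2 = R^2 / (1 - R^2)."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return r_squared / (1 - r_squared)

def f_test_df(n, k):
    """Degrees of freedom for the model F-test: (k, n - k - 1)."""
    df2 = n - k - 1
    if df2 <= 0:
        raise ValueError("need n > k + 1 observations")
    return k, df2

print(cohens_f2(0.2), f_test_df(100, 3))
```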

ANOVA

ANOVA effect size is reported as eta-squared:

$$ \eta^2 = \frac{SS_{between}}{SS_{total}} $$

Power uses Cohen’s $f$, converted from eta-squared:

$$ f = \sqrt{\frac{\eta^2}{1 - \eta^2}} $$
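A short sketch of the two conversions, using made-up sums of squares purely for illustration; the function names are assumptions.

```python
import math

def eta_squared(ss_between, ss_total):
    """Eta-squared: share of total variation explained by group membership."""
    return ss_between / ss_total

def cohens_f(eta_sq):
    """Cohen's f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta^2 must lie in [0, 1)")
    return math.sqrt(eta_sq / (1 - eta_sq))

eta_sq = eta_squared(26.0, 32.0)   # illustrative values, not real data
print(round(eta_sq, 4), round(cohens_f(eta_sq), 4))
```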


Kruskal–Wallis (Effect Size)

The Kruskal–Wallis test evaluates whether rank distributions differ across groups.
When the test is significant, an effect size should be reported to assess practical significance.


Epsilon-Squared (ε²)

$$ \varepsilon^2 = \frac{H - k + 1}{n - k} $$

Where:

  • $H$ = Kruskal–Wallis statistic
  • $k$ = number of groups
  • $n$ = total sample size

Interpretation Guidelines

  • 0.01 → Small
  • 0.06 → Medium
  • 0.14 → Large

Eta-Squared for Kruskal–Wallis (η²ₕ)

$$ \eta^2_H = \frac{H - k + 1}{n - 1} $$

Where:

  • $H$ = Kruskal–Wallis statistic
  • $k$ = number of groups
  • $n$ = total sample size

Example Calculation

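```python
# Kruskal–Wallis statistic, number of groups, total sample size
H = 170.014
k = 7
n = 216000

# Effect sizes from the two formulas above
epsilon_sq = (H - k + 1) / (n - k)
eta_sq_h = (H - k + 1) / (n - 1)

print("epsilon^2 =", epsilon_sq)
print("eta^2_H =", eta_sq_h)
```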

Result:

$$ \varepsilon^2 \approx 0.00076, \quad \eta^2_H \approx 0.00076 $$


Practical Interpretation

Although the Kruskal–Wallis test may be statistically significant (p < 0.0001), an ε² ≈ 0.00076 indicates a negligible practical effect.

Large sample sizes can produce statistical significance even when the real-world effect is extremely small.


Next

See Developer API for programmatic usage of the Indexly inference engine.