
Hypothesis Testing


multi_tax_test

Per-taxon differential abundance test between two groups. Applies a statistical test to each taxon independently, then corrects for multiple comparisons:

from pyloseq import multi_tax_test

results = multi_tax_test(ps, grouping_var="SampleType", test="t", method="BH")
print(results.head(10))

The function requires sample_data to contain a column with exactly two distinct non-NaN values, and each of the two groups must have at least two samples. Samples with NaN in the grouping column are dropped silently.
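
Because NaN samples are dropped without a warning, it can be useful to inspect the grouping column before running the test. The following is a minimal sketch on a plain pandas DataFrame; the helper name check_grouping and the sample_df table are hypothetical and not part of pyloseq.

```python
import numpy as np
import pandas as pd


def check_grouping(sample_df: pd.DataFrame, grouping_var: str) -> pd.Series:
    """Return per-group sample counts after dropping NaN labels.

    Hypothetical helper (not part of pyloseq) that mirrors the documented
    requirements: exactly two non-NaN levels, at least two samples each.
    """
    labels = sample_df[grouping_var].dropna()  # NaN samples are excluded
    counts = labels.value_counts().sort_index()
    if len(counts) != 2:
        raise ValueError(
            f"{grouping_var!r} has {len(counts)} distinct non-NaN values; expected 2"
        )
    if counts.min() < 2:
        raise ValueError(f"each group in {grouping_var!r} needs at least 2 samples")
    return counts


# Illustrative metadata: one sample has a missing label
sample_df = pd.DataFrame(
    {"SampleType": ["gut", "skin", "gut", np.nan, "skin", "gut"]},
    index=[f"S{i}" for i in range(6)],
)
counts = check_grouping(sample_df, "SampleType")
print(counts)
```

Comparing the counts with the total number of samples shows how many samples would be excluded from the test.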

Test statistics

test        Method
"t"         Welch's t-test (equal_var=False)
"wilcoxon"  Wilcoxon rank-sum test

Welch's t-test is appropriate when group variances may differ and both groups have at least a few samples. The Wilcoxon rank-sum test is a non-parametric alternative; use it when count distributions are highly skewed or sample sizes are very small.
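
The two tests can be compared on a single taxon with SciPy. This is a sketch on synthetic data: that pyloseq calls these particular SciPy functions internally is an assumption, and the lognormal abundances are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic, skewed abundances for one taxon in two groups (illustrative only)
group_a = rng.lognormal(mean=2.0, sigma=1.0, size=12)
group_b = rng.lognormal(mean=3.0, sigma=1.5, size=10)

# Welch's t-test: compares means without assuming equal variances
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Wilcoxon rank-sum test (Mann-Whitney U): works on ranks, not raw values
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Welch t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {u_p:.4f}")
```

On heavily skewed data the two p-values can differ noticeably, because a few extreme counts affect the group means but not the ranks.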

Multiple-testing correction

method            Type  Description
"BH"              FDR   Benjamini-Hochberg (default). Controls the false discovery rate.
"BY"              FDR   Benjamini-Yekutieli. More conservative than BH; valid under arbitrary dependence between tests.
"holm"            FWER  Holm step-down. Controls the family-wise error rate without assuming independence.
"bonferroni"      FWER  Bonferroni. Most conservative; appropriate when any false positive is unacceptable.
"westfall_young"  FWER  Permutation-based step-down (Westfall & Young 1993). Equivalent to R's multtest::mt.minP. Respects the correlation structure of the test statistics.

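To make the difference between the closed-form corrections concrete, here is a small NumPy sketch of the BH, Holm, and Bonferroni adjustments. It uses the textbook definitions and is not taken from pyloseq; the function names adjust_bh and adjust_holm are hypothetical.

```python
import numpy as np


def adjust_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.clip(adj_sorted, 0.0, 1.0)
    return adj


def adjust_holm(pvals):
    """Holm adjusted p-values (step-down FWER)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity, working from the smallest p-value upwards
    adj_sorted = np.maximum.accumulate(scaled)
    adj = np.empty(m)
    adj[order] = np.clip(adj_sorted, 0.0, 1.0)
    return adj


rawp = [0.01, 0.04, 0.03, 0.20]
bh = adjust_bh(rawp)
holm = adjust_holm(rawp)
bonf = np.minimum(np.asarray(rawp) * len(rawp), 1.0)

print("BH        ", np.round(bh, 4))
print("Holm      ", np.round(holm, 4))
print("Bonferroni", np.round(bonf, 4))
```

For every taxon the adjusted p-values satisfy BH ≤ Holm ≤ Bonferroni, which is the sense in which the later methods are more conservative.
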
Return value

A DataFrame with one row per taxon, sorted by ascending adjp:

Column         Description
statistic      Per-taxon test statistic
rawp           Uncorrected p-value
adjp           Corrected p-value
mean_<group1>  Mean abundance in group 1
mean_<group2>  Mean abundance in group 2

Group names in the mean columns come from the sorted unique values of grouping_var.
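
As an illustration of the naming rule, the sketch below derives the expected column names from a label column. It assumes plain Python sorting of the non-NaN labels, which may not match pyloseq's ordering for every dtype.

```python
import pandas as pd

# Illustrative labels, including one missing value
labels = pd.Series(["skin", "gut", None, "gut", "skin"])

groups = sorted(labels.dropna().unique())
mean_columns = [f"mean_{g}" for g in groups]
print(mean_columns)  # ['mean_gut', 'mean_skin']
```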

Examples

# Default: Welch t-test, BH correction
results = multi_tax_test(ps, "SampleType")
significant = results[results["adjp"] < 0.05]

# Wilcoxon with Holm FWER control
results = multi_tax_test(ps, "SampleType", test="wilcoxon", method="holm")

# Permutation FWER — use more permutations for stable estimates
results = multi_tax_test(
    ps, "SampleType",
    method="westfall_young",
    n_permutations=5000,
    rng_seed=0,
)

Note

westfall_young reruns every per-taxon test once for each of the n_permutations label permutations, so its cost scales as O(n_taxa × n_permutations). For datasets with tens of thousands of taxa, use a smaller n_permutations (e.g. 500–1000) for exploration and increase it only for the final analysis.
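
The cost model above can be illustrated with a simplified permutation sketch. Note that this is a single-step maxT adjustment, which is simpler than the step-down procedure described in the table of correction methods, and it is not pyloseq's implementation. The function name maxt_adjust and the synthetic counts are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def maxt_adjust(counts, labels, n_permutations=200, seed=0):
    """Single-step maxT permutation adjustment (simplified sketch).

    counts: array of shape (n_taxa, n_samples).
    labels: boolean array of length n_samples marking the second group.
    """
    rng = np.random.default_rng(seed)

    def abs_t(mask):
        # One Welch t-test per taxon, vectorised along the sample axis
        t, _ = stats.ttest_ind(
            counts[:, mask], counts[:, ~mask], axis=1, equal_var=False
        )
        return np.abs(t)

    observed = abs_t(labels)
    exceed = np.zeros(observed.shape[0])
    for _ in range(n_permutations):
        # Each permutation reruns all n_taxa tests: O(n_taxa * n_permutations)
        perm_max = np.nanmax(abs_t(rng.permutation(labels)))
        exceed += perm_max >= observed
    # Add-one correction keeps adjusted p-values strictly positive
    return (exceed + 1) / (n_permutations + 1)


rng = np.random.default_rng(1)
counts = rng.poisson(20, size=(50, 16)).astype(float)
counts[0, 8:] += 40  # taxon 0 is strongly shifted in the second group
labels = np.array([False] * 8 + [True] * 8)

adjp = maxt_adjust(counts, labels)
print(adjp[:5])
```

Because each permutation compares every taxon against the maximum statistic over all taxa, the adjustment accounts for correlation between taxa, at the price of repeating the full set of tests n_permutations times.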

pyloseq.multi_tax_test

multi_tax_test(
    ps: Phyloseq,
    grouping_var: str,
    test: Literal["t", "wilcoxon"] = "t",
    method: Literal[
        "BH", "BY", "holm", "bonferroni", "westfall_young"
    ] = "BH",
    alternative: Literal[
        "two-sided", "greater", "less"
    ] = "two-sided",
    n_permutations: int = 1000,
    rng_seed: int | None = 42,
) -> pd.DataFrame

Test each taxon for differential abundance between two groups.

R reference: phyloseq::mt()

Parameters:

Name Type Description Default
ps Phyloseq

Phyloseq object (must have sample_data).

required
grouping_var str

Column in sample_data defining the two groups to compare. Samples with NaN in this column are silently dropped.

required
test Literal['t', 'wilcoxon']

Per-taxon test statistic. "t" uses Welch's t-test (equal_var=False); "wilcoxon" uses the Wilcoxon rank-sum test.

't'
method Literal['BH', 'BY', 'holm', 'bonferroni', 'westfall_young']

Multiple-testing correction method:

  • "BH" — Benjamini-Hochberg FDR (default)
  • "BY" — Benjamini-Yekutieli FDR
  • "holm" — Holm step-down FWER
  • "bonferroni" — Bonferroni FWER
  • "westfall_young" — permutation-based step-down FWER (R multtest::mt.minP)
'BH'
alternative Literal['two-sided', 'greater', 'less']

Direction of the alternative hypothesis.

'two-sided'
n_permutations int

Number of label permutations for method="westfall_young".

1000
rng_seed int | None

Seed for the permutation RNG ("westfall_young" only). Pass None for non-reproducible draws.

42
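
The effect of rng_seed can be sketched with NumPy's Generator API. That pyloseq draws its label permutations from numpy.random.default_rng is an assumption; the sketch only shows why a fixed seed makes permutation results repeatable.

```python
import numpy as np

labels = np.array(["gut"] * 4 + ["skin"] * 4)

# Same seed: identical permutation sequence, hence reproducible results
perm_a = np.random.default_rng(42).permutation(labels)
perm_b = np.random.default_rng(42).permutation(labels)
print((perm_a == perm_b).all())  # True

# seed=None draws fresh entropy from the OS, so results vary between runs
perm_c = np.random.default_rng(None).permutation(labels)
print(perm_c)
```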

Returns:

Type Description
DataFrame

One row per taxon, sorted by ascending adjp. Columns:

  • statistic — per-taxon test statistic
  • rawp — uncorrected p-value
  • adjp — corrected p-value (using method)
  • mean_<g1> — mean abundance in group 1
  • mean_<g2> — mean abundance in group 2

Raises:

Type Description
pyloseqValidationError

If sample_data is missing, grouping_var is not found, the variable does not have exactly 2 distinct non-NaN levels, or either group has fewer than 2 samples.