Ordination¶
ordinate¶
Returns an skbio.stats.ordination.OrdinationResults. Access sample coordinates via .samples (a DataFrame with one row per sample), feature scores via .features (not available for all methods), and variance explained per axis via .proportion_explained.
Available methods:
| Method | Type | Distance required | Formula required |
|---|---|---|---|
"PCoA" / "MDS" |
Unconstrained | Yes | No |
"NMDS" |
Unconstrained | Yes | No |
"CA" |
Unconstrained | No | No |
"CCA" |
Constrained | No | Yes |
"RDA" |
Constrained | No | Yes |
"CAP" |
Constrained | Yes | Yes |
"DPCoA" |
Unconstrained | Computed from tree | No |
"DCA" |
— | — | — |
"DCA" is listed for completeness; calling it raises NotImplementedError. Use PCoA or CCA instead.
Unconstrained methods¶
PCoA / MDS
Principal Coordinates Analysis on a distance matrix. The most common choice:
ord_result = ordinate(ps, method="PCoA", distance="bray")
ord_result = ordinate(ps, method="MDS", distance="bray") # alias
NMDS
Non-metric Multidimensional Scaling. Uses scikit-bio's nmds if available, falls back to scikit-learn's non-metric MDS if not. If neither is installed, falls back to classical metric MDS and emits a warning:
For NMDS, proportion_explained is NaN; stress is attached as ord_result.stress.
CA
Correspondence Analysis on the raw count table (no distance matrix). Produces both sample and feature scores:
DPCoA
Double Principal Coordinates Analysis. Integrates phylogenetic distances; requires phy_tree:
Constrained methods¶
Constrained ordinations require a formula string referencing columns in sample_data. Categorical columns are dummy-encoded automatically:
# CCA: one constraining variable
ord_result = ordinate(ps, method="CCA", formula="~SampleType")
# RDA: multiple variables
ord_result = ordinate(ps, method="RDA", formula="~SampleType + Depth")
# CAP: distance-based, constrained
ord_result = ordinate(ps, method="CAP", distance="bray", formula="~SampleType")
Terms in the formula must be column names in sample_data. Unknown terms raise pyloseqValidationError.
Pre-computed distance matrices¶
Pass a skbio.DistanceMatrix directly to avoid recomputing:
from pyloseq import distance, ordinate
dm = distance(ps, "unifrac") # expensive; compute once
pcoa_result = ordinate(ps, method="PCoA", distance=dm)
cap_result = ordinate(ps, method="CAP", distance=dm, formula="~SampleType")
Accessing results¶
ord_result = ordinate(ps, method="PCoA", distance="bray")
# Sample coordinates (first two axes)
coords = ord_result.samples[["PC1", "PC2"]]
# Variance explained
print(ord_result.proportion_explained[:3])
# PC1 0.312
# PC2 0.187
# PC3 0.089
# Feature scores (available for CA, CCA, RDA)
taxa_coords = ord_result.features
pyloseq.ordinate ¶
ordinate(
ps: Phyloseq,
method: str = "PCoA",
distance: str | Any = "bray",
formula: str | None = None,
**kwargs: Any,
) -> OrdinationResults
Perform multivariate ordination on a Phyloseq object.
R reference: ordinate(physeq, method, distance, formula, ...)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ps
|
Phyloseq
|
|
required |
method
|
str
|
Ordination method: |
'PCoA'
|
distance
|
str | Any
|
Distance method string (passed to :func: |
'bray'
|
formula
|
str | None
|
Model formula string referencing columns of |
None
|
**kwargs
|
Any
|
Passed to the underlying ordination function. |
{}
|
Returns:
| Type | Description |
|---|---|
OrdinationResults
|
|