Containers
pyloseq represents microbiome data through a small set of typed container classes. Phyloseq is the top-level object that holds the others. All containers are immutable in the sense that manipulation functions never modify them in place; they always return new objects.
Phyloseq
Phyloseq is the central data object. It bundles an OTU table with any combination of sample metadata, taxonomic annotations, a phylogenetic tree, and reference sequences. The constructor validates that the components are consistent and prunes them to the intersection of taxa and sample names across all attached components, emitting a warning by default (see Validation and pruning).
from pyloseq import Phyloseq, OtuTable, SampleData, TaxTable, PhyTree
ps = Phyloseq(
    otu=OtuTable(df, taxa_are_rows=True),
    sam=SampleData(metadata_df),
    tax=TaxTable(taxonomy_df),
    tree=PhyTree.from_newick(newick_str),
)
pyloseq.Phyloseq
Container for microbiome data: OTU table + optional metadata components.
Mirrors R's phyloseq-class. The constructor accepts any subset of
components, runs the validator suite, and prunes to the
intersection of names across components (unless strict=True).
By default, pruning during construction emits a warning so the data loss is
discoverable; pass quiet=True to suppress it.
R reference: phyloseq::phyloseq(otu_table, sample_data, tax_table, phy_tree, refseq)
otu_table (property, writable)
The OTU/feature abundance table.
R reference: otu_table(x)
phy_tree (property, writable)
Phylogenetic tree, or None if not provided.
R reference: phy_tree(x)
rank_names (property)
Taxonomic rank names, or [] if no tax table.
R reference: rank_names(x)
refseq (property, writable)
Reference sequences, or None if not provided.
R reference: refseq(x)
sample_data (property, writable)
Per-sample metadata, or None if not provided.
R reference: sample_data(x)
sample_names (property)
Sample identifiers from the OTU table.
R reference: sample_names(x)
sample_variables (property)
Names of sample metadata columns, or [] if no sample data.
R reference: sample_variables(x)
tax_table (property, writable)
Taxonomic classification table, or None if not provided.
R reference: tax_table(x)
taxa_names (property)
Taxon identifiers from the OTU table.
R reference: taxa_names(x)
distance
Compute a pairwise distance matrix.
Thin wrapper around pyloseq.distance. Returns a
skbio.stats.distance.DistanceMatrix that can be passed directly to
skbio.stats.distance.permanova or skbio.stats.distance.anosim.
R reference: distance(physeq, method)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | Distance method string (e.g. …). | 'bray' |
| **kwargs | Any | Forwarded to the underlying implementation. | {} |
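For orientation, the default 'bray' presumably refers to Bray-Curtis dissimilarity, as it does in R's phyloseq; that reading is an assumption, since this page does not define the method strings. The dependency-free sketch below shows the calculation for a single pair of samples. The function name bray_curtis is hypothetical and is not part of pyloseq.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Illustrative helper only (not pyloseq API):
    sum(|a_i - b_i|) / sum(a_i + b_i).
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        # Two empty samples: the ratio is undefined, so report NaN here.
        return float("nan")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Abundances of three taxa in two samples (toy data).
s1 = [10, 0, 5]
s2 = [0, 3, 7]
d = bray_curtis(s1, s2)
print(d)  # 15 / 25 = 0.6
```

A full distance matrix repeats this calculation for every pair of samples; the library call returns that matrix as a scikit-bio DistanceMatrix.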
get_sample
Return the abundance vector for a single sample across all taxa.
R reference: get_sample(x, i)
get_taxa
Return the abundance vector for a single taxon across all samples.
R reference: get_taxa(x, i)
get_variable
Return a sample metadata column as a Series.
R reference: get_variable(x, v)
melt
Melt to a long-form tidy DataFrame (one row per OTU × Sample pair).
Equivalent to the free function pyloseq.psmelt.
R reference: psmelt(physeq)
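To make the long-form shape concrete, here is a minimal plain-Python sketch of melting an abundance table into one record per OTU × Sample pair. The helper melt_counts and the column names OTU, Sample, and Abundance are assumptions borrowed from R's psmelt output, not confirmed pyloseq names.

```python
def melt_counts(counts):
    """Flatten {taxon: {sample: abundance}} into one record per pair.

    Hypothetical helper for illustration; not part of pyloseq.
    """
    rows = []
    for taxon, per_sample in counts.items():
        for sample, abundance in per_sample.items():
            rows.append({"OTU": taxon, "Sample": sample, "Abundance": abundance})
    return rows


counts = {
    "OTU1": {"S1": 10, "S2": 0},
    "OTU2": {"S1": 0, "S2": 3},
    "OTU3": {"S1": 5, "S2": 7},
}
long_rows = melt_counts(counts)
for row in long_rows:
    print(row)
```

Three taxa and two samples give six records. In R, psmelt additionally joins the sample metadata and taxonomy columns onto each record; the sketch leaves that step out.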
ordinate
ordinate(
    method: str = "PCoA",
    distance: str = "bray",
    formula: str | None = None,
    **kwargs: Any,
) -> OrdinationResults
Run multivariate ordination.
Thin wrapper around pyloseq.ordinate. Returns an
skbio.stats.ordination.OrdinationResults.
R reference: ordinate(physeq, method, distance, formula)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | Ordination method: … | 'PCoA' |
| distance | str | Distance method or pre-computed … | 'bray' |
| formula | str \| None | Model formula for constrained methods (e.g. …). | None |
| **kwargs | Any | Forwarded to the underlying implementation. | {} |
sample_sums
Sum of abundances across all taxa for each sample.
R reference: sample_sums(x)
taxa_sums
Sum of abundances across all samples for each taxon.
R reference: taxa_sums(x)
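The two sums run along opposite axes of the same table, which the following plain-Python sketch illustrates. The helpers sum_per_sample and sum_per_taxon are hypothetical stand-ins written for this example and do not reflect how pyloseq computes the values.

```python
def sum_per_sample(counts):
    """Total abundance in each sample, summed over all taxa."""
    totals = {}
    for per_sample in counts.values():
        for sample, abundance in per_sample.items():
            totals[sample] = totals.get(sample, 0) + abundance
    return totals


def sum_per_taxon(counts):
    """Total abundance of each taxon, summed over all samples."""
    return {taxon: sum(per_sample.values()) for taxon, per_sample in counts.items()}


# taxon -> sample -> abundance (toy data)
counts = {
    "OTU1": {"S1": 10, "S2": 0},
    "OTU2": {"S1": 0, "S2": 3},
    "OTU3": {"S1": 5, "S2": 7},
}
print(sum_per_sample(counts))  # {'S1': 15, 'S2': 10}
print(sum_per_taxon(counts))   # {'OTU1': 10, 'OTU2': 3, 'OTU3': 12}
```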
Validation and pruning
When taxa or sample names differ between components, the constructor prunes each to the intersection and emits a warning. Pass quiet=True to suppress the warning. Pass strict=True to raise pyloseqValidationError instead of pruning:
# Raises if otu and tax have mismatched taxa names
ps = Phyloseq(otu=otu, tax=tax, strict=True)
# Prune silently
ps = Phyloseq(otu=otu, tax=tax, quiet=True)
Component setters (e.g., ps.tax_table = new_tax) trigger re-validation using the same logic.
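The pruning rule described above can be sketched in plain Python as a set intersection with three outcomes: raise, warn, or stay quiet. This is an illustration under stated assumptions, not the library's code: shared_names is a hypothetical helper, and ValueError stands in for the library's validation error.

```python
import warnings


def shared_names(components, strict=False, quiet=False):
    """Return the names present in every component, in the first component's order.

    `components` maps a label (e.g. "otu", "tax") to a list of names.
    Hypothetical helper for illustration; not part of pyloseq.
    """
    if not components:
        raise ValueError("at least one component is required")
    name_lists = {label: list(names) for label, names in components.items()}
    common = set.intersection(*(set(names) for names in name_lists.values()))
    # Record what each component would lose, skipping components that lose nothing.
    dropped = {
        label: [n for n in names if n not in common]
        for label, names in name_lists.items()
    }
    dropped = {label: names for label, names in dropped.items() if names}
    if dropped:
        message = f"names not shared by all components: {dropped}"
        if strict:
            # Stand-in for the library's validation error.
            raise ValueError(message)
        if not quiet:
            warnings.warn(message)
    first = next(iter(name_lists.values()))
    return [n for n in first if n in common]


otu_taxa = ["OTU1", "OTU2", "OTU3"]
tax_taxa = ["OTU1", "OTU2"]
kept = shared_names({"otu": otu_taxa, "tax": tax_taxa}, quiet=True)
print(kept)  # ['OTU1', 'OTU2']
```

Checking strict before quiet means a strict call always raises on a mismatch, whatever quiet is set to; whether pyloseq orders the two flags the same way is not stated on this page.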
OtuTable
Stores the feature abundance matrix. Rows can be either taxa or samples; the taxa_are_rows flag records which orientation is in use.
import pandas as pd
from pyloseq import OtuTable
df = pd.DataFrame(
    {"S1": [10, 0, 5], "S2": [0, 3, 7]},
    index=["OTU1", "OTU2", "OTU3"],
)
otu = OtuTable(df, taxa_are_rows=True)
Sparse input (SciPy sparse matrices, such as CSR or CSC) is accepted. When the matrix density is below 50%, the internal representation is automatically converted to CSR format.
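As a rough illustration of the 50% rule, the sketch below computes density as the fraction of nonzero cells and picks a storage label from it. Reading "density" as the nonzero fraction is an assumption, and both helpers (density, storage_for) are hypothetical names that only mimic the decision; they do not build a CSR matrix.

```python
def density(matrix):
    """Fraction of nonzero entries in a list-of-rows matrix."""
    n_cells = sum(len(row) for row in matrix)
    if n_cells == 0:
        return 0.0
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / n_cells


def storage_for(matrix, threshold=0.5):
    """Label the storage a density threshold would select (illustrative only)."""
    return "sparse" if density(matrix) < threshold else "dense"


toy = [[10, 0], [0, 3], [5, 7]]                            # 4 of 6 cells nonzero
mostly_zero = [[0, 0, 0, 1], [0, 0, 0, 0], [2, 0, 0, 0]]   # 2 of 12 cells nonzero

print(storage_for(toy))          # dense
print(storage_for(mostly_zero))  # sparse
```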
to_dataframe() returns the abundance matrix as a fresh DataFrame in the table's current orientation, so mutating the result never touches internal state.
The orientation can be flipped without copying data.
pyloseq.OtuTable
Stores an OTU/feature abundance table with orientation tracking.
Internally stores dense data as a pd.DataFrame and sparse data (density
< 50 %) as a scipy.sparse.csr_matrix with separate index/column arrays.
R reference: phyloseq::otu_table(object, taxa_are_rows)
sample_names (property, writable)
Sample identifiers.
R reference: sample_names(x)
taxa_are_rows (property, writable)
Whether taxa occupy rows (True) or columns (False).
taxa_names (property, writable)
Taxa (OTU/ASV) identifiers.
R reference: taxa_names(x)
__init__
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray \| DataFrame \| spmatrix \| list[Any] | Abundance matrix. Accepted types: … | required |
| taxa_are_rows | bool | If … | True |
copy
Return a deep copy.
R reference: otu_table(x) <- otu_table(x) (effectively)
sample_sums
Sum of abundances across all taxa for each sample.
R reference: sample_sums(x)
taxa_sums
Sum of abundances across all samples for each taxon.
R reference: taxa_sums(x)
to_dataframe
Return the abundance matrix as a pd.DataFrame in current orientation.
Always returns a fresh copy; mutating the result never touches internal state.
R reference: as(otu_table(x), "matrix") then as.data.frame()
SampleData
Wraps per-sample metadata as a pandas.DataFrame. The DataFrame index is the sample identifier — it must be unique and must match sample names in the OTU table.
import pandas as pd
from pyloseq import SampleData
meta = pd.DataFrame(
    {"SampleType": ["Soil", "Ocean", "Skin"], "pH": [6.5, 8.1, 5.4]},
    index=["S1", "S2", "S3"],
)
sam = SampleData(meta)
Retrieve the underlying DataFrame with .to_frame().
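The index requirements stated above (unique sample identifiers that line up with the OTU table) can be checked before building a SampleData object. The plain-Python sketch below shows one way to do that; check_sample_ids is a hypothetical helper, not a pyloseq function, and it only reports problems rather than raising.

```python
from collections import Counter


def check_sample_ids(metadata_ids, otu_sample_names):
    """Return human-readable problems found in a metadata index.

    Hypothetical pre-flight check for illustration; not part of pyloseq.
    """
    problems = []
    duplicates = sorted(name for name, n in Counter(metadata_ids).items() if n > 1)
    if duplicates:
        problems.append(f"duplicate sample IDs: {duplicates}")
    missing = sorted(set(otu_sample_names) - set(metadata_ids))
    if missing:
        problems.append(f"OTU table samples without metadata: {missing}")
    return problems


print(check_sample_ids(["S1", "S2", "S3"], ["S1", "S2"]))  # []
print(check_sample_ids(["S1", "S1"], ["S1", "S2"]))
```

With pandas, the uniqueness half of this check corresponds to inspecting meta.index.is_unique on the DataFrame before wrapping it.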
pyloseq.SampleData
Wraps a pd.DataFrame of per-sample metadata.
The DataFrame index must be sample identifiers, and must be unique.
R reference: phyloseq::sample_data(object)
TaxTable
Wraps the taxonomic classification table. Rows are taxa, indexed by the same taxa names as the OTU table; columns are taxonomic ranks.
import pandas as pd
from pyloseq import TaxTable
tax_df = pd.DataFrame(
    {
        "Kingdom": ["Bacteria", "Bacteria"],
        "Phylum": ["Firmicutes", "Bacteroidetes"],
        "Genus": ["Lactobacillus", "Bacteroides"],
    },
    index=["OTU1", "OTU2"],
)
tax = TaxTable(tax_df)
print(tax.rank_names) # ['Kingdom', 'Phylum', 'Genus']
pyloseq.TaxTable
Wraps a pd.DataFrame of taxonomic classifications.
The DataFrame index must be taxa identifiers; columns are rank names
(e.g. ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus",
"Species"]). Rank names are user-supplied and not hardcoded.
R reference: phyloseq::tax_table(object)
PhyTree
Wraps a skbio.TreeNode phylogenetic tree. Three constructors are available:
from pyloseq import PhyTree
# From a Newick string
tree = PhyTree.from_newick("((OTU1:0.1, OTU2:0.2):0.3, OTU3:0.4);")
# From a file
tree = PhyTree.from_newick_file("tree.nwk")
# From an existing skbio.TreeNode
import skbio
node = skbio.io.read("tree.nwk", format="newick", into=skbio.TreeNode)
tree = PhyTree(node)
Note
PhyTree.from_ape_rds() is not implemented. To use a tree saved from R with saveRDS, export it first: ape::write.tree(tree, "tree.nwk").
Use prune() to restrict the tree to a specific set of tips.
pyloseq.PhyTree
Wraps a skbio.tree.TreeNode with a phyloseq-compatible interface.
R reference: phyloseq::phy_tree(object)
internal_names (property)
Names of all internal (non-tip) nodes, excluding the root if unnamed.
R reference: phy_tree(x)$node.label
is_rooted (property)
True if the root has exactly 2 children (bifurcating root).
R reference: is.rooted(phy_tree(x))
tip_names (property)
Names of all leaf nodes.
R reference: taxa_names(phy_tree(x))
total_branch_length (property)
Sum of all branch lengths in the tree.
R reference: sum(phy_tree(x)$edge.length)
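To show what the tree properties above measure, the sketch below hand-builds the tree from the Newick example, ((OTU1:0.1, OTU2:0.2):0.3, OTU3:0.4);, as nested tuples and computes the tip names, the rootedness test, and the total branch length. The tuple layout and the three functions are assumptions made for illustration; PhyTree itself wraps a scikit-bio TreeNode.

```python
import math

# Each node is (name, branch_length, children); None marks an unnamed node
# or a missing length (the root has neither).
tree = (None, None, [
    (None, 0.3, [("OTU1", 0.1, []), ("OTU2", 0.2, [])]),
    ("OTU3", 0.4, []),
])


def tip_names(node):
    """Names of all leaves, left to right."""
    name, _length, children = node
    if not children:
        return [name]
    return [tip for child in children for tip in tip_names(child)]


def total_branch_length(node):
    """Sum of every branch length in the tree, treating a missing length as 0."""
    _name, length, children = node
    return (length or 0.0) + sum(total_branch_length(child) for child in children)


def is_rooted(node):
    """True when the root has exactly two children, per the definition above."""
    return len(node[2]) == 2


print(tip_names(tree))  # ['OTU1', 'OTU2', 'OTU3']
print(is_rooted(tree))  # True
print(math.isclose(total_branch_length(tree), 1.0))  # True
```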
from_ape_rds (classmethod)
Construct from an R phylo object serialized as .rds.
Requires pyreadr (pip install pyreadr).
R reference: readRDS(path)
from_newick (classmethod)
Construct from a Newick string.
R reference: phy_tree(read.tree(text=s))
from_newick_file (classmethod)
Construct from a Newick file on disk.
R reference: phy_tree(read.tree(file=path))
prune
Return a new tree containing only the specified tips and their ancestors.
Equivalent to ape::drop.tip with the complement set.
R reference: prune_taxa(keep, ps) (on the tree component)
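The idea of keeping only the requested tips and their ancestors can be sketched on a small tuple-based tree. Everything here is hypothetical illustration code rather than the PhyTree implementation: the (name, branch_length, children) layout, prune_tips, and newick_text are invented for the example. Unlike many tree libraries, this sketch leaves single-child internal nodes in place instead of collapsing them.

```python
def prune_tips(node, keep):
    """Return a copy of `node` restricted to tips in `keep`, or None if none remain."""
    name, length, children = node
    if not children:
        return node if name in keep else None
    kept = [p for p in (prune_tips(child, keep) for child in children) if p is not None]
    if not kept:
        # Every tip below this node was dropped, so the node goes too.
        return None
    return (name, length, kept)


def newick_text(node):
    """Serialize a node (without the trailing semicolon)."""
    name, length, children = node
    text = ""
    if children:
        text = "(" + ",".join(newick_text(child) for child in children) + ")"
    text += name or ""
    if length is not None:
        text += f":{length}"
    return text


# ((OTU1:0.1,OTU2:0.2):0.3,OTU3:0.4);
tree = (None, None, [
    (None, 0.3, [("OTU1", 0.1, []), ("OTU2", 0.2, [])]),
    ("OTU3", 0.4, []),
])

pruned = prune_tips(tree, {"OTU1", "OTU3"})
print(newick_text(pruned) + ";")  # ((OTU1:0.1):0.3,OTU3:0.4);
```

The function builds new tuples instead of editing the input, which matches the documented behavior of returning a new tree.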
to_newick
Serialize to a Newick string.
R reference: ape::write.tree(phy_tree(x))
RefSeq
Stores reference sequences as a dict-like mapping from taxon name to skbio.DNA. Used for representative sequences from DADA2 or QIIME 2 denoising pipelines.
import skbio
from pyloseq import RefSeq
seqs = RefSeq({
    "OTU1": skbio.DNA("ACGTACGT"),
    "OTU2": skbio.DNA("TGCATGCA"),
})
# Round-trip through FASTA
seqs.to_fasta("representatives.fasta")
seqs2 = RefSeq.from_fasta("representatives.fasta")
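For readers unfamiliar with the format, the round trip above amounts to writing one header line and one sequence line per taxon and reading them back. The sketch below does this in plain Python with strings and an in-memory buffer; write_fasta and read_fasta are hypothetical helpers, and the real methods work with skbio.DNA objects and files on disk.

```python
import io


def write_fasta(seqs, handle):
    """Write {name: sequence} as FASTA records, one sequence line per record."""
    for name, sequence in seqs.items():
        handle.write(f">{name}\n{sequence}\n")


def read_fasta(handle):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    seqs = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first whitespace-delimited token as the record name.
            name = header[0] if header else ""
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


seqs = {"OTU1": "ACGTACGT", "OTU2": "TGCATGCA"}
buffer = io.StringIO()
write_fasta(seqs, buffer)
buffer.seek(0)
round_tripped = read_fasta(buffer)
print(round_tripped == seqs)  # True
```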
pyloseq.RefSeq
Wraps a dictionary of reference sequences keyed by taxon ID.
R reference: phyloseq::refseq(object)