Containers¶

pyloseq represents microbiome data through a small set of typed container classes. Phyloseq is the top-level object that holds the others. All containers are immutable in the sense that manipulation functions never modify them in place; they always return new objects.

Phyloseq¶

Phyloseq is the central data object. It bundles an OTU table with any combination of sample metadata, taxonomic annotations, a phylogenetic tree, and reference sequences. The constructor validates component consistency and prunes to the intersection of taxa and sample names across all attached components, emitting a warning whenever pruning drops data.

from pyloseq import Phyloseq, OtuTable, SampleData, TaxTable, PhyTree

ps = Phyloseq(
    otu=OtuTable(df, taxa_are_rows=True),
    sam=SampleData(metadata_df),
    tax=TaxTable(taxonomy_df),
    tree=PhyTree.from_newick(newick_str),
)

pyloseq.Phyloseq ¶

Container for microbiome data: OTU table + optional metadata components.

Mirrors R's phyloseq-class. The constructor accepts any subset of components, runs the validator suite, and prunes to the intersection of names across components (unless strict=True, in which case a mismatch raises an error instead).

By default, pruning during construction emits a warning so the data loss is discoverable; pass quiet=True to suppress it.

R reference: phyloseq::phyloseq(otu_table, sample_data, tax_table, phy_tree, refseq)

nsamples `property` ¶

nsamples: int

Number of samples.

R reference: nsamples(x)

ntaxa `property` ¶

ntaxa: int

Number of taxa.

R reference: ntaxa(x)

otu_table `property` `writable` ¶

otu_table: OtuTable

The OTU/feature abundance table.

R reference: otu_table(x)

phy_tree `property` `writable` ¶

phy_tree: PhyTree | None

Phylogenetic tree, or None if not provided.

R reference: phy_tree(x)

rank_names `property` ¶

rank_names: list[str]

Taxonomic rank names, or [] if no tax table.

R reference: rank_names(x)

refseq `property` `writable` ¶

refseq: RefSeq | None

Reference sequences, or None if not provided.

R reference: refseq(x)

sample_data `property` `writable` ¶

sample_data: SampleData | None

Per-sample metadata, or None if not provided.

R reference: sample_data(x)

sample_names `property` ¶

sample_names: Index

Sample identifiers from the OTU table.

R reference: sample_names(x)

sample_variables `property` ¶

sample_variables: list[str]

Names of sample metadata columns, or [] if no sample data.

R reference: sample_variables(x)

tax_table `property` `writable` ¶

tax_table: TaxTable | None

Taxonomic classification table, or None if not provided.

R reference: tax_table(x)

taxa_names `property` ¶

taxa_names: Index

Taxon identifiers from the OTU table.

R reference: taxa_names(x)

distance ¶

distance(
    method: str = "bray", **kwargs: Any
) -> DistanceMatrix

Compute a pairwise distance matrix.

Thin wrapper around pyloseq.distance; returns a skbio.stats.distance.DistanceMatrix usable directly with skbio.stats.distance.permanova / anosim.

R reference: distance(physeq, method)

Parameters:

Name	Type	Description	Default
`method`	`str`	Distance method string (e.g. `"bray"`, `"unifrac"`).	`'bray'`
`**kwargs`	`Any`	Forwarded to the underlying implementation.	`{}`
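
As a rough illustration of what the default "bray" method measures, the sketch below computes the Bray-Curtis dissimilarity between two samples in plain Python. It does not call pyloseq; the function name bray_curtis and the toy counts are assumptions made for this example, and the library's own implementation may differ in detail (for instance in how it handles empty samples).

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        # Assumption: treat two empty samples as an error rather than returning NaN
        raise ValueError("both samples are empty; dissimilarity is undefined")
    # Sum of absolute differences, scaled by the combined abundance
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical abundances of three taxa in two samples
s1 = [10, 0, 5]
s2 = [0, 3, 7]
d = bray_curtis(s1, s2)
print(d)  # 0.6
```

A value of 0 means identical composition and 1 means the samples share no taxa.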

get_sample ¶

get_sample(i: str) -> pd.Series

Return the abundance vector for a single sample across all taxa.

R reference: get_sample(x, i)

get_taxa ¶

get_taxa(i: str) -> pd.Series

Return the abundance vector for a single taxon across all samples.

R reference: get_taxa(x, i)

get_variable ¶

get_variable(v: str) -> pd.Series

Return a sample metadata column as a Series.

R reference: get_variable(x, v)

melt ¶

melt() -> pd.DataFrame

Melt to a long-form tidy DataFrame (one row per OTU × Sample pair).

Equivalent to the free function pyloseq.psmelt.

R reference: psmelt(physeq)
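
To make the long-form shape concrete, here is a plain-Python sketch that flattens a small taxa × samples matrix into one record per OTU × Sample pair. The helper name melt_counts is hypothetical, and the column names OTU, Sample, and Abundance follow R's psmelt output; whether pyloseq uses exactly these names is an assumption.

```python
def melt_counts(counts, taxa, samples):
    """Flatten a taxa x samples matrix into one record per (OTU, Sample) pair."""
    rows = []
    for i, otu in enumerate(taxa):
        for j, sample in enumerate(samples):
            rows.append({"OTU": otu, "Sample": sample, "Abundance": counts[i][j]})
    return rows


# Three taxa (rows) by two samples (columns)
counts = [[10, 0], [0, 3], [5, 7]]
long_rows = melt_counts(counts, ["OTU1", "OTU2", "OTU3"], ["S1", "S2"])

for row in long_rows:
    print(row)
```

A real melt would typically also join the sample metadata and taxonomy columns onto each record.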

ordinate ¶

ordinate(
    method: str = "PCoA",
    distance: str = "bray",
    formula: str | None = None,
    **kwargs: Any,
) -> OrdinationResults

Run multivariate ordination.

Thin wrapper around pyloseq.ordinate; returns an skbio.stats.ordination.OrdinationResults.

R reference: ordinate(physeq, method, distance, formula)

Parameters:

Name	Type	Description	Default
`method`	`str`	Ordination method: `"PCoA"`, `"NMDS"`, `"CCA"`, etc.	`'PCoA'`
`distance`	`str`	Distance method or pre-computed `DistanceMatrix`.	`'bray'`
`formula`	`str \| None`	Model formula for constrained methods (e.g. `"~SampleType"`).	`None`
`**kwargs`	`Any`	Forwarded to the underlying implementation.	`{}`

sample_sums ¶

sample_sums() -> pd.Series

Sum of abundances across all taxa for each sample.

R reference: sample_sums(x)

taxa_sums ¶

taxa_sums() -> pd.Series

Sum of abundances across all samples for each taxon.

R reference: taxa_sums(x)
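
The two sums differ only in the axis they reduce over, and that axis depends on the table's orientation. The following plain-Python sketch shows the idea; axis_sums is a hypothetical helper written for this example, not part of pyloseq.

```python
def axis_sums(counts, taxa_are_rows=True):
    """Return per-taxon and per-sample totals for a 2-D list of counts."""
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    if taxa_are_rows:
        # Rows are taxa, so row totals are taxa sums and column totals are sample sums
        return {"taxa_sums": row_sums, "sample_sums": col_sums}
    # Rows are samples, so the roles are swapped
    return {"taxa_sums": col_sums, "sample_sums": row_sums}


counts = [[10, 0], [0, 3], [5, 7]]  # 3 taxa x 2 samples
sums = axis_sums(counts, taxa_are_rows=True)
print(sums["sample_sums"])  # [15, 10]
print(sums["taxa_sums"])    # [10, 3, 12]
```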

to_deseq2 ¶

to_deseq2() -> tuple[pd.DataFrame, pd.DataFrame]

Return count matrix and sample metadata formatted for pydeseq2.

Returns a (counts, metadata) tuple ready to pass directly to DeseqDataSet(counts=counts, metadata=metadata, design=...).

counts has shape (n_samples, n_taxa) with samples as rows. metadata has shape (n_samples, n_variables) with a matching index.

Raises:

Type	Description
`ValueError`	If this object has no `sample_data`.
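
The reshaping described above amounts to a transpose plus an index alignment. Below is a minimal plain-Python sketch of that step, assuming counts are stored as taxa × samples; the helper name to_samples_by_taxa and the dict-based layout are illustrative choices, not the library's implementation.

```python
def to_samples_by_taxa(counts, samples, metadata):
    """Transpose taxa x samples counts and align metadata to the same sample order."""
    if metadata is None:
        # Mirrors the documented behaviour: no sample data means no design matrix
        raise ValueError("sample metadata is required")
    transposed = [list(col) for col in zip(*counts)]
    counts_by_sample = dict(zip(samples, transposed))
    # Look up metadata in count order so both structures share one index
    meta_by_sample = {s: metadata[s] for s in samples}
    return counts_by_sample, meta_by_sample


counts = [[10, 0], [0, 3], [5, 7]]  # 3 taxa x 2 samples
metadata = {"S2": {"SampleType": "Ocean"}, "S1": {"SampleType": "Soil"}}
counts_by_sample, meta_by_sample = to_samples_by_taxa(counts, ["S1", "S2"], metadata)
print(counts_by_sample["S1"])  # [10, 0, 5]
```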

Validation and pruning¶

When taxa or sample names differ between components, the constructor prunes each to the intersection and emits a warning. Pass quiet=True to suppress the warning. Pass strict=True to raise pyloseqValidationError instead of pruning:

# Raises if otu and tax have mismatched taxa names
ps = Phyloseq(otu=otu, tax=tax, strict=True)

# Prune silently
ps = Phyloseq(otu=otu, tax=tax, quiet=True)

Component setters (e.g., ps.tax_table = new_tax) trigger re-validation using the same logic.
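
The interaction between pruning, strict, and quiet can be sketched in plain Python as follows. This is only an illustration of the documented behaviour: common_names is a hypothetical helper, and ValueError stands in for the library's own validation error.

```python
import warnings


def common_names(components, strict=False, quiet=False):
    """Return the names shared by every component, in the first component's order."""
    name_lists = list(components.values())
    shared = set(name_lists[0]).intersection(*name_lists[1:])
    mismatch = any(set(names) != shared for names in name_lists)
    if mismatch and strict:
        # Stand-in for the library's validation error
        raise ValueError("component names differ")
    if mismatch and not quiet:
        warnings.warn("pruning to the intersection of component names")
    return [name for name in name_lists[0] if name in shared]


components = {
    "otu": ["OTU1", "OTU2", "OTU3"],
    "tax": ["OTU2", "OTU3", "OTU4"],
}
kept = common_names(components, quiet=True)
print(kept)  # ['OTU2', 'OTU3']
```

With the default arguments the same call would warn, and with strict=True it would raise instead of returning the pruned list.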

OtuTable¶

Stores the feature abundance matrix. Rows can be taxa or samples; the taxa_are_rows flag records which orientation is in use.

import pandas as pd
from pyloseq import OtuTable

df = pd.DataFrame(
    {"S1": [10, 0, 5], "S2": [0, 3, 7]},
    index=["OTU1", "OTU2", "OTU3"],
)
otu = OtuTable(df, taxa_are_rows=True)

Sparse input (any scipy.sparse matrix, such as CSR or CSC) is accepted and is always stored sparse. For dense input, the internal representation is automatically converted to CSR format when the matrix density is below 50%.
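
The 50% rule can be illustrated with a small plain-Python sketch. It assumes density means the fraction of non-zero cells, which the text does not define explicitly, and it builds the three CSR arrays by hand instead of calling scipy; the helper names are hypothetical.

```python
def density(matrix):
    """Fraction of non-zero cells in a 2-D list (assumed definition of density)."""
    n_cells = sum(len(row) for row in matrix)
    if n_cells == 0:
        return 0.0
    nonzero = sum(1 for row in matrix for x in row if x != 0)
    return nonzero / n_cells


def to_csr(matrix):
    """Build CSR arrays: non-zero values, their column indices, and row pointers."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for j, x in enumerate(row):
            if x != 0:
                data.append(x)
                indices.append(j)
        indptr.append(len(data))  # where the next row starts in data
    return data, indices, indptr


dense_like = [[10, 0], [0, 3], [5, 7]]                      # 4 of 6 cells non-zero
sparse_like = [[0, 0, 0, 4], [0, 0, 0, 0], [1, 0, 0, 0]]    # 2 of 12 cells non-zero

for m in (dense_like, sparse_like):
    storage = "sparse (CSR)" if density(m) < 0.5 else "dense"
    print(round(density(m), 3), storage)

csr = to_csr(sparse_like)
```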

to_dataframe() always returns a DataFrame with taxa as rows, regardless of internal orientation:

df = otu.to_dataframe()   # taxa × samples

Flip orientation without copying data:

otu.taxa_are_rows = False  # now samples are rows internally

pyloseq.OtuTable ¶

Stores an OTU/feature abundance table with orientation tracking.

Internally stores dense data as a pd.DataFrame and sparse data (density < 50 %) as a scipy.sparse.csr_matrix with separate index/column arrays.

R reference: phyloseq::otu_table(object, taxa_are_rows)

nsamples `property` ¶

nsamples: int

Number of samples.

R reference: nsamples(x)

ntaxa `property` ¶

ntaxa: int

Number of taxa.

R reference: ntaxa(x)

sample_names `property` `writable` ¶

sample_names: Index

Sample identifiers.

R reference: sample_names(x)

taxa_are_rows `property` `writable` ¶

taxa_are_rows: bool

Whether taxa occupy rows (True) or columns (False).

taxa_names `property` `writable` ¶

taxa_names: Index

Taxa (OTU/ASV) identifiers.

R reference: taxa_names(x)

__init__ ¶

__init__(
    data: ndarray | DataFrame | spmatrix | list[Any],
    taxa_are_rows: bool = True,
) -> None

Parameters:

Name	Type	Description	Default
`data`	`ndarray \| DataFrame \| spmatrix \| list[Any]`	Abundance matrix. Accepted types: `pd.DataFrame`, `np.ndarray`, any `scipy.sparse` matrix, or a list-of-lists. When a `scipy.sparse` matrix is supplied directly, sparse storage is always used regardless of matrix density (the 50 % density threshold only applies to dense inputs).	required
`taxa_are_rows`	`bool`	If `True` (default), rows represent taxa and columns represent samples.	`True`

copy ¶

copy() -> OtuTable

Return a deep copy.

R reference: otu_table(x) <- otu_table(x) (effectively)

sample_sums ¶

sample_sums() -> pd.Series

Sum of abundances across all taxa for each sample.

R reference: sample_sums(x)

taxa_sums ¶

taxa_sums() -> pd.Series

Sum of abundances across all samples for each taxon.

R reference: taxa_sums(x)

to_dataframe ¶

to_dataframe() -> pd.DataFrame

Return the abundance matrix as a pd.DataFrame in current orientation.

Always returns a fresh copy; mutating the result never touches internal state.

R reference: as(otu_table(x), "matrix") then as.data.frame()

SampleData¶

Wraps per-sample metadata as a pandas.DataFrame. The DataFrame index is the sample identifier — it must be unique and must match sample names in the OTU table.

import pandas as pd
from pyloseq import SampleData

meta = pd.DataFrame(
    {"SampleType": ["Soil", "Ocean", "Skin"], "pH": [6.5, 8.1, 5.4]},
    index=["S1", "S2", "S3"],
)
sam = SampleData(meta)

Retrieve the underlying DataFrame with .to_frame().

pyloseq.SampleData ¶

Wraps a pd.DataFrame of per-sample metadata.

The DataFrame index must be sample identifiers, and must be unique.

R reference: phyloseq::sample_data(object)
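
Both index requirements (unique identifiers that match the OTU table's sample names) are easy to check before constructing the container. The sketch below does so in plain Python; the helper names are hypothetical and the checks only approximate whatever validation the library performs.

```python
from collections import Counter


def duplicated_ids(sample_ids):
    """Return identifiers that occur more than once, in first-seen order."""
    counts = Counter(sample_ids)
    return [s for s, n in counts.items() if n > 1]


def unmatched_ids(metadata_ids, otu_sample_names):
    """Return OTU-table samples that have no metadata row."""
    known = set(metadata_ids)
    return [s for s in otu_sample_names if s not in known]


metadata_ids = ["S1", "S2", "S2", "S3"]
dups = duplicated_ids(metadata_ids)
missing = unmatched_ids(metadata_ids, ["S1", "S2", "S4"])
print(dups)     # ['S2']
print(missing)  # ['S4']
```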

names `property` ¶

names: Index

Deprecated alias for sample_names. Use sample_names instead.

sample_names `property` ¶

sample_names: Index

Sample identifiers (DataFrame index).

R reference: sample_names(x)

variables `property` ¶

variables: Index

Sample variable names (DataFrame columns).

R reference: sample_variables(x)

copy ¶

copy() -> SampleData

Return a deep copy of this SampleData.

to_frame ¶

to_frame() -> pd.DataFrame

Return a copy of the underlying DataFrame.

R reference: as(sample_data(x), "data.frame")

TaxTable¶

Wraps the taxonomic classification table. Rows are taxa (indexed by the same taxa names as the OTU table), columns are taxonomic ranks.

import pandas as pd
from pyloseq import TaxTable

tax_df = pd.DataFrame(
    {
        "Kingdom": ["Bacteria", "Bacteria"],
        "Phylum":  ["Firmicutes", "Bacteroidetes"],
        "Genus":   ["Lactobacillus", "Bacteroides"],
    },
    index=["OTU1", "OTU2"],
)
tax = TaxTable(tax_df)
print(tax.rank_names)   # ['Kingdom', 'Phylum', 'Genus']

pyloseq.TaxTable ¶

Wraps a pd.DataFrame of taxonomic classifications.

The DataFrame index must be taxa identifiers; columns are rank names (e.g. ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]). Rank names are user-supplied and not hardcoded.

R reference: phyloseq::tax_table(object)

rank_names `property` ¶

rank_names: list[str]

Taxonomic rank names (column names).

R reference: rank_names(x)

taxa_names `property` ¶

taxa_names: Index

Taxon identifiers (DataFrame index).

R reference: taxa_names(x)

copy ¶

copy() -> TaxTable

Return a deep copy of this TaxTable.

to_frame ¶

to_frame() -> pd.DataFrame

Return a copy of the underlying DataFrame.

R reference: as(tax_table(x), "matrix")

PhyTree¶

Wraps a skbio.TreeNode phylogenetic tree. Three constructors are available:

from pyloseq import PhyTree

# From a Newick string
tree = PhyTree.from_newick("((OTU1:0.1, OTU2:0.2):0.3, OTU3:0.4);")

# From a file
tree = PhyTree.from_newick_file("tree.nwk")

# From an existing skbio.TreeNode
import skbio
node = skbio.io.read("tree.nwk", format="newick", into=skbio.TreeNode)
tree = PhyTree(node)

Note

PhyTree.from_ape_rds() is not implemented. To use a tree saved from R with saveRDS, export it first: ape::write.tree(tree, "tree.nwk").

Prune to a specific set of tips:

pruned = tree.prune(["OTU1", "OTU3"])

pyloseq.PhyTree ¶

Wraps a skbio.tree.TreeNode with a phyloseq-compatible interface.

R reference: phyloseq::phy_tree(object)

internal_names `property` ¶

internal_names: list[str]

Names of all internal (non-tip) nodes, excluding the root if unnamed.

R reference: phy_tree(x)$node.label

is_rooted `property` ¶

is_rooted: bool

True if the root has exactly 2 children (bifurcating root).

R reference: is.rooted(phy_tree(x))
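
The bifurcating-root criterion can be checked directly on a Newick string by counting the root's children. The sketch below is a simplified stand-in that assumes unquoted labels with no parentheses or commas in them; root_child_count is a hypothetical helper, not part of pyloseq or scikit-bio.

```python
def root_child_count(newick):
    """Count the direct children of the root in a simple Newick string."""
    s = newick.strip().rstrip(";").strip()
    if not s.startswith("("):
        return 0  # a single tip with no children
    # Keep only the text between the outermost parentheses,
    # dropping any root label or branch length after the last ')'
    inner = s[1:s.rindex(")")]
    depth = 0
    children = 1
    for ch in inner:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            children += 1  # a top-level comma separates two root children
    return children


rooted = "((OTU1:0.1, OTU2:0.2):0.3, OTU3:0.4);"
unrooted = "(OTU1:0.1, OTU2:0.2, OTU3:0.4);"
print(root_child_count(rooted) == 2)    # True
print(root_child_count(unrooted) == 2)  # False
```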

n_tips `property` ¶

n_tips: int

Number of tip (leaf) nodes.

R reference: ntaxa(phy_tree(x))

tip_names `property` ¶

tip_names: list[str]

Names of all leaf nodes.

R reference: taxa_names(phy_tree(x))

total_branch_length `property` ¶

total_branch_length: float

Sum of all branch lengths in the tree.

R reference: sum(phy_tree(x)$edge.length)
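
For a quick sanity check outside the library, the same total can be approximated from a Newick string by summing every number that follows a colon. This is a sketch that assumes unquoted labels containing no colons; the function below is written for this example and is not how PhyTree computes the value.

```python
import math
import re

# A branch length is the number that follows ':' in Newick notation
BRANCH_LENGTH = re.compile(r":\s*(-?[0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)")


def total_branch_length(newick):
    """Sum all branch lengths found in a simple Newick string."""
    lengths = [float(x) for x in BRANCH_LENGTH.findall(newick)]
    return math.fsum(lengths)


total = total_branch_length("((OTU1:0.1, OTU2:0.2):0.3, OTU3:0.4);")
print(total)
```

For the example tree the four lengths 0.1, 0.2, 0.3, and 0.4 add up to approximately 1.0.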

copy ¶

copy() -> PhyTree

Return a deep copy of this PhyTree via Newick round-trip.

from_ape_rds `classmethod` ¶

from_ape_rds(path: str | Path) -> PhyTree

Construct from an R phylo object serialized as .rds.

Requires pyreadr (pip install pyreadr).

R reference: readRDS(path)

from_newick `classmethod` ¶

from_newick(s: str) -> PhyTree

Construct from a Newick string.

R reference: phy_tree(read.tree(text=s))

from_newick_file `classmethod` ¶

from_newick_file(path: str | Path) -> PhyTree

Construct from a Newick file on disk.

R reference: phy_tree(read.tree(file=path))

prune ¶

prune(keep: list[str]) -> PhyTree

Return a new tree containing only the specified tips and their ancestors.

Equivalent to ape::drop.tip with the complement set.

R reference: prune_taxa(keep, ps) (on the tree component)

to_newick ¶

to_newick() -> str

Serialize to a Newick string.

R reference: ape::write.tree(phy_tree(x))

RefSeq¶

Stores reference sequences as a dict-like mapping from taxon name to skbio.DNA. Used for representative sequences from DADA2 or QIIME 2 denoising pipelines.

import skbio
from pyloseq import RefSeq

seqs = RefSeq({
    "OTU1": skbio.DNA("ACGTACGT"),
    "OTU2": skbio.DNA("TGCATGCA"),
})

# Round-trip through FASTA
seqs.to_fasta("representatives.fasta")
seqs2 = RefSeq.from_fasta("representatives.fasta")

pyloseq.RefSeq ¶

Wraps a dictionary of reference sequences keyed by taxon ID.

R reference: phyloseq::refseq(object)

names `property` ¶

names: Index

Deprecated alias for taxa_names. Use taxa_names instead.

taxa_names `property` ¶

taxa_names: Index

Taxon identifiers for all stored sequences.

R reference: taxa_names(x)

copy ¶

copy() -> RefSeq

Return a deep copy of this RefSeq.

from_fasta `classmethod` ¶

from_fasta(path: str | Path) -> RefSeq

Load sequences from a FASTA file.

R reference: readDNAStringSet() then RefSeq(x)

to_fasta ¶

to_fasta(path: str | Path) -> None

Write sequences to a FASTA file.

R reference: writeXStringSet(refseq(x), filepath)
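
To show what a FASTA round-trip involves, here is a standard-library sketch that writes a name-to-sequence mapping and reads it back. It uses plain strings instead of skbio.DNA, and write_fasta and read_fasta are hypothetical helpers; the library's own reader and writer may handle line wrapping and header descriptions differently.

```python
import os
import tempfile


def write_fasta(seqs, path):
    """Write a {name: sequence} mapping as FASTA, one sequence per line."""
    with open(path, "w") as fh:
        for name, seq in seqs.items():
            fh.write(f">{name}\n{seq}\n")


def read_fasta(path):
    """Read FASTA into a {name: sequence} dict, joining wrapped sequence lines."""
    seqs = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].split()
                name = header[0] if header else ""  # ID is the first word of the header
                seqs[name] = ""
            elif name is not None:
                seqs[name] += line
    return seqs


original = {"OTU1": "ACGTACGT", "OTU2": "TGCATGCA"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "representatives.fasta")
    write_fasta(original, path)
    restored = read_fasta(path)

print(restored == original)  # True
```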
