Input / Output¶

Reader functions return a Phyloseq object; writer functions write files to disk. The I/O functions are importable directly from the top-level package:

from pyloseq import read_biom, write_biom, read_qza, read_qiime, read_mothur, read_csv, to_csv

BIOM¶

BIOM is the standard interchange format for microbiome count tables. Both BIOM v1 (JSON) and BIOM v2 (HDF5) are supported.
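The two BIOM versions can be told apart from the first bytes of the file: HDF5 files begin with a fixed 8-byte signature, while JSON files begin with an opening brace. The helper below is a minimal sketch of that check, not pyloseq's own detection logic; the name `sniff_biom_version` is hypothetical, and it assumes the HDF5 signature sits at offset 0 (no user block).

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # standard HDF5 file signature


def sniff_biom_version(path):
    """Guess whether a BIOM file is v1 (JSON) or v2 (HDF5) from its first bytes.

    Hypothetical helper for illustration; not part of pyloseq.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(HDF5_MAGIC))
    if head == HDF5_MAGIC:
        return "2"
    # JSON documents start with "{" once leading whitespace is removed.
    if head.lstrip()[:1] == b"{":
        return "1"
    raise ValueError(f"{path} does not look like a BIOM file")
```

A check like this is only a quick guess at the container format; it does not validate the BIOM schema itself.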

read_biom¶

ps = read_biom("feature-table.biom")

Taxonomy parsing is controlled by the parse_taxonomy parameter:

Value	Behaviour
`"default"`	Split on `"; "` or `";"` and assign standard rank names
`"qiime"`	QIIME 1 / GreenGenes semicolon-delimited strings; additionally strips rank prefixes like `"p__"`
`"greengenes"`	Synonym for `"qiime"`
`None`	Store raw taxonomy strings as a single column
callable	Called with the raw observation metadata dict; must return a dict of rank → value

# No taxonomy parsing — keep raw strings
ps = read_biom("table.biom", parse_taxonomy=None)

# Custom parser
def my_parser(obs_meta):
    lineage = obs_meta.get("taxonomy", [])
    ranks = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
    return dict(zip(ranks, lineage))

ps = read_biom("table.biom", parse_taxonomy=my_parser)
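A custom parser may also need to cope with taxonomy stored as one semicolon-delimited string and with GreenGenes-style rank prefixes. The following is a minimal sketch under those assumptions; `prefix_stripping_parser` is a hypothetical name, and the rank list is the one used in the example above.

```python
import re

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
_PREFIX = re.compile(r"^[a-z]__")  # matches k__, p__, c__, ...


def prefix_stripping_parser(obs_meta):
    """Return a rank -> value dict from a raw observation metadata dict.

    Hypothetical example parser; accepts a list or a ';'-delimited string.
    """
    raw = obs_meta.get("taxonomy", [])
    if isinstance(raw, str):
        raw = raw.split(";")
    # Trim whitespace, then drop the rank prefix from each entry.
    cleaned = [_PREFIX.sub("", part.strip()) for part in raw]
    # Skip ranks that are empty after cleaning (e.g. a bare "c__").
    return {rank: value for rank, value in zip(RANKS, cleaned) if value}
```

Because it takes the metadata dict and returns a rank → value dict, a function of this shape could be passed as `parse_taxonomy`.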

pyloseq.read_biom ¶

read_biom(
    path: str | Path,
    parse_taxonomy: TaxonomyParser = "default",
) -> Phyloseq

Load a BIOM v1 (JSON) or v2 (HDF5) file into a Phyloseq object.

R reference: phyloseq::import_biom(BIOMfilename, parseFunction=parse_taxonomy)

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to the `.biom` file.	required
`parse_taxonomy`	`TaxonomyParser`	How to interpret the `taxonomy` field in observation metadata. `"default"` splits on `"; "` or `";"` and assigns standard rank names. `"qiime"` / `"greengenes"` additionally strips rank prefixes (`k__`, `p__`, …). Pass a callable for custom parsing.	`'default'`

write_biom¶

write_biom(ps, "output.biom")

Writes a BIOM v2 (HDF5) file by default. Pass version="1.0" to write BIOM v1 (JSON) instead.

pyloseq.write_biom ¶

write_biom(
    ps: Phyloseq, path: str | Path, version: str = "2.1"
) -> None

Write a Phyloseq object to a BIOM file.

R reference: phyloseq::export_biom(x, file)

Parameters:

Name	Type	Description	Default
`ps`	`Phyloseq`	The `Phyloseq` to serialise.	required
`path`	`str \| Path`	Output file path.	required
`version`	`str`	`"2.1"` (default) writes HDF5; `"1.0"` writes JSON.	`'2.1'`

QIIME 2¶

read_qza¶

Reads QIIME 2 .qza artifact files without requiring the qiime2 package. The artifact's semantic type is detected from the embedded metadata.yaml:

Semantic type	Loaded as
`FeatureTable[Frequency]`	`OtuTable`
`FeatureTable[RelativeFrequency]`	`OtuTable`
`FeatureData[Taxonomy]`	`TaxTable`
`FeatureData[Sequence]`	`RefSeq`
`Phylogeny[Rooted]`	`PhyTree`
`Phylogeny[Unrooted]`	`PhyTree`
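A .qza artifact is a zip archive whose single root directory is named after the artifact UUID and contains a metadata.yaml with a `type:` entry. The sketch below reads that entry using only the standard library; it illustrates the detection step and is not pyloseq's implementation. The function name `qza_semantic_type` is hypothetical.

```python
import zipfile


def qza_semantic_type(path):
    """Return the `type:` value from an artifact's top-level metadata.yaml.

    Hypothetical helper; `path` may be a file path or a binary file object.
    """
    with zipfile.ZipFile(path) as zf:
        # Only <uuid>/metadata.yaml counts; deeper copies (e.g. under
        # provenance/) describe other artifacts and are ignored.
        candidates = [
            name for name in zf.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError("no top-level metadata.yaml found")
        text = zf.read(candidates[0]).decode("utf-8")
    # The file is flat "key: value" YAML, so a line scan is enough here.
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "type":
            return value.strip().strip("'\"")
    raise ValueError("metadata.yaml has no 'type' field")
```

The returned string can then be matched against the semantic types in the table above.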

Pass multiple .qza files to load a complete dataset:

from pyloseq.io import read_qza

ps = read_qza(
    "feature-table.qza",
    "taxonomy.qza",
    "rooted-tree.qza",
)

pyloseq.io.read_qza ¶

read_qza(
    features: str | Path | None = None,
    taxonomy: str | Path | None = None,
    tree: str | Path | None = None,
    metadata: str | Path | None = None,
    sequences: str | Path | None = None,
) -> Any

Load one or more QIIME 2 .qza artifacts into a Phyloseq.

Each argument should point to a .qza file of the matching semantic type. metadata is a sample-metadata TSV (not a .qza).

R reference: qiime2R::qza_to_phyloseq(features, tree, taxonomy, metadata)

QIIME 1¶

read_qiime¶

Reads QIIME 1 legacy files: a classic tab-delimited OTU table (with the standard #OTU ID header row) and an optional mapping file, tree, and reference sequences.

from pyloseq import read_qiime

ps = read_qiime(
    otu="otu_table.txt",
    mapping="sample_metadata.txt",
    tree="rep_set.tre",
)

The parse_taxonomy parameter accepts the same string values as read_biom; the default is "qiime" because QIIME 1 taxonomy strings are semicolon-delimited and carry rank prefixes.

pyloseq.read_qiime ¶

read_qiime(
    otu: str | Path,
    mapping: str | Path | None = None,
    tree: str | Path | None = None,
    refseq: str | Path | None = None,
    parse_taxonomy: str = "qiime",
) -> Phyloseq

Load a QIIME 1 OTU table (+ optional mapping, tree, refseq) into a Phyloseq.

The OTU table must use the standard #OTU ID header row. If a taxonomy column is present (taxonomy or Consensus Lineage, any case) it is parsed into a TaxTable.

R reference: phyloseq::import_qiime(otufilename, mapfilename, treefilename, refseqfilename)
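The layout described above (optional comment lines, a #OTU ID header row, sample columns, and an optional trailing taxonomy column) can be sketched with the standard csv module. This is an illustration of the file format only, not read_qiime's actual code; `parse_classic_otu_table` is a hypothetical name.

```python
import csv

TAXONOMY_HEADERS = {"taxonomy", "consensus lineage"}  # matched case-insensitively


def parse_classic_otu_table(text):
    """Return (counts, taxonomy) from a tab-delimited QIIME 1 OTU table.

    counts maps OTU -> {sample: value}; taxonomy maps OTU -> raw lineage string.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Comment lines may precede the header; find the '#OTU ID' row.
    start = next((i for i, ln in enumerate(lines) if ln.startswith("#OTU ID")), None)
    if start is None:
        raise ValueError("no '#OTU ID' header row found")
    rows = list(csv.reader(lines[start:], delimiter="\t"))
    header, body = rows[0], rows[1:]
    tax_idx = next(
        (i for i, name in enumerate(header) if name.strip().lower() in TAXONOMY_HEADERS),
        None,
    )
    sample_idx = [i for i in range(1, len(header)) if i != tax_idx]
    counts, taxonomy = {}, {}
    for row in body:
        otu = row[0]
        counts[otu] = {header[i]: float(row[i]) for i in sample_idx}
        if tax_idx is not None:
            taxonomy[otu] = row[tax_idx]
    return counts, taxonomy
```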

mothur¶

mothur stores results in .shared (OTU count table), .cons.taxonomy (consensus taxonomy), and .tre (tree) files.

read_mothur¶

from pyloseq import read_mothur

ps = read_mothur(
    shared="stability.opti_mcc.shared",
    constaxonomy="stability.opti_mcc.0.03.cons.taxonomy",
)

mothur shared files often contain multiple OTU definitions at different distance cutoffs. Use cutoff to select one:

ps = read_mothur(shared="stability.opti_mcc.shared", cutoff="0.03")

Pass a .list + .group combination instead of a .shared file to reconstruct the OTU table from raw assignments:

ps = read_mothur(list_file="stability.list", group="stability.groups", cutoff="0.03")

pyloseq.read_mothur ¶

read_mothur(
    shared: str | Path | None = None,
    constaxonomy: str | Path | None = None,
    tree: str | Path | None = None,
    list_file: str | Path | None = None,
    group: str | Path | None = None,
    cutoff: str | None = None,
) -> Phyloseq

Load mothur output files into a Phyloseq.

R reference: phyloseq::import_mothur(...)

Parameters:

Name	Type	Description	Default
`shared`	`str \| Path \| None`	Path to a `.shared` file. Produces the OTU table.	`None`
`constaxonomy`	`str \| Path \| None`	Path to a `.cons.taxonomy` file. Produces the `TaxTable`.	`None`
`tree`	`str \| Path \| None`	Path to a Newick `.tre` file. Produces the `PhyTree`.	`None`
`list_file`	`str \| Path \| None`	Path to a `.list` file (alternative to `shared`).	`None`
`group`	`str \| Path \| None`	Path to a `.group` file (required with `list_file`).	`None`
`cutoff`	`str \| None`	OTU distance cutoff label (e.g. `"0.03"`). If `None`, the first label in the file is used.	`None`

show_mothur_cutoffs¶

from pyloseq import show_mothur_cutoffs

cutoffs = show_mothur_cutoffs("stability.opti_mcc.shared")
# ['unique', '0.01', '0.02', '0.03']

pyloseq.show_mothur_cutoffs ¶

show_mothur_cutoffs(path: str | Path) -> list[str]

Return all OTU cutoff labels present in a .list or .shared file.

R reference: phyloseq::show_mothur_cutoffs(mothurlist)
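In a .shared file each row belongs to one (label, group) pair, with columns label, Group, numOtus followed by one column per OTU. Listing the cutoffs therefore amounts to collecting the distinct values of the label column. A minimal sketch of that idea, assuming the file content has already been read into a string (`shared_labels` is a hypothetical name, not the library function):

```python
def shared_labels(text):
    """List distinct 'label' values of a mothur .shared table, in file order."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    label_col = header.index("label")
    seen = []
    for ln in lines[1:]:
        label = ln.split("\t")[label_col]
        # Each label repeats once per group; keep only the first occurrence.
        if label not in seen:
            seen.append(label)
    return seen
```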

select_mothur_cutoff¶

Extracts the count table for a single cutoff label. Returns a DataFrame rather than a Phyloseq object, which is useful for inspecting the data before constructing a full object.

pyloseq.select_mothur_cutoff ¶

select_mothur_cutoff(
    path: str | Path, cutoff: str
) -> pd.DataFrame

Return the rows of a .list or .shared file at a given cutoff.

R reference: (internal helper, mirrors mothur's label-filtering)

CSV / TSV¶

For plain-text count tables not in any of the above formats.

read_csv¶

from pyloseq import read_csv

ps = read_csv(
    otu_path="otu_table.tsv",
    sample_path="metadata.tsv",
    tax_path="taxonomy.tsv",
    taxa_are_rows=True,
    sep="\t",
)

Only otu_path is required; other paths are optional. The OTU table index is used as taxa names; the sample table index is used as sample names.

pyloseq.read_csv ¶

read_csv(
    otu_path: str | Path,
    sample_path: str | Path | None = None,
    tax_path: str | Path | None = None,
    tree_path: str | Path | None = None,
    refseq_path: str | Path | None = None,
    taxa_are_rows: bool = True,
    sep: str = "\t",
) -> Phyloseq

Load a plain-text count table (+ optional metadata files) into a Phyloseq.

R reference: phyloseq::phyloseq(otu_table(read.csv(otu_path), taxa_are_rows), ...)

Parameters:

Name	Type	Description	Default
`otu_path`	`str \| Path`	Path to the abundance table CSV/TSV. First column is treated as the row index.	required
`sample_path`	`str \| Path \| None`	Optional sample metadata CSV/TSV. First column is the sample ID index.	`None`
`tax_path`	`str \| Path \| None`	Optional taxonomy CSV/TSV. First column is the taxon ID index.	`None`
`tree_path`	`str \| Path \| None`	Optional Newick tree file.	`None`
`refseq_path`	`str \| Path \| None`	Optional FASTA reference sequences.	`None`
`taxa_are_rows`	`bool`	Orientation of the OTU table (default `True`). Ignored when the file carries the pyloseq orientation marker written by `to_csv` (such files are always taxa-as-rows).	`True`
`sep`	`str`	Field separator (default tab).	`'\t'`
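The effect of taxa_are_rows can be illustrated without pyloseq: the same counts may arrive with taxa as rows or with samples as rows, and the flag says which axis the first column indexes. The sketch below normalises both layouts to a taxon → sample → count mapping. It is an illustration only; `load_counts` is a hypothetical name and the orientation marker written by to_csv is not modelled.

```python
import csv


def load_counts(text, taxa_are_rows=True, sep="\t"):
    """Parse a delimited count table into {taxon: {sample: count}}."""
    rows = list(csv.reader(text.splitlines(), delimiter=sep))
    col_names = rows[0][1:]  # the first header cell labels the index column
    table = {r[0]: dict(zip(col_names, map(float, r[1:]))) for r in rows[1:] if r}
    if taxa_are_rows:
        return table
    # Samples were rows: flip the nesting so taxa become the outer keys.
    flipped = {}
    for sample, by_taxon in table.items():
        for taxon, count in by_taxon.items():
            flipped.setdefault(taxon, {})[sample] = count
    return flipped
```

Passing the wrong orientation would silently swap taxa and samples, which is why the flag matters for files that do not carry the marker.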

to_csv¶

Writes each component to a separate file inside the target directory. Components not present in the Phyloseq are skipped.

from pyloseq import to_csv

paths = to_csv(
    ps,
    directory="output_dir",
    sep="\t",
)

pyloseq.to_csv ¶

to_csv(
    ps: Phyloseq,
    directory: str | Path,
    sep: str = "\t",
    prefix: str = "",
) -> dict[str, Path]

Write a Phyloseq to a directory of plain-text files.

The OTU table is always written in canonical taxa-as-rows orientation and tagged with an index marker so that read_csv restores it faithfully: the round-trip is orientation-preserving regardless of how the in-memory table happened to be oriented.

Returns a dict mapping component name → output path.

R reference: (no direct R equivalent; mirrors write.table() per component)

DESeq2¶

Phyloseq.to_deseq2() exports the count matrix and sample metadata in the format expected by pydeseq2. It returns a (counts, metadata) tuple — both plain pd.DataFrame objects — ready to pass directly to DeseqDataSet. pydeseq2 is not a pyloseq dependency; install it separately with pip install pydeseq2.

# pip install pydeseq2
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats

counts, metadata = ps.to_deseq2()

dds = DeseqDataSet(counts=counts, metadata=metadata, design="~condition")
dds.deseq2()

ds = DeseqStats(dds, contrast=["condition", "treated", "control"])
ds.summary()
results = ds.results_df

counts has shape (n_samples, n_taxa) with samples as rows. Pass raw, un-normalized integer counts — DESeq2 performs its own size-factor normalization internally. A UserWarning is emitted if non-integer values are detected.

to_deseq2() raises ValueError if sample_data is not attached.
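It can help to check for non-integer values before exporting, for example to catch a table that was already converted to relative abundances. The sketch below shows one way to do that on a plain nested mapping of sample → taxon → value; it mirrors the warning behaviour described above but is not pyloseq's code, and `check_raw_counts` is a hypothetical name.

```python
import warnings


def check_raw_counts(counts):
    """Return True if every value in {sample: {taxon: value}} is a whole number.

    Emits a UserWarning listing how many values are not.
    """
    bad = [
        (sample, taxon)
        for sample, row in counts.items()
        for taxon, value in row.items()
        if not float(value).is_integer()  # NaN and fractions both fail here
    ]
    if bad:
        warnings.warn(
            f"{len(bad)} non-integer count(s) found; DESeq2 expects raw counts",
            UserWarning,
        )
    return not bad
```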

pyloseq.Phyloseq.to_deseq2 ¶

to_deseq2() -> tuple[pd.DataFrame, pd.DataFrame]

Return count matrix and sample metadata formatted for pydeseq2.

Returns a (counts, metadata) tuple ready to pass directly to DeseqDataSet(counts=counts, metadata=metadata, design=...).

counts has shape (n_samples, n_taxa) with samples as rows. metadata has shape (n_samples, n_variables) with a matching index.

Raises:

Type	Description
`ValueError`	If this object has no `sample_data`.