Input / Output

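Reader functions return a Phyloseq object and writer functions write files; the mothur helpers show_mothur_cutoffs and select_mothur_cutoff return a list and a DataFrame instead. The I/O functions can be imported directly from the top-level package: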

from pyloseq import read_biom, write_biom, read_qza, read_qiime, read_mothur, read_csv, to_csv

BIOM

BIOM is the standard interchange format for microbiome count tables. Both BIOM v1 (JSON) and BIOM v2 (HDF5) are supported.
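The two BIOM flavours can be told apart from the first bytes of the file: every HDF5 file starts with a fixed 8-byte signature, while a JSON document starts with an opening brace. The sketch below illustrates that idea only; sniff_biom_flavour is a hypothetical helper, not part of pyloseq, and it assumes the HDF5 signature sits at offset 0.

```python
import json
import tempfile
from pathlib import Path

# First 8 bytes of an HDF5 file (the HDF5 format signature).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_biom_flavour(path):
    """Return "hdf5" (BIOM v2) or "json" (BIOM v1). Hypothetical helper."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head == HDF5_SIGNATURE:
        return "hdf5"
    if head.lstrip().startswith(b"{"):
        return "json"
    raise ValueError(f"{path} does not look like a BIOM file")


# Demonstrate on two tiny stand-in files (not real BIOM tables).
tmp = Path(tempfile.mkdtemp())
(tmp / "v1.biom").write_text(json.dumps({"format": "Biological Observation Matrix 1.0.0"}))
(tmp / "v2.biom").write_bytes(HDF5_SIGNATURE + b"\x00" * 8)

flavours = {name: sniff_biom_flavour(tmp / name) for name in ("v1.biom", "v2.biom")}
```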

read_biom

ps = read_biom("feature-table.biom")

Taxonomy parsing is controlled by the parse_taxonomy parameter:

Value          Behaviour
"default"      Split on "; " or ";" and assign standard rank names
"qiime"        QIIME 1 / GreenGenes strings: split as for "default" and strip rank prefixes like "p__"
"greengenes"   Synonym for "qiime"
None           Store raw taxonomy strings as a single column
callable       Called with the raw observation metadata dict; must return a dict of rank → value

# No taxonomy parsing: keep raw strings
ps = read_biom("table.biom", parse_taxonomy=None)

# Custom parser
def my_parser(obs_meta):
    lineage = obs_meta.get("taxonomy", [])
    ranks = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
    return dict(zip(ranks, lineage))

ps = read_biom("table.biom", parse_taxonomy=my_parser)
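
The string-based modes boil down to splitting a semicolon-delimited lineage and, optionally, stripping rank prefixes. The following is a minimal sketch of that logic, not pyloseq's implementation; split_lineage and its strip_prefixes flag are hypothetical names used for illustration.

```python
import re

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
# Matches GreenGenes-style rank prefixes such as "k__" or "p__".
RANK_PREFIX = re.compile(r"^[a-z]__")


def split_lineage(raw, strip_prefixes=False):
    """Split a semicolon-delimited lineage into a rank -> value dict."""
    # Splitting on ";" and trimming whitespace handles both ";" and "; ".
    parts = [p.strip() for p in raw.split(";")]
    if strip_prefixes:
        parts = [RANK_PREFIX.sub("", p) for p in parts]
    # Drop empty ranks (for example a bare "s__" after stripping).
    return {rank: value for rank, value in zip(RANKS, parts) if value}


raw = "k__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__"
plain = split_lineage(raw)
stripped = split_lineage(raw, strip_prefixes=True)
```

Without stripping, unassigned ranks keep their bare prefix (for example "o__"); with stripping they become empty and are dropped from the result.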

pyloseq.read_biom

read_biom(
    path: str | Path,
    parse_taxonomy: TaxonomyParser = "default",
) -> Phyloseq

Load a BIOM v1 (JSON) or v2 (HDF5) file into a Phyloseq object.

R reference: phyloseq::import_biom(BIOMfilename, parseFunction=parse_taxonomy)

Parameters:

Name            Type            Default    Description
path            str | Path      required   Path to the .biom file.
parse_taxonomy  TaxonomyParser  'default'  How to interpret the taxonomy field in observation metadata. "default" splits on "; " or ";" and assigns standard rank names. "qiime" / "greengenes" additionally strip rank prefixes (k__, p__, …). Pass None to keep raw strings, or a callable for custom parsing.

write_biom

write_biom(ps, "output.biom")

Writes a BIOM v2.1 (HDF5) file by default. Pass version="1.0" to write BIOM v1 (JSON) instead.

pyloseq.write_biom

write_biom(
    ps: Phyloseq, path: str | Path, version: str = "2.1"
) -> None

Write a Phyloseq object to a BIOM file.

R reference: phyloseq::export_biom(x, file)

Parameters:

Name     Type        Default   Description
ps       Phyloseq    required  The Phyloseq to serialise.
path     str | Path  required  Output file path.
version  str         '2.1'     "2.1" (default) writes HDF5; "1.0" writes JSON.

QIIME 2

read_qza

Reads QIIME 2 .qza artifact files without requiring the qiime2 package. The artifact's semantic type is detected from the embedded metadata.yaml:

Semantic type                     Loaded as
FeatureTable[Frequency]           OtuTable
FeatureTable[RelativeFrequency]   OtuTable
FeatureData[Taxonomy]             TaxTable
FeatureData[Sequence]             RefSeq
Phylogeny[Rooted]                 PhyTree
Phylogeny[Unrooted]               PhyTree
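
A .qza file is a zip archive whose single top-level directory (named after the artifact UUID) contains a metadata.yaml with a type field. The sketch below shows how that field could be read with the standard library alone; qza_semantic_type is a hypothetical helper, and the line-based parsing is a simplification that avoids a YAML dependency.

```python
import io
import zipfile

# Mapping taken from the table above.
COMPONENT_FOR_TYPE = {
    "FeatureTable[Frequency]": "OtuTable",
    "FeatureTable[RelativeFrequency]": "OtuTable",
    "FeatureData[Taxonomy]": "TaxTable",
    "FeatureData[Sequence]": "RefSeq",
    "Phylogeny[Rooted]": "PhyTree",
    "Phylogeny[Unrooted]": "PhyTree",
}


def qza_semantic_type(source):
    """Read the type field of <uuid>/metadata.yaml inside a .qza archive."""
    with zipfile.ZipFile(source) as archive:
        # The artifact's own metadata.yaml sits exactly one directory deep;
        # provenance copies live further down and are ignored.
        name = next(
            n for n in archive.namelist()
            if n.count("/") == 1 and n.endswith("/metadata.yaml")
        )
        text = archive.read(name).decode("utf-8")
    for line in text.splitlines():
        if line.startswith("type:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError("no type field in metadata.yaml")


# Build a tiny in-memory stand-in for a real artifact.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "1234-abcd/metadata.yaml",
        "uuid: 1234-abcd\ntype: FeatureData[Taxonomy]\nformat: TSVTaxonomyDirectoryFormat\n",
    )
buffer.seek(0)

semantic_type = qza_semantic_type(buffer)
component = COMPONENT_FOR_TYPE[semantic_type]
```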

Pass multiple .qza files to load a complete dataset:

from pyloseq.io import read_qza

ps = read_qza(
    features="feature-table.qza",
    taxonomy="taxonomy.qza",
    tree="rooted-tree.qza",
)

pyloseq.io.read_qza

read_qza(
    features: str | Path | None = None,
    taxonomy: str | Path | None = None,
    tree: str | Path | None = None,
    metadata: str | Path | None = None,
    sequences: str | Path | None = None,
) -> Any

Load one or more QIIME 2 .qza artifacts into a Phyloseq.

Each argument should point to a .qza file of the matching semantic type. metadata is a sample-metadata TSV (not a .qza).

R reference: qiime2R::qza_to_phyloseq(features, tree, taxonomy, metadata)


QIIME 1

read_qiime

Reads QIIME 1 legacy files: a classic tab-delimited OTU table (with the #OTU ID header row) plus an optional mapping file, tree, and reference sequences.

from pyloseq import read_qiime

ps = read_qiime(
    otu="otu_table.txt",
    mapping="sample_metadata.txt",
    tree="rep_set.tre",
)

The parse_taxonomy parameter accepts the same values as read_biom; the default is "qiime" because QIIME 1 taxonomy strings usually carry GreenGenes-style rank prefixes (k__, p__, …).

pyloseq.read_qiime

read_qiime(
    otu: str | Path,
    mapping: str | Path | None = None,
    tree: str | Path | None = None,
    refseq: str | Path | None = None,
    parse_taxonomy: str = "qiime",
) -> Phyloseq

Load a QIIME 1 OTU table (+ optional mapping, tree, refseq) into a Phyloseq.

The OTU table must use the standard #OTU ID header row. If a taxonomy column is present (taxonomy or Consensus Lineage, any case) it is parsed into a TaxTable.

R reference: phyloseq::import_qiime(otufilename, mapfilename, treefilename, refseqfilename)
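
To make the expected layout concrete, here is a rough sketch of reading a classic tab-delimited OTU table: comment lines are skipped until the #OTU ID header appears, and a trailing taxonomy column is split off when present. parse_classic_otu_table is a hypothetical stand-alone function, not the pyloseq reader, and the example table is made up.

```python
import csv
import io

TAXONOMY_HEADERS = {"taxonomy", "consensus lineage"}

example = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\ttaxonomy\n"
    "OTU_1\t5\t0\tk__Bacteria; p__Firmicutes\n"
    "OTU_2\t2\t7\tk__Bacteria; p__Bacteroidetes\n"
)


def parse_classic_otu_table(handle):
    """Return (samples, counts, taxonomy) from a tab-delimited OTU table."""
    header = None
    counts, taxonomy = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if header is None:
            # Skip comment lines until the "#OTU ID" header row appears.
            if row[0].strip().lower() == "#otu id":
                header = row
                has_tax = header[-1].strip().lower() in TAXONOMY_HEADERS
                samples = header[1:-1] if has_tax else header[1:]
            continue
        otu_id = row[0]
        values = row[1:-1] if has_tax else row[1:]
        counts[otu_id] = [float(v) for v in values]
        if has_tax:
            taxonomy[otu_id] = row[-1]
    if header is None:
        raise ValueError("missing '#OTU ID' header row")
    return samples, counts, taxonomy


samples, counts, taxonomy = parse_classic_otu_table(io.StringIO(example))
```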


mothur

mothur stores results in .shared (OTU count table), .cons.taxonomy (consensus taxonomy), and .tre (tree) files.

read_mothur

from pyloseq import read_mothur

ps = read_mothur(
    shared="stability.opti_mcc.shared",
    constaxonomy="stability.opti_mcc.0.03.cons.taxonomy",
)

mothur shared files often contain multiple OTU definitions at different distance cutoffs. Use cutoff to select one:

ps = read_mothur(shared="stability.opti_mcc.shared", cutoff="0.03")

Pass a .list + .group combination instead of a .shared file to reconstruct the OTU table from raw assignments:

ps = read_mothur(list_file="stability.list", group="stability.groups", cutoff="0.03")
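
Reconstructing counts from raw assignments amounts to looking up each sequence's sample and tallying per OTU. The sketch below assumes the usual mothur layouts (a .list line is label, numOtus, then one comma-separated field of sequence names per OTU; a group file maps sequence name to sample). otu_counts_from_list and the generated OTU names are hypothetical, and this is not pyloseq's code.

```python
from collections import Counter


def otu_counts_from_list(list_text, group_text, cutoff):
    """Count sequences per (OTU, sample) for one cutoff label."""
    # Sequence name -> sample (group) name.
    seq_to_sample = dict(
        line.split("\t") for line in group_text.strip().splitlines()
    )
    for line in list_text.strip().splitlines():
        fields = line.split("\t")
        if fields[0] != cutoff:
            continue  # also skips an optional "label ..." header line
        table = {}
        # fields[1] is numOtus; every later field is one OTU.
        for index, members in enumerate(fields[2:], start=1):
            names = members.split(",")
            table[f"Otu{index}"] = Counter(seq_to_sample[n] for n in names)
        return table
    raise KeyError(f"cutoff {cutoff!r} not found")


list_text = (
    "unique\t4\tseq1\tseq2\tseq3\tseq4\n"
    "0.03\t2\tseq1,seq2,seq3\tseq4\n"
)
group_text = "seq1\tA\nseq2\tA\nseq3\tB\nseq4\tB\n"

table = otu_counts_from_list(list_text, group_text, "0.03")
```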

pyloseq.read_mothur

read_mothur(
    shared: str | Path | None = None,
    constaxonomy: str | Path | None = None,
    tree: str | Path | None = None,
    list_file: str | Path | None = None,
    group: str | Path | None = None,
    cutoff: str | None = None,
) -> Phyloseq

Load mothur output files into a Phyloseq.

R reference: phyloseq::import_mothur(...)

Parameters:

Name          Type               Default  Description
shared        str | Path | None  None     Path to a .shared file. Produces the OTU table.
constaxonomy  str | Path | None  None     Path to a .cons.taxonomy file. Produces the TaxTable.
tree          str | Path | None  None     Path to a Newick .tre file. Produces the PhyTree.
list_file     str | Path | None  None     Path to a .list file (alternative to shared).
group         str | Path | None  None     Path to a .group file (required with list_file).
cutoff        str | None         None     OTU similarity cutoff label (e.g. "0.03"). If None, the first label in the file is used.

show_mothur_cutoffs

from pyloseq import show_mothur_cutoffs

cutoffs = show_mothur_cutoffs("stability.opti_mcc.shared")
# ['unique', '0.01', '0.02', '0.03']

pyloseq.show_mothur_cutoffs

show_mothur_cutoffs(path: str | Path) -> list[str]

Return all OTU cutoff labels present in a .list or .shared file.

R reference: phyloseq::show_mothur_cutoffs(mothurlist)

select_mothur_cutoff

Extracts the count table for a single cutoff label. It returns a DataFrame rather than a Phyloseq object, which is useful for inspecting the data before constructing a full object.

pyloseq.select_mothur_cutoff

select_mothur_cutoff(
    path: str | Path, cutoff: str
) -> pd.DataFrame

Return the rows of a .list or .shared file at a given cutoff.

R reference: (internal helper, mirrors mothur's label-filtering)
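
For a .shared file, both helpers reduce to filtering on the first column. The sketch below assumes the standard mothur layout (header label, Group, numOtus, then one column per OTU) and a file in which every row has the same number of OTU columns. The function names are hypothetical, and it returns plain dicts rather than a DataFrame to stay dependency-free.

```python
import csv
import io

shared_text = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
    "0.01\tA\t2\t4\t1\n"
    "0.01\tB\t2\t0\t6\n"
    "0.03\tA\t2\t5\t0\n"
    "0.03\tB\t2\t2\t4\n"
)


def read_shared(text):
    """Parse the tab-delimited text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def cutoff_labels(rows):
    """Distinct labels, in the order they appear in the file."""
    return list(dict.fromkeys(row["label"] for row in rows))


def counts_at(rows, cutoff):
    """Sample -> {OTU: count} for the rows carrying one label."""
    meta = {"label", "Group", "numOtus"}
    return {
        row["Group"]: {k: int(v) for k, v in row.items() if k not in meta}
        for row in rows
        if row["label"] == cutoff
    }


rows = read_shared(shared_text)
labels = cutoff_labels(rows)
selected = counts_at(rows, "0.03")
```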


CSV / TSV

For plain-text count tables not in any of the above formats.

read_csv

from pyloseq import read_csv

ps = read_csv(
    otu_path="otu_table.tsv",
    sample_path="metadata.tsv",
    tax_path="taxonomy.tsv",
    taxa_are_rows=True,
    sep="\t",
)

Only otu_path is required; other paths are optional. The OTU table index is used as taxa names; the sample table index is used as sample names.
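
The effect of taxa_are_rows can be pictured as an optional transpose after the first column has been taken as the index. This is a rough stdlib sketch of that idea, not the pyloseq reader; load_taxa_as_rows and the toy table are hypothetical.

```python
import csv
import io

# Toy table with samples as rows and taxa as columns.
samples_as_rows = "sample\tOTU_1\tOTU_2\nS1\t5\t2\nS2\t0\t7\n"


def load_taxa_as_rows(text, taxa_are_rows=True, sep="\t"):
    """Return (taxa, samples, matrix) with one matrix row per taxon."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    col_names = rows[0][1:]               # header without the index column
    row_names = [r[0] for r in rows[1:]]  # first column is the index
    matrix = [[int(v) for v in r[1:]] for r in rows[1:]]
    if taxa_are_rows:
        return row_names, col_names, matrix
    # Samples were rows: transpose so that taxa become rows.
    transposed = [list(col) for col in zip(*matrix)]
    return col_names, row_names, transposed


taxa, samples, matrix = load_taxa_as_rows(samples_as_rows, taxa_are_rows=False)
```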

pyloseq.read_csv

read_csv(
    otu_path: str | Path,
    sample_path: str | Path | None = None,
    tax_path: str | Path | None = None,
    tree_path: str | Path | None = None,
    refseq_path: str | Path | None = None,
    taxa_are_rows: bool = True,
    sep: str = "\t",
) -> Phyloseq

Load a plain-text count table (+ optional metadata files) into a Phyloseq.

R reference: phyloseq::phyloseq(otu_table(read.csv(otu_path), taxa_are_rows), ...)

Parameters:

Name           Type               Default   Description
otu_path       str | Path         required  Path to the abundance table CSV/TSV. First column is treated as the row index.
sample_path    str | Path | None  None      Optional sample metadata CSV/TSV. First column is the sample ID index.
tax_path       str | Path | None  None      Optional taxonomy CSV/TSV. First column is the taxon ID index.
tree_path      str | Path | None  None      Optional Newick tree file.
refseq_path    str | Path | None  None      Optional FASTA reference sequences.
taxa_are_rows  bool               True      Orientation of the OTU table (default True). Ignored when the file carries the pyloseq orientation marker written by to_csv (such files are always taxa-as-rows).
sep            str                '\t'      Field separator (default tab).

to_csv

Writes each component to a separate file inside the target directory and returns a dict mapping component name to output path. Components not present in the Phyloseq are skipped.

from pyloseq import to_csv

paths = to_csv(
    ps,
    directory="export",
    sep="\t",
)

pyloseq.to_csv

to_csv(
    ps: Phyloseq,
    directory: str | Path,
    sep: str = "\t",
    prefix: str = "",
) -> dict[str, Path]

Write a Phyloseq to a directory of plain-text files.

The OTU table is always written in canonical taxa-as-rows orientation and tagged with an index marker so that read_csv restores it faithfully. The round-trip is orientation-preserving regardless of how the in-memory table happened to be oriented.

Returns a dict mapping component name → output path.
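
The directory, prefix, and returned mapping fit together roughly as in the sketch below. It is an illustration under assumptions rather than pyloseq's implementation: write_components, the component names, and the "<prefix><name>.tsv" file naming are all hypothetical, and the orientation marker is not modelled.

```python
import csv
import tempfile
from pathlib import Path


def write_components(components, directory, sep="\t", prefix=""):
    """Write each present component to its own file; return name -> Path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, rows in components.items():
        if rows is None:
            continue  # component absent from the object: skip it
        path = directory / f"{prefix}{name}.tsv"  # hypothetical file naming
        with open(path, "w", newline="") as handle:
            csv.writer(handle, delimiter=sep).writerows(rows)
        written[name] = path
    return written


components = {
    "otu_table": [["", "S1", "S2"], ["OTU_1", 5, 0], ["OTU_2", 2, 7]],
    "tax_table": [["", "Phylum"], ["OTU_1", "Firmicutes"], ["OTU_2", "Bacteroidetes"]],
    "sample_data": None,  # not present in this toy object
}
paths = write_components(components, tempfile.mkdtemp(), prefix="demo_")
```

Returning the paths lets the caller pass them straight to a reader without rebuilding file names by hand.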

R reference: (no direct R equivalent; mirrors write.table() per component)