Input / Output¶
Reader functions return a Phyloseq object; writer functions write files. The I/O functions can be imported directly from the top-level package, for example from pyloseq import read_biom.
BIOM¶
BIOM is the standard interchange format for microbiome count tables. Both BIOM v1 (JSON) and BIOM v2 (HDF5) are supported.
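The two BIOM flavours can usually be told apart from the first bytes of the file. The sketch below is illustrative only and is not taken from pyloseq: the function name sniff_biom_format and its return labels are hypothetical, and it assumes the HDF5 signature sits at offset 0, which is the usual case.

```python
# Signature that normally opens an HDF5 file (BIOM v2); BIOM v1 is a JSON object.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_biom_format(head: bytes) -> str:
    """Guess the BIOM flavour from the leading bytes of a file (hypothetical helper)."""
    if head.startswith(HDF5_SIGNATURE):
        return "v2-hdf5"
    if head.lstrip().startswith(b"{"):
        return "v1-json"
    return "unknown"


# Toy inputs standing in for the first bytes read from disk.
json_head = b'{"id": null, "format": "Biological Observation Matrix 1.0.0"}'
hdf5_head = HDF5_SIGNATURE + b"\x00" * 8

print(sniff_biom_format(json_head))
print(sniff_biom_format(hdf5_head))
```

In practice a caller would read the first few bytes with open(path, "rb") and pass them to the helper.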
read_biom¶
Taxonomy parsing is controlled by the parse_taxonomy parameter:
| Value | Behaviour |
|---|---|
| "default" | Split on "; " or ";", strip rank prefixes like "p__" |
| "qiime" | QIIME 1 / GreenGenes semicolon-delimited strings |
| "greengenes" | Synonym for "qiime" |
| None | Store raw taxonomy strings as a single column |
| callable | Called with the raw observation metadata dict; must return a dict of rank → value |
from pyloseq import read_biom

# No taxonomy parsing: keep raw strings
ps = read_biom("table.biom", parse_taxonomy=None)

# Custom parser: receives the observation metadata dict, returns rank → value
def my_parser(obs_meta):
    lineage = obs_meta.get("taxonomy", [])
    ranks = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
    return dict(zip(ranks, lineage))

ps = read_biom("table.biom", parse_taxonomy=my_parser)
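To make the "default" behaviour described in the table concrete, here is a minimal sketch of splitting a semicolon-delimited lineage and stripping rank prefixes such as "p__". It is an assumption about what such parsing could look like, not pyloseq's actual implementation; split_lineage is a hypothetical name and the rank list is borrowed from the custom parser example.

```python
import re

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
RANK_PREFIX = re.compile(r"^[a-z]__")  # matches prefixes like "k__" or "p__"


def split_lineage(raw: str) -> dict[str, str]:
    """Split a lineage string on ';' and strip rank prefixes (illustrative only)."""
    parts = [part.strip() for part in raw.split(";")]
    cleaned = [RANK_PREFIX.sub("", part) for part in parts]
    # Pair names with ranks in order; drop ranks whose name is empty.
    return {rank: name for rank, name in zip(RANKS, cleaned) if name}


result = split_lineage("k__Bacteria; p__Firmicutes; c__Bacilli")
print(result)
```

Stripping whitespace after splitting on ";" handles both the "; " and ";" delimiters mentioned in the table.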
pyloseq.read_biom ¶
Load a BIOM v1 (JSON) or v2 (HDF5) file into a Phyloseq object.
R reference: phyloseq::import_biom(BIOMfilename, parseFunction=parse_taxonomy)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to the BIOM file. | required |
| parse_taxonomy | TaxonomyParser | How to interpret the taxonomy metadata. | 'default' |
write_biom¶
Writes a BIOM v2 (HDF5) file by default; the version parameter defaults to '2.1'. BIOM v2 is widely supported by downstream tools.
pyloseq.write_biom ¶
Write a Phyloseq object to a BIOM file.
R reference: phyloseq::export_biom(x, file)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ps | Phyloseq | The Phyloseq object to write. | required |
| path | str \| Path | Output file path. | required |
| version | str | | '2.1' |
QIIME 2¶
read_qza¶
Reads QIIME 2 .qza artifact files without requiring the qiime2 package. The artifact's semantic type is detected from the embedded metadata.yaml:
| Semantic type | Loaded as |
|---|---|
| FeatureTable[Frequency] | OtuTable |
| FeatureTable[RelativeFrequency] | OtuTable |
| FeatureData[Taxonomy] | TaxTable |
| FeatureData[Sequence] | RefSeq |
| Phylogeny[Rooted] | PhyTree |
| Phylogeny[Unrooted] | PhyTree |
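A .qza artifact is a zip archive whose root directory contains a metadata.yaml with a type field, so the semantic type can be read with the standard library alone. The sketch below shows one way this detection could work; it is not pyloseq's code, the helper name semantic_type is hypothetical, and the simple line scan stands in for a real YAML parser.

```python
import io
import zipfile

# Mapping copied from the table above.
LOADED_AS = {
    "FeatureTable[Frequency]": "OtuTable",
    "FeatureTable[RelativeFrequency]": "OtuTable",
    "FeatureData[Taxonomy]": "TaxTable",
    "FeatureData[Sequence]": "RefSeq",
    "Phylogeny[Rooted]": "PhyTree",
    "Phylogeny[Unrooted]": "PhyTree",
}


def semantic_type(qza) -> str:
    """Return the 'type' field of the archive's root metadata.yaml (illustrative)."""
    with zipfile.ZipFile(qza) as zf:
        # metadata.yaml sits one level below the archive's single root directory.
        name = next(
            n for n in zf.namelist()
            if n.count("/") == 1 and n.endswith("/metadata.yaml")
        )
        text = zf.read(name).decode("utf-8")
    for line in text.splitlines():
        if line.startswith("type:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError("no 'type' field found in metadata.yaml")


# Build a toy archive in memory so the example runs without a real artifact.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "1234-uuid/metadata.yaml",
        "uuid: 1234-uuid\ntype: FeatureTable[Frequency]\nformat: BIOMV210DirFmt\n",
    )
buf.seek(0)

detected = semantic_type(buf)
component = LOADED_AS[detected]
print(detected, "->", component)
```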
Pass multiple .qza files to load a complete dataset:
from pyloseq.io import read_qza

ps = read_qza(
    "feature-table.qza",
    "taxonomy.qza",
    "rooted-tree.qza",
)
pyloseq.io.read_qza ¶
read_qza(
    features: str | Path | None = None,
    taxonomy: str | Path | None = None,
    tree: str | Path | None = None,
    metadata: str | Path | None = None,
    sequences: str | Path | None = None,
) -> Any
Load one or more QIIME 2 .qza artifacts into a Phyloseq.
Each argument should point to a .qza file of the matching semantic
type. metadata is a sample-metadata TSV (not a .qza).
R reference: qiime2R::qza_to_phyloseq(features, tree, taxonomy, metadata)
QIIME 1¶
read_qiime¶
Reads QIIME 1 legacy files: a BIOM OTU table and an optional mapping file, tree, and reference sequences.
from pyloseq import read_qiime
ps = read_qiime(
    otu="otu_table.biom",
    mapping="sample_metadata.txt",
    tree="rep_set.tre",
)
The parse_taxonomy parameter accepts the same values as read_biom; the default is "qiime" because QIIME 1 uses semicolon-delimited taxonomy strings.
pyloseq.read_qiime ¶
read_qiime(
    otu: str | Path,
    mapping: str | Path | None = None,
    tree: str | Path | None = None,
    refseq: str | Path | None = None,
    parse_taxonomy: str = "qiime",
) -> Phyloseq
Load a QIIME 1 OTU table (+ optional mapping, tree, refseq) into a Phyloseq.
The OTU table must use the standard #OTU ID header row. If a
taxonomy column is present (taxonomy or Consensus Lineage, any
case) it is parsed into a TaxTable.
R reference: phyloseq::import_qiime(otufilename, mapfilename, treefilename, refseqfilename)
mothur¶
mothur stores results in .shared (OTU count table), .cons.taxonomy (consensus taxonomy), and .tre (tree) files.
read_mothur¶
from pyloseq import read_mothur
ps = read_mothur(
    shared="stability.opti_mcc.shared",
    constaxonomy="stability.opti_mcc.0.03.cons.taxonomy",
)
mothur shared files often contain multiple OTU definitions at different distance cutoffs. Use the cutoff parameter to select one, for example cutoff="0.03".
Pass list_file and group instead of shared to reconstruct the OTU table from raw assignments.
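The idea of rebuilding an OTU table from a .list and a .group file can be sketched in a few lines. This assumes the usual mothur layouts (a list line holds a label, the OTU count, then one tab-separated column per OTU with comma-separated sequence names; a group file maps each sequence name to a sample). The function otu_counts and the generated OTU names are hypothetical and do not describe how pyloseq does it.

```python
from collections import Counter

# Toy inputs: one .list line at cutoff 0.03 and a three-sequence .group file.
LIST_LINE = "0.03\t2\tseqA,seqB\tseqC"
GROUP_TEXT = "seqA\tS1\nseqB\tS2\nseqC\tS1\n"


def otu_counts(list_line: str, group_text: str) -> dict[str, Counter]:
    """Count sequences per sample for every OTU on one .list line (illustrative)."""
    seq_to_sample = dict(
        line.split("\t") for line in group_text.splitlines() if line
    )
    _label, _num_otus, *otus = list_line.split("\t")
    table = {}
    for i, members in enumerate(otus, start=1):
        # Each OTU column lists its member sequences, separated by commas.
        table[f"Otu{i}"] = Counter(seq_to_sample[s] for s in members.split(","))
    return table


table = otu_counts(LIST_LINE, GROUP_TEXT)
print({otu: dict(counts) for otu, counts in table.items()})
```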
pyloseq.read_mothur ¶
read_mothur(
    shared: str | Path | None = None,
    constaxonomy: str | Path | None = None,
    tree: str | Path | None = None,
    list_file: str | Path | None = None,
    group: str | Path | None = None,
    cutoff: str | None = None,
) -> Phyloseq
Load mothur output files into a Phyloseq.
R reference: phyloseq::import_mothur(...)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| shared | str \| Path \| None | Path to a mothur shared file. | None |
| constaxonomy | str \| Path \| None | Path to a consensus taxonomy file. | None |
| tree | str \| Path \| None | Path to a Newick tree file. | None |
| list_file | str \| Path \| None | Path to a mothur list file. | None |
| group | str \| Path \| None | Path to a mothur group file. | None |
| cutoff | str \| None | OTU similarity cutoff label (e.g. "0.03"). | None |
show_mothur_cutoffs¶
from pyloseq import show_mothur_cutoffs
cutoffs = show_mothur_cutoffs("stability.opti_mcc.shared")
# ['unique', '0.01', '0.02', '0.03']
pyloseq.show_mothur_cutoffs ¶
Return all OTU cutoff labels present in a .list or .shared file.
R reference: show_mothur_cutoffs(mothurlist)
select_mothur_cutoff¶
Extracts the count table for a single cutoff label. Returns a DataFrame rather than a Phyloseq object, useful for inspecting the data before constructing a full object.
pyloseq.select_mothur_cutoff ¶
Return the rows of a .list or .shared file at a given cutoff.
R reference: (internal helper, mirrors mothur's label-filtering)
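The label handling behind these two helpers can be illustrated with the standard library. The sketch assumes the common .shared layout (tab-separated, with label, Group and numOtus columns followed by one column per OTU); the function names cutoff_labels and rows_at_cutoff are hypothetical stand-ins, not pyloseq functions, and the data is a toy example.

```python
import csv
import io

SHARED_TEXT = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
    "unique\tS1\t2\t5\t1\n"
    "unique\tS2\t2\t0\t4\n"
    "0.03\tS1\t2\t6\t0\n"
    "0.03\tS2\t2\t0\t4\n"
)


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse tab-separated .shared text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def cutoff_labels(rows: list[dict[str, str]]) -> list[str]:
    """Distinct labels in order of first appearance."""
    return list(dict.fromkeys(row["label"] for row in rows))


def rows_at_cutoff(rows: list[dict[str, str]], cutoff: str) -> list[dict[str, str]]:
    """Keep only the rows recorded at the requested cutoff label."""
    return [row for row in rows if row["label"] == cutoff]


rows = read_rows(SHARED_TEXT)
labels = cutoff_labels(rows)
selected = rows_at_cutoff(rows, "0.03")
print(labels, len(selected))
```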
CSV / TSV¶
Use these functions for plain-text count tables that are not in any of the formats above.
read_csv¶
from pyloseq import read_csv
ps = read_csv(
    otu_path="otu_table.tsv",
    sample_path="metadata.tsv",
    tax_path="taxonomy.tsv",
    taxa_are_rows=True,
    sep="\t",
)
Only otu_path is required; other paths are optional. The OTU table index is used as taxa names; the sample table index is used as sample names.
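The role of the first column and of taxa_are_rows can be shown with a small standard-library sketch. It is an illustration under the assumption that the first column holds the row index and the header holds the column names; load_counts is a hypothetical helper, not part of pyloseq.

```python
import csv
import io


def load_counts(text: str, taxa_are_rows: bool = True, sep: str = "\t"):
    """Return (taxa, samples, counts) with counts always in taxa-as-rows order."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    col_names = header[1:]              # header cells after the index column
    row_names = [r[0] for r in body]    # first column is the row index
    counts = [[int(v) for v in r[1:]] for r in body]
    if taxa_are_rows:
        return row_names, col_names, counts
    # Samples are rows: transpose so every inner list is one taxon.
    return col_names, row_names, [list(col) for col in zip(*counts)]


TAXA_AS_ROWS = "otu\tS1\tS2\nOTU_1\t3\t0\nOTU_2\t1\t7\n"
SAMPLES_AS_ROWS = "sample\tOTU_1\tOTU_2\nS1\t3\t1\nS2\t0\t7\n"

a = load_counts(TAXA_AS_ROWS, taxa_are_rows=True)
b = load_counts(SAMPLES_AS_ROWS, taxa_are_rows=False)
print(a == b)
```

Both inputs describe the same data, so the two calls yield the same taxa names, sample names, and counts.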
pyloseq.read_csv ¶
read_csv(
    otu_path: str | Path,
    sample_path: str | Path | None = None,
    tax_path: str | Path | None = None,
    tree_path: str | Path | None = None,
    refseq_path: str | Path | None = None,
    taxa_are_rows: bool = True,
    sep: str = "\t",
) -> Phyloseq
Load a plain-text count table (+ optional metadata files) into a Phyloseq.
R reference: phyloseq::phyloseq(otu_table(read.csv(otu_path), taxa_are_rows), ...)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| otu_path | str \| Path | Path to the abundance table CSV/TSV. First column is treated as the row index. | required |
| sample_path | str \| Path \| None | Optional sample metadata CSV/TSV. First column is the sample ID index. | None |
| tax_path | str \| Path \| None | Optional taxonomy CSV/TSV. First column is the taxon ID index. | None |
| tree_path | str \| Path \| None | Optional Newick tree file. | None |
| refseq_path | str \| Path \| None | Optional FASTA reference sequences. | None |
| taxa_are_rows | bool | Orientation of the OTU table (default True). | True |
| sep | str | Field separator (default tab). | '\t' |
to_csv¶
Writes each component to a separate file in the target directory and returns a dict mapping component name to output path. Components not present in the Phyloseq are skipped.
from pyloseq import to_csv

paths = to_csv(
    ps,
    directory="output",
    sep="\t",
)
pyloseq.to_csv ¶
to_csv(
    ps: Phyloseq,
    directory: str | Path,
    sep: str = "\t",
    prefix: str = "",
) -> dict[str, Path]
Write a Phyloseq to a directory of plain-text files.
The OTU table is always written in canonical taxa-as-rows orientation and
tagged with an index marker so that read_csv restores it faithfully: the
round-trip is orientation-preserving regardless of how the in-memory
table was oriented.
Returns a dict mapping component name → output path.
R reference: (no direct R equivalent; mirrors write.table() per component)