cogent3.core.alignment.Alignment
- class Alignment(seqs_data: AlignedSeqsDataABC, slice_record: SliceRecord | None = None, **kwargs: Any)
A collection of aligned sequences.
- Attributes:
annotation_db: the annotation database for the collection
array_positions: Returns a numpy array of positions; axis 0 is alignment positions, columns are in order corresponding to names.
array_seqs: Returns a numpy array of sequences, axis 0 is seqs in order
modified: collection is a modification of underlying storage
name_map: returns mapping of seq names to parent seq names
names: returns the names of the sequences in the collection
num_seqs: the number of sequences in the collection
positions
seqs: iterable of sequences in the collection
storage: the aligned sequence storage instance of the collection
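The two array attributes describe the same data in opposite orientations. The sketch below illustrates that relationship with plain Python lists of characters standing in for the numpy arrays; it does not call cogent3, and the sequence names and data are hypothetical.

```python
# Illustration only: plain lists stand in for the numpy arrays that
# array_seqs and array_positions return. The data are hypothetical.
names = ["s1", "s2", "s3"]
aligned = {"s1": "ACG-T", "s2": "AC--T", "s3": "ATGCT"}

# Rows are sequences, in the order given by names (like array_seqs).
by_seq = [list(aligned[name]) for name in names]

# Rows are alignment positions, columns follow names (like array_positions).
by_position = [list(column) for column in zip(*by_seq)]

print(len(by_seq), len(by_seq[0]))            # 3 sequences, 5 positions
print(len(by_position), len(by_position[0]))  # 5 positions, 3 sequences
print(by_position[3])                         # column 3 across s1, s2, s3
```

Indexing `by_position[3]` gives one alignment column, while `by_seq[0]` gives one whole sequence.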
Methods
add_feature(*[, seqid, parent_id, strand, ...]): add feature on named sequence, or on the alignment itself
add_seqs(seqs, **kwargs): Returns new collection with additional sequences.
alignment_quality([app_name]): Computes the alignment quality using the indicated app
apply_scaled_gaps(other[, aa_to_codon]): applies gaps in self to ungapped sequences
coevolution([stat, segments, drawable, ...]): performs pairwise coevolution measurement
copy([copy_annotations]): creates new instance, only mutable attributes are copied
copy_annotations(seq_db): copy annotations into attached annotation db
count_ambiguous_per_seq(): Return the counts of ambiguous characters per sequence as a DictArray.
count_gaps_per_pos([include_ambiguity]): return counts of gaps per position as a DictArray
count_gaps_per_seq([induced_by, unique, ...]): return counts of gaps per sequence as a DictArray
counts([motif_length, include_ambiguity, ...]): counts of motifs
counts_per_pos([motif_length, ...]): return MotifCountsArray of counts per position
counts_per_seq([motif_length, ...]): counts of non-overlapping motifs per sequence
deepcopy(**kwargs): returns deep copy of self
degap([storage_backend]): returns collection sequences without gaps or missing characters.
distance_matrix([calc, drop_invalid, parallel]): Returns pairwise distances between sequences.
dotplot([name1, name2, window, threshold, ...]): make a dotplot between two sequences.
drop_duplicated_seqs(): returns self without duplicated sequences
duplicated_seqs(): returns the names of duplicated sequences
entropy_per_pos([motif_length, ...]): returns Shannon entropy per position
entropy_per_seq([motif_length, ...]): Returns the Shannon entropy per sequence.
filtered(predicate[, motif_length, ...]): The alignment positions where predicate(column) is true.
get_ambiguous_positions(): Returns dict of seq:{position:char} for ambiguous chars.
get_degapped_relative_to(name): Remove all columns with gaps in sequence with given name.
get_drawable(*[, biotype, width, vertical, ...]): make a figure from sequence features
get_drawables(*[, biotype]): returns a dict of drawables, keyed by type
get_features(*[, seqid, biotype, name, ...]): yields Feature instances
get_gap_array([include_ambiguity]): returns bool array with gap state True, False otherwise
get_gapped_seq(seqname[, recode_gaps]): Return a gapped Sequence object for the specified seqname.
get_identical_sets([mask_degen]): returns sets of names for sequences that are identical
get_lengths([include_ambiguity, allow_gap]): returns sequence lengths as a dict of {seqid: length}
get_motif_probs([alphabet, ...]): Return a dictionary of motif probs, calculated as the averaged frequency across sequences.
get_position_indices(f[, negate]): Returns list of column indices for which f(col) is True.
get_projected_feature(*, seqid, feature): returns an alignment feature projected onto the seqid sequence
get_projected_features(*, seqid, **kwargs): projects all features from other sequences onto seqid
get_seq(seqname[, copy_annotations]): Return a Sequence object for the specified seqname.
get_seq_names_if(f[, negate]): Returns list of names of seqs where f(seq) is True.
get_similar(target, min_similarity, ...): Returns new SequenceCollection containing sequences similar to target.
get_translation([gc, incomplete_ok, ...]): translate sequences from nucleic acid to protein
has_annotation_db(): returns True if self has annotation db
has_terminal_stop([gc, strict]): Returns True if any sequence has a terminal stop codon.
information_plot([width, height, window, ...]): plot information per position
is_ragged(): by definition False for an Alignment
iter_positions([pos_order]): Iterates over positions in the alignment, in order.
iter_seqs([seq_order]): Iterates over sequences in the collection, in order.
iupac_consensus([allow_gap]): Returns string containing IUPAC consensus sequence of the alignment.
majority_consensus(): Returns consensus sequence containing most frequent item at each position.
make_feature(*, feature[, on_alignment]): create a feature on named sequence, or on the alignment itself
matching_ref(ref_name, gap_fraction, gap_run): Returns new alignment with seqs well aligned with a reference.
no_degenerates([motif_length, allow_gap]): returns new alignment without degenerate characters
omit_bad_seqs([quantile]): Returns new alignment without sequences with a number of uniquely introduced gaps exceeding quantile
omit_gap_pos([allowed_gap_frac, motif_length]): Returns new alignment where all cols (motifs) have <= allowed_gap_frac gaps.
pad_seqs([pad_length]): Returns copy in which sequences are padded with the gap character to same length.
probs_per_pos([motif_length, ...]): returns MotifFreqsArray per position
probs_per_seq([motif_length, ...]): return frequency array of motifs per sequence
quick_tree([calc, drop_invalid, parallel, ...]): Returns a phylogenetic tree.
rc(): Returns the reverse complement of all sequences in the alignment.
renamed_seqs(renamer): Returns new alignment with renamed sequences.
replace_annotation_db(value[, check]): public interface to assigning the annotation_db
reverse_complement(): Returns the reverse complement of all sequences in the collection.
sample(*, n, with_replacement, motif_length, ...): Returns random sample of positions from self, e.g. to bootstrap.
seqlogo([width, height, wrap, vspace, colours]): returns Drawable sequence logo using mutual information
sliding_windows(window, step[, start, end]): Generator yielding new alignments of given length and interval.
strand_symmetry([motif_length]): returns dict of strand symmetry test results per ungapped seq
take_positions(cols[, negate]): Returns new Alignment containing only specified positions.
take_positions_if(f[, negate]): Returns new Alignment containing cols where f(col) is True.
take_seqs(names[, negate, copy_annotations]): Returns new collection containing only specified seqs.
take_seqs_if(f[, negate]): Returns new collection containing seqs where f(seq) is True.
to_dict(...) -> dict[str, str]: Return a dictionary of sequences.
to_dna(): returns copy of self as a collection of DNA moltype seqs
to_fasta([block_size]): Return collection in Fasta format.
to_html([name_order, wrap, limit, colors, ...]): returns html with embedded styles for sequence colouring
to_json(): returns json formatted string
to_moltype(moltype): returns copy of self with changed moltype
to_phylip(): Return collection in PHYLIP format and mapping to sequence ids
to_pretty([name_order, wrap]): returns a string representation of the alignment in pretty print format
to_rich_dict(): returns a json serialisable dict
to_rna(): returns copy of self as a collection of RNA moltype seqs
variable_positions([include_gap_motif, ...]): Return a list of variable position indexes.
with_masked_annotations(biotypes[, ...]): returns an alignment with regions replaced by mask_char
write(filename[, format_name]): Write the sequences to a file, preserving order of sequences.
from_rich_dict
gapped_by_map
trim_stop_codons
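As one illustration of the column-filtering methods listed above, the rule stated for omit_gap_pos (keep columns whose gap fraction is <= allowed_gap_frac) can be sketched in plain Python. This is not the cogent3 implementation: the helper name `omit_gap_columns` is hypothetical, it handles single-character columns only (no motif_length), it treats only "-" as a gap, and it works on a dict of strings rather than an Alignment.

```python
def omit_gap_columns(seqs, allowed_gap_frac, gap_char="-"):
    """Keep columns whose fraction of gap characters is <= allowed_gap_frac.

    seqs is a dict of name -> aligned string; all strings must share a length.
    Hypothetical helper for illustration, not part of cogent3.
    """
    names = list(seqs)
    if len({len(seqs[name]) for name in names}) > 1:
        raise ValueError("sequences must all have the same length")

    # One tuple per alignment column, in sequence-name order.
    columns = list(zip(*(seqs[name] for name in names)))

    # Indices of the columns that pass the gap-fraction threshold.
    keep = [
        index
        for index, column in enumerate(columns)
        if column.count(gap_char) / len(column) <= allowed_gap_frac
    ]
    return {name: "".join(seqs[name][i] for i in keep) for name in names}


example = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACTG-"}
print(omit_gap_columns(example, 0.0))  # only gap-free columns survive
print(omit_gap_columns(example, 0.5))  # columns with at most half gaps survive
```

With a threshold of 0.0 only the two gap-free columns remain; raising it to 0.5 also keeps the columns in which one of the three sequences has a gap.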
Notes
Should be constructed using make_aligned_seqs().
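A minimal sketch of that construction route follows. It assumes cogent3 is installed and that make_aligned_seqs accepts a dict of name -> aligned string together with a moltype keyword; the sequence data are hypothetical, and the import guard only lets the snippet run in environments without cogent3.

```python
# Sketch only: assumes cogent3 is installed and that make_aligned_seqs takes
# a dict of name -> aligned string plus a moltype keyword argument.
try:
    from cogent3 import make_aligned_seqs
except ImportError:  # cogent3 is not available in this environment
    make_aligned_seqs = None

data = {"s1": "ACG-T", "s2": "AC--T"}  # hypothetical aligned sequences

aln = None
if make_aligned_seqs is not None:
    aln = make_aligned_seqs(data, moltype="dna")
    print(aln.num_seqs)  # the number of sequences in the collection
    print(aln.names)     # the names of the sequences in the collection
```

Going through the factory function, rather than calling Alignment(...) directly, avoids building the seqs_data storage object by hand.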