Trees#

Loading a tree from a file#

The path to the original file is stored in the .source attribute.

Visualising a tree with ascii_art()#

Note

See the Phylogenetic Trees section for interactive graphical display of dendrograms.

Writing a tree to a file#

Getting a dict of nodes keyed by their name#

Getting the name of a node#

The root node name defaults to "root".

You can ensure internal nodes get named

The object type of a tree and its nodes is the same#

Working with the nodes of a tree#

Get all the nodes, tips and edges as a dict.

As a list.

Only the tip (terminal) nodes as a list.

Iterate the tip nodes.

Get just the internal nodes as a list

or iteratively.

Getting the path between two tips or edges (connecting nodes)#

Get tip-to-root distances#

For each tip, the sum of the lengths on all nodes connecting that tip to the root node.

Can also be done for a subset of tips.
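The calculation itself can be illustrated without the library. The following is a minimal sketch in plain Python, not cogent3 code: the `parents` mapping, the node names and the `tip_to_root` helper are all hypothetical and exist only for this illustration. Each node stores its parent and the length of the branch leading to that parent, and the distance for a tip is accumulated by walking up to the root.

```python
# Hypothetical tree: child name -> (parent name, length of branch to parent).
parents = {
    "edge.0": ("root", 0.5),
    "Human": ("edge.0", 0.25),
    "Chimp": ("edge.0", 0.125),
    "Mouse": ("root", 1.0),
}


def tip_to_root(parents, names=None):
    """Return {tip name: summed branch length from that tip to the root}.

    If names is given, only those tips are reported.
    """
    internal = {parent for parent, _ in parents.values()}
    tips = [name for name in parents if name not in internal]
    if names is not None:
        tips = [name for name in tips if name in names]

    distances = {}
    for tip in tips:
        total = 0.0
        node = tip
        # Walk upwards until we reach the root, which has no parent entry.
        while node in parents:
            node, length = parents[node][0], parents[node][1]
            total += length
        distances[tip] = total
    return distances


print(tip_to_root(parents))
print(tip_to_root(parents, names=["Mouse"]))
```

Restricting the calculation to a subset of tips only changes which tips are reported; the path walked for each tip is the same.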

Get tip-to-tip distances#

Get a distance matrix between all pairs of tips and a list of the tip nodes.

Getting the distance between two nodes#

Via pairwise distances, which returns a DistanceMatrix instance.

Or directly between the node objects.

Get sum of all branch lengths#

Getting the last common ancestor (LCA) for two nodes#

Getting all the ancestors for a node#

A list of all nodes to the tree root.

Getting all the children for a node#

Getting all the distances for a tree#

On a PhyloNode without branch lengths each branch has a weight of 1 so the distances represent the number of connected nodes. On a PhyloNode with branch lengths the measure is the sum of branch lengths.
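The difference between the two cases can be shown with a small plain-Python sketch. This is an illustration of the rule described above, not the cogent3 implementation; the `distance` and `path_to_root` helpers and the two example mappings are hypothetical. A missing branch length (here `None`) is given a weight of 1.

```python
def path_to_root(parents, name):
    """Return the list of node names from name up to and including the root."""
    path = [name]
    while path[-1] in parents:
        path.append(parents[path[-1]][0])
    return path


def distance(parents, a, b):
    """Sum branch weights on the path joining a and b (None counts as 1)."""
    path_a = path_to_root(parents, a)
    path_b = path_to_root(parents, b)
    shared = set(path_a) & set(path_b)
    total = 0
    for path in (path_a, path_b):
        for node in path:
            if node in shared:  # reached the last common ancestor
                break
            length = parents[node][1]
            total += 1 if length is None else length
    return total


# Hypothetical trees: child name -> (parent name, branch length or None).
without_lengths = {
    "edge.0": ("root", None),
    "Human": ("edge.0", None),
    "Chimp": ("edge.0", None),
    "Mouse": ("root", None),
}
with_lengths = {
    "edge.0": ("root", 0.5),
    "Human": ("edge.0", 0.25),
    "Chimp": ("edge.0", 0.125),
    "Mouse": ("root", 1.0),
}

print(distance(without_lengths, "Human", "Mouse"))  # 3
print(distance(with_lengths, "Human", "Mouse"))  # 1.75
```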

Getting the two nodes that are farthest apart#

Get the nodes within a given distance#

Rerooting trees#

Reorienting a tree at a named node#

The method name is a bit misleading. If tr is an unrooted tree (loosely, a tree whose root node has more than 2 children), the result is a re-orientation of the tree rather than a true rooting.

At the midpoint#

This does produce a rooted tree.

Root at a named edge#

The edge can be either a tip or an internal node.

Tree representations#

Newick format#

Tree traversal#

Here is the example tree for reference:

Preorder#
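As a reminder of what this order means, here is a minimal plain-Python sketch. It does not use cogent3; the `(name, children)` tuple layout, the `example` tree and the `preorder` function are assumptions made for illustration. In a preorder traversal each node is visited before any of its descendants.

```python
def preorder(node):
    """Yield node names parent-first, then each child subtree left to right."""
    name, children = node
    yield name
    for child in children:
        yield from preorder(child)


# Hypothetical tree written as (name, [children]).
example = ("root", [("edge.0", [("A", []), ("B", [])]), ("C", [])])

print(list(preorder(example)))  # ['root', 'edge.0', 'A', 'B', 'C']
```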

Postorder#
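In a postorder traversal every node is visited only after all of its descendants, so tips come first and the root comes last. The sketch below is plain Python rather than cogent3 code; the `(name, children)` tuple layout, the `example` tree and the `postorder` function are assumptions made for illustration.

```python
def postorder(node):
    """Yield node names children-first, with each parent after its subtree."""
    name, children = node
    for child in children:
        yield from postorder(child)
    yield name


# Hypothetical tree written as (name, [children]).
example = ("root", [("edge.0", [("A", []), ("B", [])]), ("C", [])])

print(list(postorder(example)))  # ['A', 'B', 'edge.0', 'C', 'root']
```

This order is the natural one for calculations that combine results from the tips towards the root.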

Selecting subtrees#

Provide the names of nodes you want the subtree for. The default behaviour is to force the subtree to have the same number of children at the root as the original tree, in this case 2.

Use the as_rooted argument to ensure the selected subtree topology is as it existed on the original tree.

Tree manipulation methods#

Pruning the tree#

Remove internal nodes with only one child. Create new connections and branch lengths (if tree is a PhyloNode) to reflect the change.

The prune() method modifies the tree in place.
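The effect of removing single-child internal nodes can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the library's implementation: the dict-based node layout and the `collapse` function are hypothetical. When an internal node has exactly one child, the child takes its place and the two branch lengths are added together.

```python
def collapse(node):
    """Remove internal nodes with a single child, merging branch lengths.

    Modifies the node dicts in place and returns the (possibly new) top node.
    """
    node["children"] = [collapse(child) for child in node["children"]]
    if len(node["children"]) == 1:
        only = node["children"][0]
        only["length"] += node["length"]  # keep the total path length unchanged
        return only
    return node


# Hypothetical tree: internal node "X" has only one child, "A".
tree = {
    "name": "root",
    "length": 0.0,
    "children": [
        {
            "name": "X",
            "length": 1.0,
            "children": [{"name": "A", "length": 0.5, "children": []}],
        },
        {"name": "B", "length": 2.0, "children": []},
    ],
}

pruned = collapse(tree)
print([(child["name"], child["length"]) for child in pruned["children"]])
# [('A', 1.5), ('B', 2.0)]
```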

Create a full unrooted copy of the tree#

Transform tree into a bifurcating tree#

Add internal nodes so that every node has 2 or fewer children.
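One way to picture this transformation is the plain-Python sketch below. It is an assumption about how such a resolution could be done, not a description of how cogent3 chooses where to place the new nodes. Trees are written as nested tuples of tip names, and the `bifurcate` function is hypothetical. Whenever a node has more than two children, the first two are grouped under a new internal node until at most two remain.

```python
def bifurcate(node):
    """Return a copy of a nested-tuple tree in which no node has > 2 children."""
    if isinstance(node, str):  # a tip
        return node
    children = [bifurcate(child) for child in node]
    while len(children) > 2:
        # Group the first two children under a new internal node.
        children = [tuple(children[:2])] + children[2:]
    return tuple(children)


print(bifurcate(("a", "b", "c", "d")))  # ((('a', 'b'), 'c'), 'd')
```

In a tree with branch lengths, the newly added internal branches would typically be given zero length so that distances between tips are unchanged.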

Transform tree into a balanced tree#

Using a balanced tree can substantially improve performance of likelihood calculations for time-reversible models. Note that the resulting tree has a different orientation with the effect that specifying clades or stems for model parameterisation should be done using the “outgroup_name” argument.

Test two trees for same topology#

Branch lengths don’t matter.

Measure topological distances between two trees#

A number of topological tree distance metrics are available. They include:

  • The Robinson-Foulds Distance for rooted trees.

  • The Matching Cluster Distance for rooted trees.

  • The Robinson-Foulds Distance for unrooted trees.

  • The Lin-Rajan-Moret Distance for unrooted trees.

There are several variations of the Robinson-Foulds metric in the literature. The definition used by cogent3 is the cardinality of the symmetric difference of the sets of clades/splits in the two rooted/unrooted trees. Other definitions sometimes divide this by two, or normalise it to the unit interval.

The Robinson-Foulds distance is quick to compute, but is known to saturate quickly. Moving a single leaf in a tree can maximise this metric.
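Both the definition and the saturation behaviour can be illustrated for rooted trees with a short plain-Python sketch. It is not the cogent3 implementation; the nested-tuple tree layout and the `clades` and `rf_rooted` helpers are hypothetical. A clade is represented as the frozenset of tip names below an internal node, and the distance is the size of the symmetric difference of the two clade sets.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a nested-tuple tree."""
    found = set()

    def tips_below(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*(tips_below(child) for child in node))
        found.add(tips)
        return tips

    tips_below(tree)
    return found


def rf_rooted(tree1, tree2):
    """Cardinality of the symmetric difference of the two clade sets."""
    return len(clades(tree1) ^ clades(tree2))


reference = ((((("a", "b"), "c"), "d"), "e"), "f")
swapped = ((((("a", "b"), "d"), "c"), "e"), "f")  # c and d exchanged
moved = ((((("b", "c"), "d"), "e"), "f"), "a")  # leaf a moved to the root

print(rf_rooted(reference, reference))  # 0
print(rf_rooted(reference, swapped))  # 2
print(rf_rooted(reference, moved))  # 8
```

Moving the single leaf `a` changes every clade except the one containing all tips, so the last comparison already gives the largest value possible for two rooted bifurcating trees with six tips.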

The Matching Cluster and Lin-Rajan-Moret distances are two matching-based distances that are more statistically robust. Unlike the Robinson-Foulds distance, which counts how many of the splits/clades are not exactly the same, the matching-based distances measure the degree to which the splits/clades differ. The matching-based distances solve a min-weight matching problem, which may take longer to compute for large trees.