tree_construction.Rmd
Users need a tree in newick format before using iTOL and itol.toolkit. Phylogenetic trees and clustering trees are the most common types of trees. Phylogenetic trees are usually constructed based on nucleic acid sequences or amino acid sequences using character-based methods, such as maximum likelihood (ML) method. However, users can cluster the samples from a numeric matrix using hierarchical clustering method as well.
itol.toolkit provides basic clustering tree construction methods, allowing users to build trees based on the numeric matrix using both weighted and unweighted methods.
This section uses dataset 1 to demonstrate the method of constructing a clustering tree in newick format by hierarchical clustering (refer to the Dataset for detail information). We will demonstrate how to construct the tree using both unweighted and weighted methods using of parameters and class of each template.
In practice users need to prepare a numeric matrix for tree construction, and if you want using weighted method, then the grouping information is needed.
library(itol.toolkit)
library(data.table)
library(dplyr)
file = system.file("extdata",
"iTOL_template_parameters_matrix.txt",
package = "itol.toolkit")
template_parameters_count <- fread(file)
order <- names(template_parameters_count)[-1]
template_parameters_count <- template_parameters_count[,-1]
template_parameters_count <- convert_01(object = template_parameters_count) %>%
t() %>%
as.data.frame()
For unweighted clustering, users only need to import the numeric matrix.
cluster_tree <- count_to_tree(count = template_parameters_count)
plot(cluster_tree)
For weighted clustering, users need to specify the grouping information and weight through the “group” and “weight” parameters.
Once the tree is built, the user needs to export the tree to a newick file for importing to iTOL, etc.
write.tree(cluster_tree, paste0(getwd(), "/tree_of_itol_templates.tree"))
Users can also perform hierarchical clustering with the cluster package.
When using the cluster package’s agnes()
function for
clustering, users need to specify the method used to calculate
dissimilarity between clusters. Here we calculate the agglomerative
coefficient of each method by writing a short function and finally
choose the method with the value closest to 1.
#define linkage methods
m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")
#function to compute agglomerative coefficient
ac <- function(x) {
agnes(template_parameters_count, method = x)$ac
}
#calculate agglomerative coefficient for each clustering linkage method
sapply(m, ac)
After selecting the best method, the user can construct the
clustering tree by the agnes()
function.
IOCAS, weiyLiu@outlook.com↩︎
CACMS, njbxhzy@hotmail.com↩︎
IOCAS, tongzhou2017@gmail.com↩︎