Datasets

Dataset structure

A complete dataset should include at least the following:

Tree file in Newick format.
Metadata.

Metadata files for most functions should be comma seperated files or table seperated files that could be imported into R by data.table. Only a few function has its own special metadata format (e.g. DATASET_ALIGNMENT)

Data input and output

Input

Metadata: It is recommended to use data.table::fread to read metadata

metadata_path <-system.file("extdata","parameter_groups.txt",package = "itol.toolkit")
meta_data <- data.table::fread(metadata_path)

Tree: It is recommended to use ape::read.tree to read the tree file. If it is a non-phylo object tree calculated in the R environment, it is recommended to use ape::as.phylo to convert it to a phylo object.

tree_path <- system.file("extdata","tree_of_itol_templates.tree",package = "itol.toolkit")
tree <- read.tree(tree_path)

Output

Users will get unit or hub object after itol.toolkit annotation. You can use write_unit to output a single unit object, and the output content will be the template file corresponding to the object’s type attribute. Also, you can add multiple units to the hub object, and then use write_hub to output the template file in bulk.

Dataset

To fully demonstrate itol.toolkit functions, we presented five datasets:

iTOL parameter structure and usage. This is a very small dataset, mainly used for the most basic function demonstration.
Li_2022_JHazardMater. This is a multi group dataset that focuses on text display, mainly used for displaying TEXT functions and palette systems.
Zheng_2022_EnvironPollut. This includes a small phylogenetic tree, a large phylogenetic tree, and a comprehensive dataset for multi-dimensional data comparison. It is mainly used for palette display, browser overload testing, and demonstrate the comprehensive use of multiple functions.
Zhou_2022_FoodRes. This is a dataset mainly used for comparing with table2itol.
Tree of life. This is an official standard dataset from iTOL, mainly used for demonstrating how to extract and combine metadata from existing templates.

Dataset 1

Name	Import inside R	GitHub url	Note
Tree file	`system.file("extdata","tree_of_itol_templates.tree",package = "itol.toolkit")`	itol_templates.tree	Newick format，branches are 23 template names，nodes are 5 template types
Metadata1 raw parameter table	`system.file("extdata","parameter_groups.txt",package = "itol.toolkit")`	parameter_usage_raw.txt	Tab seperated file，parameters are on row，template names are on column
Metadata2 parameter table	`data("template_parameters_count")`	parameter_usage_count.txt	Tab seperated file，parameters are on row，template names are on column，converted from parameter_usage_raw.txt
Metadata3 frequency table	`system.file("extdata","templates_frequence.txt",package = "itol.toolkit")`	templates_frequence.txt	Tab seperated file，template names are on row，DOIs are on column
Metadata4 classification	`data("template_groups")`	template_groups.txt	Tab seperated file，the first column is template name，the second column is template type

Dataset 2

Name	GitHub url	Note
Tree file	tree.nwk	Newick format，branches are 104 OTU IDs
Metadata	metadata.txt	Tab seperated file, the first column is sequence id，the second column is unused data，the third column is gene name，the fourth column is substrate，the fifth column is species name

Dataset 3

Name	GitHub url	Note
Tree file1 Subtree	abunt-tree.nwk	Newick format，branches are 39 OTU IDs
Metadata1 Species and their abundance	abunt-metadata.txt	Tab seperated file, the first column is OTU ID, the second column is NS group abundance, the third column is OS group abundance, the fourth column is phylum, the fifth column is class, the sixth column is species, the sixth column is indicators with differences
Tree file2 Whole tree	rare-tree.nwk	Newick format，branches are 1338 OTU IDs
Metadata2 Species and their abundance	rare-metadata.txt	Tab seperated file, the first column is OTU ID, the second column is NS group abundance, the third column is OS group abundance, the fourth column is phylum, the fifth column is class, the sixth column is species, the sixth column is indicators with differences
Tree file3 iCAMP bin tree	assembly-tree.nwk	Newick format，branches are bins
Metadata3 iCAMP result	assembly-metadata.txt	Tab seperated file, the first column is bin ID, others are iCAMP output results

Dataset 4

Name	GitHub url	Note
Tree file	otus.contree	Newick format，branches are 186 ASV IDs
Metadata	annotation.txt	Tab seperated file, the first column is ASV id，the 2-8 columns are taxonomy name in different level，the 9th column is total abundance，the 10-12 columns are abundance in different sample

Dataset 5

See iTOL offical example data here.

CACMS, njbxhzy@hotmail.com ↩︎
IOCAS, tongzhou2017@gmail.com ↩︎

Zhongyi Hua¹, Tong Zhou²

Last compiled on 05 December, 2024

Dataset structure

Data input and output

Input

Output

Dataset

Dataset 1

Dataset 2

Dataset 3

Dataset 4

Dataset 5

Datasets

Zhongyi Hua1, Tong Zhou2

Last compiled on 05 December, 2024

Dataset structure

Data input and output

Input

Output

Dataset

Dataset 1

Dataset 2

Dataset 3

Dataset 4

Dataset 5

Zhongyi Hua¹, Tong Zhou²