Overview

Key updates in v1.1.9:

  1. Fix: Meaningless warning in the simplebar template.
  2. Fix: Clustering error when input is a single column in the heatmap.
  3. New: Support for outputting trees in the Hub.
  4. Fix: Bug when multiple heatmap units are used together.
  5. New: Color sorting algorithm, supports custom root node color.
  6. New: Two tree-building methods (based on string vectors and data frame row/column names).

Document Format Notes:

  1. The document will be converted to English in the future.
  2. Blockquotes contain error messages; code blocks contain either user-side code or source code, with line numbers in source code to differentiate them from user-side code.

Feature Details and Problem Analysis

1. Fixing the Warning in the Simplebar Template

Problem Description

When creating a DATASET_SIMPLEBAR unit with create_unit, for example:

library(itol.toolkit) # main package
library(dplyr) # data manipulation
library(data.table) # file read

tree_1 <- system.file("extdata","dataset4/otus.contree",package = "itol.toolkit")

data_file_1 <- system.file("extdata","dataset4/annotation.txt",package = "itol.toolkit")

data_1 <- data.table::fread(data_file_1)

# simple_bar by NS
unit_4 <- create_unit(data = data_1 %>% select(ID, Asia),
                      key = "rep_Zhou_4_multibar",
                      type = "DATASET_SIMPLEBAR",
                      tree = tree_1)
## [1] "DATASET_SIMPLEBAR"

In versions v1.1.8 and earlier, the following warning would be triggered:

Warning in mean.default(data) : argument is not numeric or logical: returning NA

The generated template file might contain something like this:

DATASET_SCALE 0.009113938 NA 4.09978

Problem Investigation

When using mean directly, since the selected data frame is not purely a numeric vector, the result cannot be calculated.

data_1 %>% select(Asia) %>% mean()
data_1 %>% select(North_America, Asia) %>% mean()
# Returns: 
# [1] NA
# Warning message:
# In mean.default(.) : argument is not numeric or logical: returning NA

Thus, the solution is to convert the data frame to a vector, then calculate mean by rows or columns, which will yield a numeric vector and resolve the issue:

data_1 %>% select(North_America, Asia) %>% rowMeans() %>% mean()
data_1 %>% select(North_America, Asia) %>% colMeans() %>% mean()
# [1] 0.2529557

# Also works for single-column data:
data_1 %>% select(Asia) %>% rowMeans() %>% mean()
# [1] 0.2657

To fix the issue, change the code at line 1364 in user.R to:

mean <- data %>% rowMeans() %>% mean()

Further Thoughts

In the gradient template, mean() does not trigger a similar issue because gradient does not require a scale value, and instead automatically generates the legend. In the future, a unified function could be designed to handle scale information for different templates.

2. Fixing the Clustering Error When Heatmap Input is a Single Column

Problem Description

While heatmaps typically take multi-column data as input, and single-column heatmaps can be replaced by the gradient, single-column heatmaps still offer more adjustable parameters. Hence, some users still use the single-column heatmap feature. However, when calling create_unit to create a unit, an error is thrown:

unit_6 <- create_unit(data = data_1 %>% select(ID, Asia),
                      key = "rep_Zhou_6_multibar",
                      type = "DATASET_HEATMAP",
                      tree = tree_1)
# Error in hclust(dist(t(data %>% select(field_names)))) : 
#   must have n >= 2 objects to cluster

Problem Investigation

When hclust is used for clustering, if the input data has only one column, clustering cannot be performed, leading to an error when generating the field tree. Therefore, a conditional check should be added near line 959 in user.R to skip field tree generation if the input is a single column:

if(length(names(data)) == 2){
  field_tree <- NULL
}else{
  field_tree <- write.tree(ape::as.phylo(hclust(dist(t(data %>% select(field_names))))))
}

This ensures that the template file is generated successfully without errors. However, when submitting the updated template to iTOL, the following message is displayed:

Leaf ‘undefined’ in the heatmap field tree is not present in the FIELD_LABELS.

Field label ‘Asia’ is not present in the heatmap field tree.

Failed to parse the FIELD_TREE. Make sure it is in Newick format.

This occurs because, when the field tree is empty, it automatically inherits the style template content. Since updating the logic for style template inheritance requires system-level review, the issue is temporarily resolved by skipping the field tree output when the input has a single column. The following changes are made in user.R at line 969 and in output.R at line 230:

# user.R L969
if(length(names(data)) == 2){
  specific_themes[["heatmap"]][["tree"]][["tree_display"]] <- 0
}else{
  specific_themes[["heatmap"]][["tree"]][["tree"]] <- field_tree
}

# output.R L230
if (!is.null(unit@specific_themes$heatmap$tree$tree)) {
  if(unit@specific_themes$heatmap$tree$tree_display == 1){
    lines <- c(lines, paste("FIELD_TREE", paste(unit@specific_themes$heatmap$tree$tree, collapse = unit@sep), sep = unit@sep))
  }
}

3. Fixing the Issue with Addition of Multiple Units

Problem Description

Due to the previous version’s limitation, write_hub only output template files in bulk without outputting tree files, which made it difficult for beginners to locate tree files within the package. To make it easier for users to obtain all files for uploading to iTOL at once, the write_hub function in v1.1.9 introduces the with_tree parameter. When with_tree = TRUE, tree files are also outputted. This parameter defaults to FALSE to maintain backward compatibility.

hub_1 <- create_hub(tree_1)
hub_1 <- hub_1 + unit_4 + unit_5 + unit_6
write_hub(hub_1, "~/Downloads/", with_tree = T)

However, in versions v1.1.8 and earlier, running this might cause an error:

# No error with the following:
hub_1 <- hub_1 + unit_4 #+ unit_5 + unit_6
hub_1 <- (hub_1 + unit_4) + unit_5 #+ unit_6

# Error with the following:
hub_1 <- ((hub_1 + unit_4) + unit_5) + unit_6
# Error in UseMethod("tbl_vars") : 
# no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"

Problem Investigation

This error occurs because when a heatmap is merged into the hub, if the data format is data.table, it results in a format conversion error. To fix this, add a format conversion at line 1352 in learn.R:

abundance <- as.data.frame(abundance)

4. Color Sorting Algorithm

A color sorting algorithm is supported, allowing users to define custom sorting rules through the sort_color() method and specify a reference color for sorting. This feature is still being explored, and concrete examples will be provided in the future.

set.seed(123)
colors <- get_color(10,set = "wesanderson")
sort_color(colors,plot = T, root="#FF0000")

##  [1] "#D8B70A" "#A42820" "#FF0000" "#C27D38" "#FAD77B" "#02401B" "#EABE94"
##  [8] "#DD8D29" "#8D8680" "#3B9AB2"

5. New Tree-Building Methods

This feature is still in the exploration stage, and concrete examples will be provided later. Method 2 will be further optimized.

Method 1: Based on String Vector

Generate a tree directly from a template string:

data("template_groups")
plot(vector_tree(template_groups$template))

Method 2: Based on Data Frame Row/Column Names

Generate a tree based on the row and column names of a data frame:

data("template_parameters_count")
plot(template_parameters_count[c("TREE_COLORS","DATASET_STYLE","DATASET_COLORSTRIP"),1:5] %>% df_tree(main = "row"))