Converts data in several input formats to the -1/0/1 encoding required by DATASET_BINARY. The conversion ranges are configurable, and the function can detect data that is already in the target format.

Usage

convert_to_binary(
  data,
  force_convert = FALSE,
  verbose = TRUE,
  negative_range = NULL,
  zero_range = NULL,
  positive_range = NULL,
  lower_inclusive = TRUE,
  upper_inclusive = TRUE,
  auto_detect = TRUE
)

Arguments

data

A data frame whose first column holds the IDs and whose remaining columns hold the values to convert

force_convert

Logical, whether to force conversion even if data appears to be already in binary format

verbose

Logical, whether to print conversion messages

negative_range

Numeric vector of length 2, range for converting to -1 (e.g., c(0, 0.2))

zero_range

Numeric vector of length 2, range for converting to 0 (e.g., c(0.2, 0.8))

positive_range

Numeric vector of length 2, range for converting to 1 (e.g., c(0.8, 1))

lower_inclusive

Logical, whether the lower bound of the middle range (zero_range) is inclusive (default: TRUE)

upper_inclusive

Logical, whether the upper bound of the middle range (zero_range) is inclusive (default: TRUE)

auto_detect

Logical, whether to detect the range of the data automatically and set the conversion ranges from it (default: TRUE)

Value

A data frame with data converted to binary format (-1, 0, 1)

Details

The function handles several data types with flexible range control:

  • Boolean/logical data: TRUE->1, FALSE->0

  • 0-1 range data: Custom ranges for -1, 0, 1 conversion

  • Non-negative data: Custom ranges for -1, 0, 1 conversion

  • Already binary data: No conversion if already in -1, 0, 1 format
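
The checks behind the first and last cases can be sketched in base R. This is only an illustration under stated assumptions: `is_already_binary()` is a hypothetical helper, not a function exported by the package, and the package's actual detection logic may differ.

```r
# Hypothetical helper (not part of the package): TRUE if every non-missing
# value in the data columns is already -1, 0, or 1.
is_already_binary <- function(data) {
  values <- unlist(data[-1], use.names = FALSE)  # drop the ID column
  is.numeric(values) && all(values[!is.na(values)] %in% c(-1, 0, 1))
}

converted <- data.frame(ID = c("A", "B"), Asia = c(-1, 1), Europe = c(0, 1))
raw <- data.frame(ID = c("A", "B"), Asia = c(0.3, 0.8), Europe = c(1, 0.7))

is_already_binary(converted)
is_already_binary(raw)

# Logical columns: as.integer() maps TRUE to 1 and FALSE to 0.
as.integer(c(TRUE, FALSE, TRUE))
```

In this sketch a logical column is deliberately not treated as "already binary", so it would still go through the TRUE/FALSE conversion.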

Range parameters work as follows:

  • Values in negative_range -> -1

  • Values in zero_range -> 0 (with inclusive/exclusive boundary control)

  • Values in positive_range -> 1

  • Values outside all ranges -> NA (with warning)

The middle range (zero_range) boundary control:

  • If lower_inclusive=TRUE: lower bound is inclusive

  • If lower_inclusive=FALSE: lower bound is exclusive

  • If upper_inclusive=TRUE: upper bound is inclusive

  • If upper_inclusive=FALSE: upper bound is exclusive
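
The range and boundary rules above can be sketched for a single numeric vector as follows. `map_ranges()` is a hypothetical helper written for illustration, not the package's implementation. In particular, it assumes that a boundary value not claimed by zero_range falls to the neighbouring range, which is consistent with the `[0, 0.2) -> -1` and `(0.8, 1] -> 1` labels printed in the examples but is not confirmed for the exclusive case.

```r
# Hypothetical helper (not part of the package): map a numeric vector to -1/0/1.
map_ranges <- function(x,
                       negative_range = c(0, 0.2),
                       zero_range = c(0.2, 0.8),
                       positive_range = c(0.8, 1),
                       lower_inclusive = TRUE,
                       upper_inclusive = TRUE) {
  # Zero range: each bound is closed or open depending on the flags.
  above_lower <- if (lower_inclusive) x >= zero_range[1] else x > zero_range[1]
  below_upper <- if (upper_inclusive) x <= zero_range[2] else x < zero_range[2]
  in_zero <- above_lower & below_upper

  # Assumption: the outer ranges are closed, and the zero range wins ties.
  in_neg <- x >= negative_range[1] & x <= negative_range[2] & !in_zero
  in_pos <- x >= positive_range[1] & x <= positive_range[2] & !in_zero

  out <- rep(NA_integer_, length(x))
  out[which(in_neg)] <- -1L
  out[which(in_zero)] <- 0L
  out[which(in_pos)] <- 1L

  # Values that match no range stay NA and trigger a warning.
  if (anyNA(out[!is.na(x)])) {
    warning("Some values fall outside all ranges and were set to NA")
  }
  out
}

# Inclusive bounds: 0.8 belongs to the zero range.
map_ranges(c(0, 0.3, 0.8))

# Exclusive bounds: 0.2 and 0.8 fall to the neighbouring ranges.
map_ranges(c(0.2, 0.8), lower_inclusive = FALSE, upper_inclusive = FALSE)
```

With the inclusive defaults, the first call reproduces the Asia column of the custom-range example in the Examples section.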

This allows flexible scenarios like:

  • Standard: negative->-1, middle->0, positive->1

  • Inverted: negative->0, middle->-1, positive->1

  • Custom: any range mapping to -1, 0, 1

Examples

# Convert 0-1 range data with default ranges
data <- data.frame(ID = c("A", "B", "C"),
                   Asia = c(0, 0.3, 0.8),
                   Europe = c(1, 0.7, 0.2))
convert_to_binary(data)
#> Converting data to binary format (-1, 0, 1)...
#>              Range  Asia Europe
#>             <char> <int>  <int>
#> 1:  [0, 0.5) -> -1     2      1
#> 2: [0.5, 0.5] -> 0     0      0
#> 3:   (0.5, 1] -> 1     1      2
#>   ID Asia Europe
#> 1  A   -1      1
#> 2  B   -1      1
#> 3  C    1     -1

# Convert with custom ranges and boundary control
convert_to_binary(data,
                  negative_range = c(0, 0.2),
                  zero_range = c(0.2, 0.8),
                  positive_range = c(0.8, 1),
                  lower_inclusive = TRUE,   # [0.2, ...]
                  upper_inclusive = TRUE)   # [..., 0.8]
#> Converting data to binary format (-1, 0, 1)...
#>              Range  Asia Europe
#>             <char> <int>  <int>
#> 1:  [0, 0.2) -> -1     1      0
#> 2: [0.2, 0.8] -> 0     2      2
#> 3:   (0.8, 1] -> 1     0      1
#>   ID Asia Europe
#> 1  A   -1      1
#> 2  B    0      0
#> 3  C    0      0

# Convert boolean data
data_bool <- data.frame(ID = c("A", "B", "C"),
                        Present = c(TRUE, FALSE, TRUE),
                        Absent = c(FALSE, TRUE, FALSE))
convert_to_binary(data_bool)
#> Converting data to binary format (-1, 0, 1)...
#>   ID Present Absent
#> 1  A       1      0
#> 2  B       0      1
#> 3  C       1      0