
r - Automatically modify a complex character vector

Reposted. Author: 行者123. Last updated: 2023-12-02 03:02:18

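I have a complex character vector in which every element is a comma-separated list of numbers and letters. I want to simplify this vector by collapsing consecutive sequences of numbers and/or letters into ranges. Here is an example of what the input and output vectors should look like: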

# Input vector
input_vec <- c("1,2,3,4,5", "1,2,3,5,6,7,8", "2,3,4,5", "A,B,C", "1,2,3,4,5,A,B,8,9,10,11")

# Here some function should be applied, to create the desired output vector automatically

# Desired output vector
output_vec <- c("1-5", "1-3,5-8", "2-5", "A-C", "1-5,A-B,8-11")

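I am sure there must be a way to write a function or use a package that does this automatically, but unfortunately I am struggling to find a solution. Any help is greatly appreciated!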

Update: added a more complex vector

input_vec2 <- c("1,2,3,4,5", "1,2,3,5,6,7,8", "2,3,4,5", "A,B,C", "1,2,3,4,5,A,B,8,9,10,11", 
"1", "1,2,3,-4", "lala,3") # This part is new

output_vec2 <- c("1-5", "1-3,5-8", "2-5", "A-C", "1-5,A-B,8-11",
"1", "1-3,-4", "lala,3") # This part is new

Best answer

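This may still be a bit bloated, but I tried to break the problem down into smaller functions. Here they are, starting with some general-purpose helper functions: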

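# Is the value numeric? (NA after conversion means "not a number")
is_numeric <- function(x) suppressWarnings(!is.na(as.numeric(x)))

# Create IDs for runs of identical values using run-length encoding.
# Returns the value of each run (group_value) and, for every element,
# the index of the run it belongs to (group_id).
rleg <- function(x) {
  r <- rle(x)
  val <- list(group_value = r$values)
  r$values <- seq_along(r$values)
  val$group_id <- inverse.rle(r)
  val
}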
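To see what the two generic helpers return, here is a small self-contained check (the helper definitions are repeated so the snippet runs on its own). The comparison with `cumsum(c(TRUE, diff(x) != 0))` is an addition for illustration, not part of the original answer: it is a common base-R idiom that, assuming a vector without `NA` values, should yield the same run IDs as `rleg()`.

```r
# Helpers repeated from the answer so this snippet is self-contained
is_numeric <- function(x) suppressWarnings(!is.na(as.numeric(x)))

rleg <- function(x) {
  r <- rle(x)
  val <- list(group_value = r$values)
  r$values <- seq_along(r$values)
  val$group_id <- inverse.rle(r)
  val
}

# Tokens from "1,2,3,-4" and "lala,3": "-4" parses as a number, "lala" does not
num_flags <- is_numeric(c("1", "-4", "A", "lala"))
num_flags  # TRUE TRUE FALSE FALSE

# Token types 1,1,1,2,2,1 form three runs
types <- c(1, 1, 1, 2, 2, 1)
g <- rleg(types)
g$group_value  # 1 2 1
g$group_id     # 1 1 1 2 2 3

# Equivalent run IDs via diff/cumsum (assumes no NA values in `types`)
alt_id <- cumsum(c(TRUE, diff(types) != 0))
alt_id         # 1 1 1 2 2 3
```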

Now some more problem-specific helpers:

# Turn a run such as c("1", "2", "3") into "1-3"; single values are returned unchanged
collapse_sequence <- function(x) {
  if (length(x) > 1) {
    paste0(x[1], "-", x[length(x)])
  } else {
    x
  }
}

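# Split x into runs in which consecutive keys increase by exactly 1 and
# collapse each run. Elements with an NA key always form a run of their own.
find_runs <- function(x, key = x) {
  nona <- function(x) { x[is.na(x)] <- 0; x }
  run <- cumsum(nona(c(1, diff(key))) != 1)
  Map(collapse_sequence, split(x, run))
}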
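The core trick in `find_runs()` is `cumsum(c(1, diff(key)) != 1)`: the run counter increases whenever two neighbouring keys do not differ by exactly 1. A step-by-step trace for the second example element, `"1,2,3,5,6,7,8"`, might look like this (a self-contained sketch that repeats `collapse_sequence()` and `find_runs()` from the answer; the intermediate variable names are only for illustration):

```r
# collapse_sequence() and find_runs() are repeated from the answer
collapse_sequence <- function(x) {
  if (length(x) > 1) paste0(x[1], "-", x[length(x)]) else x
}

find_runs <- function(x, key = x) {
  nona <- function(x) { x[is.na(x)] <- 0; x }
  run <- cumsum(nona(c(1, diff(key))) != 1)
  Map(collapse_sequence, split(x, run))
}

x <- c("1", "2", "3", "5", "6", "7", "8")
key <- as.numeric(x)

# Step by step: a new run starts wherever the step from the previous key is not 1
steps <- c(1, diff(key))   # 1 1 1 2 1 1 1
run <- cumsum(steps != 1)  # 0 0 0 1 1 1 1

# find_runs() does the same split and collapses every run
pieces <- unlist(find_runs(x, key), use.names = FALSE)
result <- paste(pieces, collapse = ",")
result  # "1-3,5-8"
```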

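# Collapse a block of numeric tokens, e.g. c("1", "2", "3", "5") -> "1-3,5"
collapse_numeric <- function(x) {
  paste(unlist(find_runs(x, as.numeric(x))), collapse = ",")
}

# Collapse a block of character tokens: single characters are compared by
# their code points, longer tokens (key NA) are never merged
collapse_character <- function(x) {
  key <- sapply(x, function(z) ifelse(nchar(z) == 1, utf8ToInt(z), NA))
  paste(unlist(find_runs(x, key)), collapse = ",")
}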
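The key used by `collapse_character()` is what makes `"A,B,C"` collapse to `"A-C"` while a token such as `"lala"` stays untouched. The sketch below reproduces that key and the break rule from `find_runs()` on a few sample tokens. It uses `vapply()` with `if` instead of the answer's `sapply()`/`ifelse()`, and the helper name `char_key` is hypothetical.

```r
# Hypothetical helper mirroring the key used in collapse_character():
# single characters map to their Unicode code point, longer tokens to NA
char_key <- function(z) if (nchar(z) == 1) utf8ToInt(z) else NA_integer_

tokens <- c("A", "B", "C", "lala", "x", "y")
keys <- vapply(tokens, char_key, integer(1), USE.NAMES = FALSE)
keys  # 65 66 67 NA 120 121

# Same break rule as find_runs(): NA steps are set to 0, so they force a break
steps <- c(1, diff(keys))
steps[is.na(steps)] <- 0
run <- cumsum(steps != 1)
run   # 0 0 0 1 2 2  ->  "A-C", "lala", "x-y"
```

Because letters are compared by code point, lowercase sequences such as `"x,y"` collapse as well, while an uppercase letter followed by a lowercase one (for example `"Z"` then `"a"`) does not.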

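# Process one comma-separated string: classify each token as a number (1),
# a single character (2) or a longer text token (3), collapse each block of
# same-type tokens separately, then join the results again.
# Vectorize() makes the function work element-wise on a character vector.
collapse_runs <- Vectorize(function(x) {
  x <- strsplit(x, ",")[[1]]
  type <- ifelse(is_numeric(x), 1, ifelse(nchar(x) == 1, 2, 3))
  group <- rleg(type)
  runs <- Map(function(v, type) {
    if (type == 1) {
      collapse_numeric(v)
    } else {
      collapse_character(v)
    }
  }, split(x, group$group_id), group$group_value)
  paste(runs, collapse = ",")
})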
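The first two steps inside `collapse_runs()` (classifying tokens and splitting them into same-type blocks) can be traced on the last element of the example input. This is a self-contained sketch; it uses the `cumsum(c(TRUE, diff(type) != 0))` idiom, which should match `rleg(type)$group_id` here, rather than calling `rleg()` itself.

```r
is_numeric <- function(x) suppressWarnings(!is.na(as.numeric(x)))

tokens <- strsplit("1,2,3,4,5,A,B,8,9,10,11", ",")[[1]]

# 1 = number, 2 = single character, 3 = longer text token (as in collapse_runs())
type <- ifelse(is_numeric(tokens), 1, ifelse(nchar(tokens) == 1, 2, 3))
type  # 1 1 1 1 1 2 2 1 1 1 1

# Same-type neighbours form one block
block_id <- cumsum(c(TRUE, diff(type) != 0))
blocks <- split(tokens, block_id)
# blocks: c("1", "2", "3", "4", "5"), c("A", "B"), c("8", "9", "10", "11")
```

Each block is then collapsed on its own, which keeps the letter range `A-B` in place between `1-5` and `8-11`. As an aside, the `Vectorize()` plus `unname()` combination could also be replaced by `vapply(input_vec, f, character(1), USE.NAMES = FALSE)` for an unvectorized function `f`; that is an alternative, not what the answer uses.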

Finally, we test it with your input:

input_vec <- c("1,2,3,4,5", "1,2,3,5,6,7,8", "2,3,4,5", "A,B,C", "1,2,3,4,5,A,B,8,9,10,11")
unname(collapse_runs(input_vec))
# [1] "1-5" "1-3,5-8" "2-5" "A-C" "1-5,A-B,8-11"
input_vec2 <- c("1,2,3,4,5", "1,2,3,5,6,7,8", "2,3,4,5", "A,B,C", "1,2,3,4,5,A,B,8,9,10,11", "1",
"1,2,3,-4", "lala,3")
unname(collapse_runs(input_vec2))
# [1] "1-5" "1-3,5-8" "2-5" "A-C" "1-5,A-B,8-11"
# [6] "1" "1-3,-4" "lala,3"

A similar question, "r - Automatically modify a complex character vector", can be found on Stack Overflow: https://stackoverflow.com/questions/44951131/
