
r - Change the order of multiple optional substrings

Reposted. Author: 行者123. Updated: 2023-12-02 09:02:41

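This is a bit like this question, but I have multiple substrings that may or may not occur.

The substrings code for two different dimensions, in my example "test" and "eye". They can occur in any imaginable order. The variables can be coded in different ways: in my example, "method|test" are the two ways of coding "test", and "r|re|l|le" are the different ways of coding the eye.

I found a convoluted solution that chains seven (!) gsub calls, and I wonder whether there is a more concise way.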

x <- c("id", "r_test", "l_method", "test_re", "method_le", "test_r_old", 
"test_l_old", "re_test_new","new_le_method", "new_r_test")
x
#> [1] "id" "r_test" "l_method" "test_re"
#> [5] "method_le" "test_r_old" "test_l_old" "re_test_new"
#> [9] "new_le_method" "new_r_test"

Desired output

#>  [1] "id"         "r_test"     "l_test"     "r_test"     "l_test"    
#> [6] "r_test_old" "l_test_old" "r_test_new" "l_test_new" "r_test_new"

How I got there (convoluted)

## Unify codes for variables, I use the underscores to make it more unique for future regex 
clean_test<- gsub("(?<![a-z])(test|method)(?![a-z])", "_test_", tolower(x), perl = TRUE)
clean_r <- gsub("(?<![a-z])(r|re)(?![a-z])", "_r_", tolower(clean_test), perl = TRUE)
clean_l <- gsub("(?<![a-z])(l|le)(?![a-z])", "_l_", tolower(clean_r), perl = TRUE)

## Now sort, one after the other
sort_eye <- gsub("(.*)(_r_|_l_)(.*)", "\\2\\1\\3", clean_l, perl = TRUE)
sort_test <- gsub("(_r_|_l_)(.*)(_test_)(.*)", "\\1\\3\\2\\4", sort_eye, perl = TRUE)

## Remove underscores
clean_underscore_mult <- gsub("_{2,}", "_", sort_test)
clean_underscore_ends <- gsub("^_|_$", "", clean_underscore_mult)

clean_underscore_ends
#> [1] "id" "r_test" "l_test" "r_test" "l_test"
#> [6] "r_test_old" "l_test_old" "r_test_new" "l_test_new" "r_test_new"
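
The lookarounds `(?<![a-z])` and `(?![a-z])` in the first three calls are what restrict each replacement to whole tokens. A small sketch of that behaviour, using made-up inputs (`red_test` is a hypothetical name, not part of the original data):

```r
# Illustrative inputs only; "red_test" is a hypothetical name added to show
# that "re" inside a longer word is left alone.
demo <- c("re_test", "red_test", "r_test")

# Same pattern as in the question: match "r" or "re" only when it is not
# preceded or followed by another lowercase letter.
recoded <- gsub("(?<![a-z])(r|re)(?![a-z])", "_r_", demo, perl = TRUE)
recoded
#> [1] "_r__test" "red_test" "_r__test"
```

The doubled underscores are why the later clean-up steps collapse runs of `_` and trim the ends.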

If anyone has a suggestion for how to do this better from `## Now sort, one after the other` downwards...

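Best answer

How about tokenizing the strings and using a lookup table? I'll use data.table to help, but the idea fits naturally with other data grammars as well.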

library(data.table)
# build into a table, keeping track of an ID
# to say which element it came from originally
l = strsplit(x, '_', fixed=TRUE)
DT = data.table(id = rep(seq_along(l), lengths(l)), token = unlist(l))

Now build a lookup table:

# defined using fread to make it easier to see
# token & match side-by-side; only define tokens
# that actually need to be changed here
lookups = fread('
token,match
le,l
re,r
method,test
')

Now merge:

# default value is the token itself
DT[ , match := token]
# replace anything matched
DT[lookups, match := i.match, on = 'token']

Next, use factor ordering to get the tokens in the right order:

# the more general case [where you don't have an exact list of all the
# possible tokens ready at hand] is a bit messier -- you might do something
# similar to setdiff(unique(match), lookups$match)
DT[ , match := factor(match, levels = c('id', 'r', 'l', 'test', 'old', 'new'))]
# sort to this new order
setorder(DT, id, match)
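
For the more general case mentioned in the comment above, one possible way to build the factor levels is to list only the tokens whose position matters and append everything else in order of first appearance. This is a rough sketch in base R, not the answer's own code; `tokens`, `known_order` and the token "extra" are assumptions made up for illustration:

```r
# Hypothetical recoded tokens, standing in for the `match` column;
# "extra" is an invented token that is not in any predefined list.
tokens <- c("id", "r", "test", "l", "test", "old", "new", "extra")

# Tokens whose relative order is fixed (assumed to be known in advance).
known_order <- c("id", "r", "l", "test")

# Everything else, in order of first appearance.
other <- setdiff(unique(tokens), known_order)

all_levels <- c(known_order, other)
token_factor <- factor(tokens, levels = all_levels)
all_levels
#> [1] "id"    "r"     "l"     "test"  "old"   "new"   "extra"
```

With levels built this way, no token becomes `NA` when converted to a factor, so unexpected suffixes are sorted last rather than dropped.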

Finally, combine (aggregate) again to get the output:

DT[ , paste(match, collapse='_'), by = id]$V1
# [1] "id" "r_test" "l_test" "r_test" "l_test"
# [6] "r_test_old" "l_test_old" "r_test_new" "l_test_new" "r_test_new"
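
The same tokenize, recode, sort and paste idea can also be sketched without data.table. The helper below is hypothetical (`reorder_tokens`, `lookup` and `token_order` are names invented here, not part of the answer) and skips the `tolower()` step from the question:

```r
# Hypothetical helper: recode aliases via a named vector, then sort the
# tokens of each name by their position in `token_order`.
reorder_tokens <- function(x,
                           lookup = c(le = "l", re = "r", method = "test"),
                           token_order = c("id", "r", "l", "test", "old", "new")) {
  vapply(strsplit(x, "_", fixed = TRUE), function(tokens) {
    # replace known aliases; all other tokens stay as they are
    hit <- tokens %in% names(lookup)
    tokens[hit] <- unname(lookup[tokens[hit]])
    # tokens missing from token_order get NA from match() and are sorted last
    paste(tokens[order(match(tokens, token_order))], collapse = "_")
  }, character(1))
}

x <- c("id", "r_test", "l_method", "test_re", "method_le", "test_r_old",
       "test_l_old", "re_test_new", "new_le_method", "new_r_test")
result <- reorder_tokens(x)
result
#>  [1] "id"         "r_test"     "l_test"     "r_test"     "l_test"
#>  [6] "r_test_old" "l_test_old" "r_test_new" "l_test_new" "r_test_new"
```

The named vector plays the role of the lookup table and `match()` against `token_order` plays the role of the factor levels, so adding a new alias or suffix only means extending one of the two vectors.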

Source: a similar question on Stack Overflow about changing the order of multiple optional substrings in R: https://stackoverflow.com/questions/62103640/
