
r - How to replace strings with numbers based on their position in a data frame?

Reposted. Author: 行者123. Updated: 2023-12-04 19:29:04

I have a vector of strings in the following format:

strings <- c("UUDBK", "KUVEB", "YVCYE")

I also have a data frame like this:
replacewith <- c(8, 4, 2)
searchhere <- c("UUDBK, YVCYE, KUYVE, IHVYV, IYVEK", "KUVEB, UGEVB", "KUEBN, IHBEJ, KHUDN")
dataframe <- data.frame(replacewith, searchhere)

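I want to replace each string in the vector with the value from the corresponding "replacewith" column of this data frame. This is how I currently do it:
# For each string, grep() finds the row whose searchhere cell contains it;
# column 1 of that row is the replacewith value.
final <- sapply(as.character(strings), function(x)
  as.numeric(dataframe[grep(x, dataframe$searchhere), 1]))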

However, doing this for a character vector of length 10^9 is computationally very expensive.

Is there a better way to do this?

Thanks!

Best answer

This is similar to @AntoniosK's idea, but it uses a hashmap to map the strings to their values instead. hashmap is implemented internally with Rcpp, so it is very fast:

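library(hashmap)
library(tidyr)

# Split the comma-separated searchhere column into one row per key.
search_replace = separate_rows(dataframe, searchhere)

# Build the hash map: keys are the strings (column 2), values are replacewith (column 1).
search_hash = hashmap(search_replace[,2], search_replace[,1])

# Look up all strings at once.
search_hash[[strings]]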

Result:
> search_hash
## (character) => (numeric)
## [KHUDN] => [+2.000000]
## [KUEBN] => [+2.000000]
## [UGEVB] => [+4.000000]
## [KUVEB] => [+4.000000]
## [IYVEK] => [+8.000000]
## [IHVYV] => [+8.000000]
## [...] => [...]

> search_hash[[strings]]
[1] 8 4 8
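
If installing hashmap is not an option, the same key-to-value idea can be sketched in base R with a named vector. This is only an illustration of the mapping approach, not part of the original answer, and it was not included in the benchmark, so no claim is made about its speed relative to hashmap. The names keys_per_row and lookup are made up for this sketch.

```r
# Data from the question.
strings <- c("UUDBK", "KUVEB", "YVCYE")
replacewith <- c(8, 4, 2)
searchhere <- c("UUDBK, YVCYE, KUYVE, IHVYV, IYVEK",
                "KUVEB, UGEVB",
                "KUEBN, IHBEJ, KHUDN")

# Split each comma-separated cell into its individual keys.
keys_per_row <- strsplit(searchhere, ",\\s*")

# Named lookup vector (hypothetical name): names are keys, values are replacements.
# rep() repeats each replacement once per key found in its row.
lookup <- setNames(rep(replacewith, lengths(keys_per_row)),
                   unlist(keys_per_row))

# Index by name; unname() drops the keys from the result.
final <- unname(lookup[strings])
print(final)
```

For the three strings in the question this gives the same values as search_hash[[strings]] above. A key that is missing from the table comes back as NA.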

Benchmark:
> OP_func = function(){sapply(as.character(strings), function(x)
as.numeric(dataframe[grep(x,dataframe$searchhere), 1]))}

Unit: microseconds
                           expr     min       lq      mean   median      uq      max neval
                      OP_func() 121.191 124.9410 190.36472 129.8760 151.193 3370.047   100
 d[d$searchhere %in% strings, ]  36.714  40.6605  52.85093  43.8185  61.583  147.246   100
         search_hash[[strings]]  14.212  18.1590  25.05212  21.5150  29.608   58.820   100

Also note that @AntoniosK's solution does not work if strings contains duplicates, whereas hashmap returns the correct mapping for every element, in the correct position.

Example:
> strings_large = sample(search_replace$searchhere, 100, replace = TRUE)
> strings_large
[1] "YVCYE" "KUVEB" "KUYVE" "KHUDN" "KUYVE" "KHUDN" "KUEBN" "UUDBK" "KHUDN" "YVCYE" "IYVEK"
[12] "KUEBN" "KHUDN" "IHBEJ" "YVCYE" "KHUDN" "KUEBN" "UGEVB" "UUDBK" "KUYVE" "KHUDN" "IHBEJ"
[23] "IHVYV" "KUVEB" "IYVEK" "KHUDN" "KHUDN" "KUYVE" "YVCYE" "UUDBK" "KUYVE" "IHVYV" "KUYVE"
[34] "KUEBN" "KUYVE" "UUDBK" "KUYVE" "KUVEB" "KUVEB" "YVCYE" "KUYVE" "KHUDN" "KUVEB" "YVCYE"
[45] "IHBEJ" "YVCYE" "KHUDN" "UUDBK" "KUEBN" "IYVEK" "IHVYV" "UUDBK" "KUYVE" "KUEBN" "YVCYE"
[56] "UGEVB" "YVCYE" "KUYVE" "IHVYV" "KUEBN" "IHVYV" "IHBEJ" "KUVEB" "IHVYV" "KUYVE" "KUEBN"
[67] "IYVEK" "KUVEB" "KUEBN" "UGEVB" "KUEBN" "KUVEB" "IHBEJ" "KUYVE" "YVCYE" "YVCYE" "IHVYV"
[78] "YVCYE" "KHUDN" "KHUDN" "YVCYE" "IYVEK" "KUYVE" "KHUDN" "UGEVB" "YVCYE" "IHVYV" "KUVEB"
[89] "IYVEK" "KUEBN" "UGEVB" "UUDBK" "IYVEK" "IHBEJ" "IHBEJ" "UUDBK" "KUVEB" "UGEVB" "IYVEK"
[100] "IYVEK"

> search_hash[[strings_large]]
[1] 8 4 8 2 8 2 2 8 2 8 8 2 2 2 8 2 2 4 8 8 2 2 8 4 8 2 2 8 8 8 8 8 8 2 8 8 8 4 4 8 8 2 4 8
[45] 2 8 2 8 2 8 8 8 8 2 8 4 8 8 8 2 8 2 4 8 8 2 8 4 2 4 2 4 2 8 8 8 8 8 2 2 8 8 8 2 4 8 8 4
[89] 8 2 4 8 8 2 2 8 4 4 8 8
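
The difference with duplicates can be seen in a small base R sketch. It assumes the other answer filters table rows with %in%, as the benchmark expression suggests, and it uses match() as a stand-in for a per-element lookup. The vectors keys, values and query are made up for illustration.

```r
# Lookup table in long form (one entry per key), like the output of separate_rows().
keys   <- c("UUDBK", "YVCYE", "KUYVE", "IHVYV", "IYVEK",
            "KUVEB", "UGEVB", "KUEBN", "IHBEJ", "KHUDN")
values <- c(8, 8, 8, 8, 8, 4, 4, 2, 2, 2)

# Query with a repeated key, in a different order than the table.
query <- c("KUVEB", "UUDBK", "KUVEB")

# Filtering with %in% keeps each matching table entry once, in table order.
by_filter <- values[keys %in% query]

# match() gives, for every query element, the position of its key in the table.
by_match <- values[match(query, keys)]

print(by_filter)  # 8 4: two values, in table order, duplicate lost
print(by_match)   # 4 8 4: one value per query element, in query order
```

The filter answers "which table rows are mentioned at all", while a per-element lookup answers "what is the value for each input", which is what the question asks for.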

Regarding "r - How to replace strings with numbers based on their position in a data frame?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47211896/
