Replacing a whole word (or words) based on a partial match in R

Reposted. Author: 行者123. Updated: 2023-12-05 06:18:23

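I have a data frame containing thousands of misspelled city names. I need to correct them, but I have not been able to find a solution despite searching extensively. I have tried several functions and approaches.

Here is a tiny sample of the data: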

citA <- data.frame("num" = c(1,2,3,4,5,6,7,8),
"city" = c("BORNE","BOERNAE","BARNE","BOERNE",
"GALDEN","GELDON","GOELDEN","GOLDEN"))

  num    city
1   1   BORNE
2   2 BOERNAE
3   3   BARNE
4   4  BOERNE
5   5  GALDEN
6   6  GELDON
7   7 GOELDEN
8   8  GOLDEN

These are some of the functions I tried. I also tried several others, including str_replace and str_detect:

cit <- function(x){
ifelse(x %in% grepl(c("BOR","BOE","BAR")),"BOERNE",
ifelse(x %in% grepl(c("GAL","GEL","GOE")), "GOLDEN", "OTHER"))
}

or

cit <- function(x){
ifelse(x %in% c("BOR","BOE","BAR"),"BOERNE",
ifelse(x %in% c("GAL","GEL","GOE"), "GOLDEN", "OTHER"))
}

Running the code:

`citA$city2 <- cit(citA$city)`

The incorrect result:

  num    city city2
1   1   BORNE OTHER
2   2 BOERNAE OTHER
3   3   BARNE OTHER
4   4  BOERNE OTHER
5   5  GALDEN OTHER
6   6  GELDON OTHER
7   7 GOELDEN OTHER
8   8  GOLDEN OTHER

I also tried:

citA$city[grepl(c("BOR","BOE","BAR"),citA$city)] <- "BOERNE" 

But this produces a warning, and only the first pattern is used:

Warning message:
In grepl(c("BOR", "BOE", "BAR"), citA$city) :
argument 'pattern' has length > 1 and only the first element will be used

Any ideas would be very helpful!

Best answer

If you have many such patterns, you can use case_when from dplyr:

library(dplyr)
library(stringr)

citA %>%
mutate(city2 = case_when(str_detect(city, 'BOR|BOE|BAR') ~ 'BOERNE',
str_detect(city, 'GAL|GEL|GOE|GOL') ~ 'GOLDEN',
TRUE ~ 'OTHER'))

#  num    city  city2
#1   1   BORNE BOERNE
#2   2 BOERNAE BOERNE
#3   3   BARNE BOERNE
#4   4  BOERNE BOERNE
#5   5  GALDEN GOLDEN
#6   6  GELDON GOLDEN
#7   7 GOELDEN GOLDEN
#8   8  GOLDEN GOLDEN
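
With thousands of misspellings, writing one case_when branch per city may get unwieldy. The following is a minimal base R sketch of a lookup-driven alternative, not part of the original answer. The names `fixes` and `fix_city` are hypothetical, and anchoring each prefix at the start of the string with `^` is an assumption (the str_detect calls above match anywhere in the string). It also shows why the grepl attempt in the question warned: grepl takes a single pattern, so the prefixes are joined with "|" first.

```r
# Same sample data as in the question.
citA <- data.frame(num = 1:8,
                   city = c("BORNE", "BOERNAE", "BARNE", "BOERNE",
                            "GALDEN", "GELDON", "GOELDEN", "GOLDEN"))

# Hypothetical lookup: correct name -> prefixes of its known misspellings.
fixes <- list(BOERNE = c("BOR", "BOE", "BAR"),
              GOLDEN = c("GAL", "GEL", "GOE", "GOL"))

# Hypothetical helper: returns the corrected name for each element of x,
# or `default` when no prefix matches.
fix_city <- function(x, fixes, default = "OTHER") {
  out  <- rep(default, length(x))
  done <- rep(FALSE, length(x))
  for (name in names(fixes)) {
    # grepl() accepts one pattern only, so collapse the prefixes into a
    # single alternation such as "^(BOR|BOE|BAR)".
    pattern <- paste0("^(", paste(fixes[[name]], collapse = "|"), ")")
    # Skip values already assigned, so earlier entries win (as in case_when).
    hit <- !done & grepl(pattern, x)
    out[hit] <- name
    done <- done | hit
  }
  out
}

citA$city2 <- fix_city(citA$city, fixes)
citA
```

On the sample data this should give the same city2 column as the case_when version. Adding a city then means adding one entry to `fixes` rather than another branch.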

For more on replacing a whole word (or words) based on a partial match in R, see the similar question on Stack Overflow: https://stackoverflow.com/questions/61297796/
