regex - Looping grepl() through data.table (R)

Reposted. Author: 行者123. Updated: 2023-12-01 16:51:20

I have a dataset stored as a data.table DT that looks like this:

print(DT)
             category industry
1:     administration    admin
2: nurse practitioner    truck
3:           trucking    truck
4:     administration    admin
5:        warehousing    nurse
6:        warehousing    admin
7:           trucking    truck
8: nurse practitioner    nurse
9: nurse practitioner    truck

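I want to reduce the table to only the rows where the industry matches the category. My general approach is to use grepl() to match the regex string '^{{INDUSTRY}}[a-z ]+$' against each row of DT$category, with the corresponding row of DT$industry inserted in place of {{INDUSTRY}} in the regex string using infuse(). I'm struggling to find an idiomatic data.table solution that loops over the table and makes within-row comparisons correctly, so I've resorted to a for loop to get the job done: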

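library(data.table)
library(infuser)  # provides infuse()

template <- "^{{IND}}[a-z ]+$"
DT[, match := FALSE]
for (i in seq_len(nrow(DT))) {
  ind <- DT[i]$industry
  categ <- DT[i]$category  # category in the same row as ind
  # Fill the industry into the template, then test the category against it
  if (grepl(infuse(template, IND = ind), categ)) {
    DT[i]$match <- TRUE
  }
}
DT <- DT[match == TRUE]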
print(DT)
             category industry
1:     administration    admin
2:           trucking    truck
3:     administration    admin
4:           trucking    truck
5: nurse practitioner    nurse

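However, I'm sure this can be done in a better way. Any suggestions on how to achieve this result using the functionality of the data.table package? My understanding is that, in this case, an approach that uses the package would likely be more efficient than a for loop.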

Best answer

You can use stringi::stri_detect_fixed(). It is vectorized over both str and pattern.

DT[stringi::stri_detect_fixed(category, industry)]
#              category industry
# 1:     administration    admin
# 2:           trucking    truck
# 3:     administration    admin
# 4:           trucking    truck
# 5: nurse practitioner    nurse

Alternatively, you can use stringr::str_detect(). It is also vectorized over both of its arguments.

library(stringr)
DT[str_detect(category, fixed(industry))]

Or, as a base R option, run grepl() through mapply():

DT[mapply(grepl, industry, category, fixed = TRUE)]

Or you can use Vectorize(grepl) to get the same result.

DT[Vectorize(grepl)(industry, category, fixed = TRUE)]

All of these give the same result.
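One caveat worth checking: the fixed-match options above test whether industry occurs anywhere in category, while the regex template in the question only accepts it at the start. The sketch below compares the two on the sample data using base R only; startsWith() is swapped in here as a prefix test and is not part of the original answer. It is also slightly looser than the question's template, which additionally requires at least one trailing character from [a-z ]. The data frame name df is an assumption for illustration.

```r
# Base R only; the same logical vector could be used as DT[prefix] on a data.table.
df <- data.frame(
  category = c("administration", "nurse practitioner", "trucking",
               "administration", "warehousing", "warehousing", "trucking",
               "nurse practitioner", "nurse practitioner"),
  industry = c("admin", "truck", "truck", "admin", "nurse", "admin",
               "truck", "nurse", "truck"),
  stringsAsFactors = FALSE
)

# Substring anywhere in category (what the fixed-match options test).
anywhere <- unname(mapply(grepl, df$industry, df$category, fixed = TRUE))

# Prefix only, closer in spirit to the "^{{IND}}..." template in the question.
prefix <- startsWith(df$category, df$industry)

# Do the two tests agree on this particular data set?
print(identical(anywhere, prefix))
print(df[prefix, ])
```

On data where an industry name can appear in the middle of a category, the two tests can disagree, so the choice depends on which behavior is wanted.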

Data:

DT <- structure(list(category = c("administration", "nurse practitioner", 
"trucking", "administration", "warehousing", "warehousing", "trucking",
"nurse practitioner", "nurse practitioner"), industry = c("admin",
"truck", "truck", "admin", "nurse", "admin", "truck", "nurse",
"truck")), .Names = c("category", "industry"), class = "data.frame", row.names = c(NA,
-9L))
setDT(DT)
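If the anchored regex from the question is what is actually needed, it can probably be applied row by row without a loop or infuse(). The following is a minimal sketch, assuming the data.table package is installed and that industry values contain no regex metacharacters (they would need escaping otherwise); using sprintf() to build one pattern per row is a substitution made here, not the approach from the question or the answer.

```r
library(data.table)

DT <- data.table(
  category = c("administration", "nurse practitioner", "trucking",
               "administration", "warehousing", "warehousing", "trucking",
               "nurse practitioner", "nurse practitioner"),
  industry = c("admin", "truck", "truck", "admin", "nurse", "admin",
               "truck", "nurse", "truck")
)

# sprintf() is vectorized, so this builds one regex per row from industry;
# mapply() then tests each pattern against the category in the same row.
res <- DT[mapply(grepl, sprintf("^%s[a-z ]+$", industry), category)]
print(res)
```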

Regarding regex - Looping grepl() through data.table (R), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33699122/