
r - Extract names from strings using a list of names with grepl and a loop, and add them to a new column in R

Reposted. Author: 行者123. Updated: 2023-12-04 12:55:16

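I have a dataset in which one column contains names and another describes what that person did during the day. I am trying to use R to work out who met whom on that day. I created a vector containing the names in the dataset and used grepl in a loop to find where those names appear in the column describing each person's activities.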

name <- c("Dupont","Dupuy","Smith") 

activity <- c("On that day, he had lunch with Dupuy in London.",
"She had lunch with Dupont and then went to Brighton to meet Smith.",
"Smith remembers that he was tired on that day.")

met_with <- c("Dupont","Dupuy","Smith")

df <- data.frame(name, activity, met_with = NA)


for (i in seq_along(met_with)) {
  df$met_with <- ifelse(grepl(met_with[i], df$activity), met_with[i], df$met_with)
}
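However, this solution is unsatisfactory for two reasons. When a person met more than one person (Dupuy in my example), I cannot extract more than one name. I also cannot tell R not to return the person's own name when the activity column uses that name instead of a pronoun (e.g. Smith).
Ideally, I would like df to look like: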
name   activity                                        met_with
Dupont On that day, he had lunch with Dupuy in London. Dupuy
Dupuy  She had lunch with Dupont and then (...).       Dupont Smith
Smith  Smith remembers that he was tired on that day.  NA
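
The two problems described above can be reproduced in a few lines. The sketch below rebuilds the example data and runs the same ifelse/grepl loop, iterating over the names directly (a small simplification, not the asker's exact code). Each pass overwrites whatever an earlier pass stored, so only the last matching name survives, and nothing excludes the row's own name.

```r
# Rebuild the example data from the question.
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
df <- data.frame(name, activity, met_with = NA)

# Same logic as the loop in the question: one pass per name,
# each pass overwriting met_with wherever the name is found.
for (nm in name) {
  df$met_with <- ifelse(grepl(nm, df$activity), nm, df$met_with)
}

print(df$met_with)
# Row 2 keeps only "Smith" ("Dupont" was overwritten),
# and row 3 reports Smith's own name.
```

Row 2 mentions both Dupont and Smith, but the pass for "Smith" runs last and replaces "Dupont". Row 3 matches "Smith" because grepl has no notion of whose row it is.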
I am cleaning the strings in order to build an edge list and a node list for a network analysis later on.
Thanks.
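
For the network-analysis step mentioned above, here is a minimal base R sketch of how an edge list and node list might be derived once a met_with column exists. It assumes met_with holds space-separated names with NA for "met nobody", as in the desired output. The names `targets`, `edges` and `nodes` are hypothetical and do not come from the question.

```r
# Assumed input: the desired data frame from the question (activity omitted).
df <- data.frame(
  name = c("Dupont", "Dupuy", "Smith"),
  met_with = c("Dupuy", "Dupont Smith", NA),
  stringsAsFactors = FALSE
)

# Split each met_with entry into individual names; NA stays a single NA.
targets <- strsplit(df$met_with, " ", fixed = TRUE)

# One row per (person, person met) pair, then drop rows with no meeting.
edges <- data.frame(
  from = rep(df$name, lengths(targets)),
  to = unlist(targets),
  stringsAsFactors = FALSE
)
edges <- edges[!is.na(edges$to), ]

# Node list: every name that appears in the data or in an edge.
nodes <- data.frame(id = unique(c(df$name, edges$to)),
                    stringsAsFactors = FALSE)

print(edges)
print(nodes)
```

The edges are directed as written, so a meeting reported by both people appears once in each direction. Whether to collapse such pairs depends on the network model you use later.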

Best answer

Same logic as @Gki's answer, but using stringr functions and mapply instead of a loop.

library(stringr)

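# One alternation pattern matching any name as a whole word
pat <- str_c('\\b', df$name, '\\b', collapse = '|')
# Extract every name found in each activity, then drop the row's own name
df$met_with <- mapply(function(x, y) str_c(setdiff(x, y), collapse = ' '),
                      str_extract_all(df$activity, pat), df$name)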

df

# name activity
#1 Dupont On that day, he had lunch with Dupuy in London.
#2 Dupuy She had lunch with Dupont and then went to Brighton to meet Smith.
#3 Smith Smith remembers that he was tired on that day.

# met_with
#1 Dupuy
#2 Dupont Smith
#3
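
The output above leaves an empty string in row 3, whereas the question asked for NA. Below is a base R sketch of the same idea without the stringr dependency, with one extra step that converts empty results to NA. It swaps in regmatches and gregexpr for str_extract_all, so it is an alternative formulation rather than the answerer's code. The names `hits` and `met` are hypothetical.

```r
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
df <- data.frame(name, activity, stringsAsFactors = FALSE)

# Alternation of all names, each wrapped in word boundaries.
pat <- paste0("\\b", df$name, "\\b", collapse = "|")

# All names found in each activity string (a list, one element per row).
hits <- regmatches(df$activity, gregexpr(pat, df$activity, perl = TRUE))

# Remove each row's own name, then join the remaining names with spaces.
met <- mapply(function(found, self) paste(setdiff(found, self), collapse = " "),
              hits, df$name, USE.NAMES = FALSE)

# Rows with nobody left become NA instead of an empty string.
met[met == ""] <- NA
df$met_with <- met

print(df[, c("name", "met_with")])
```

Note that setdiff also removes duplicates, so a name mentioned twice in one activity is listed once.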

Regarding "r - Extract names from strings using a list of names with grepl and a loop, and add them to a new column in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68286245/
