
r - Split a string into utterances and assign same-speaker utterances to columns in a data frame

Reposted. Author: 行者123. Updated: 2023-12-02 02:32:12

I have a multi-party conversation in a string like this:

convers <- "Peter: Hiya Mary: Hi. How w'z your weekend. Peter: a::hh still got a headache. An' you (.) party a lot? Mary: nuh, you know my kid's sick 'n stuff Peter: yeah i know that's=erm al hamshi: hey guys how's it goin'? Peter: Great! Mary: where've you BEn last week al hamshi: ah y' know, camping with my girl friend."

I also have a vector with the speakers' names:

speakers <- c("Peter", "Mary", "al hamshi")

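I want to create a data frame in which each speaker's utterances sit in a separate column. So far I can only do this piecemeal, using the index into speakers to handle each speaker in turn and then combining the separate results into a list, but what I really want is a data frame with a separate column for each speaker: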

# str_extract_all() comes from the stringr package
library(stringr)

Peter <- str_extract_all(convers, paste0("(?<=", speakers[1],":\\s).*?(?=\\s*(?:", paste(speakers, collapse="|"),"):|\\z)"))
Mary <- str_extract_all(convers, paste0("(?<=", speakers[2],":\\s).*?(?=\\s*(?:", paste(speakers, collapse="|"),"):|\\z)"))
al_hamshi <- str_extract_all(convers, paste0("(?<=", speakers[3],":\\s).*?(?=\\s*(?:", paste(speakers, collapse="|"),"):|\\z)"))

df <- list(
Peter = Peter, Mary = Mary , al_hamshi = al_hamshi
)
df
$Peter
$Peter[[1]]
[1] "Hiya" "a::hh still got a headache. An' you (.) party a lot?"
[3] "yeah i know that's=erm" "Great!"


$Mary
$Mary[[1]]
[1] "Hi. How w'z your weekend." "nuh, you know my kid's sick 'n stuff" "where've you BEn last week"


$al_hamshi
$al_hamshi[[1]]
[1] "hey guys how's it goin'?" "ah y' know, camping with my girl friend."

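How can I extract each speaker's utterances in one go rather than one speaker at a time, and how can I assign the result to a data frame instead of a list?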

Best answer

With a little preprocessing, and assuming the names match the speakers in the conversation text exactly, you can do the following:

# Pattern to use to insert new lines in string
pattern <- paste0("(", paste0(speakers, ":", collapse = "|"), ")")

# Split string by newlines
split_conv <- strsplit(gsub(pattern, "\n\\1", convers), "\n")[[1]][-1]

# Capture speaker and text into data frame
dat <- strcapture("(.*?):(.*)", split_conv, data.frame(speaker = character(), text = character()))

This gives:

    speaker                                                  text
1     Peter                                                  Hiya
2      Mary                             Hi. How w'z your weekend.
3     Peter  a::hh still got a headache. An' you (.) party a lot?
4      Mary                  nuh, you know my kid's sick 'n stuff
5     Peter                                yeah i know that's=erm
6 al hamshi                              hey guys how's it goin'?
7     Peter                                                Great!
8      Mary                            where've you BEn last week
9 al hamshi              ah y' know, camping with my girl friend.
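
The approach above treats the speaker names as regular expressions, and the captured text keeps the whitespace that followed the colon. What follows is a minimal sketch of a slightly more defensive variant, under assumptions that are not in the original: the sample conversation is shortened, and the speaker "Dr. Who" is invented so that a name contains a regex metacharacter. It quotes each name literally with PCRE's \Q...\E (hence perl = TRUE) and trims the text with trimws():

```r
# Shortened, made-up sample; "Dr. Who" is a hypothetical speaker whose
# name contains a regex metacharacter (the dot)
convers  <- "Peter: Hiya Dr. Who: Hi there. Peter: Great!"
speakers <- c("Peter", "Dr. Who")

# Quote each name literally with \Q...\E so metacharacters are not interpreted
pattern <- paste0("(", paste0("\\Q", speakers, "\\E:", collapse = "|"), ")")

# Insert a newline before every speaker label, split, drop the empty first piece
split_conv <- strsplit(gsub(pattern, "\n\\1", convers, perl = TRUE), "\n")[[1]][-1]

# Capture speaker and text, then strip surrounding whitespace from the text
dat <- strcapture("(.*?):(.*)", split_conv,
                  data.frame(speaker = character(), text = character()))
dat$text <- trimws(dat$text)
```

This variant still assumes that no speaker name contains the literal sequence \E and that no utterance contains a speaker name followed by a colon.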

To put each speaker in their own column:

# Count lines by speaker
dat$cnt <- with(dat, ave(speaker, speaker, FUN = seq_along))

# Reshape and rename
dat <- reshape(dat, idvar = "cnt", timevar = "speaker", direction = "wide")
names(dat) <- sub("text\\.", "", names(dat))

  cnt                                                Peter                                 Mary                                al hamshi
1   1                                                 Hiya            Hi. How w'z your weekend.                 hey guys how's it goin'?
3   2 a::hh still got a headache. An' you (.) party a lot? nuh, you know my kid's sick 'n stuff ah y' know, camping with my girl friend.
5   3                               yeah i know that's=erm           where've you BEn last week                                     <NA>
7   4                                               Great!                                 <NA>                                     <NA>
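
As an alternative to reshape(), the same wide layout can be sketched with split() and NA padding. This is not the answer's method, only one possible base-R variant; the small `dat` below is made-up sample data in the same speaker/text shape:

```r
# Made-up long-format data in the same shape as the speaker/text data frame
dat <- data.frame(
  speaker = c("Peter", "Mary", "Peter", "al hamshi", "Peter"),
  text    = c("Hiya", "Hi.", "a::hh", "hey guys", "Great!"),
  stringsAsFactors = FALSE
)

# Group utterances by speaker, keeping speakers in order of first appearance
by_speaker <- split(dat$text, factor(dat$speaker, levels = unique(dat$speaker)))

# Indexing past the end of a vector yields NA, which pads the shorter columns
n    <- max(lengths(by_speaker))
mat  <- do.call(cbind, lapply(by_speaker, function(x) x[seq_len(n)]))
wide <- as.data.frame(mat, stringsAsFactors = FALSE)
```

Going through a matrix keeps column names such as "al hamshi" unchanged, and it works here because every column is character.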

If the text already contains newlines, choose another character that does not occur in it to split the string on.
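
A minimal sketch of that substitution, assuming the ASCII unit separator ("\x1f") as the stand-in delimiter. The choice of character and the shortened sample text are illustrative assumptions, not part of the original answer:

```r
# Made-up sample in which one utterance already contains a newline
convers  <- "Peter: line one\nline two Mary: Hi. Peter: Great!"
speakers <- c("Peter", "Mary")

# Assumed delimiter: ASCII unit separator; verify it really is absent
sep <- "\x1f"
stopifnot(!grepl(sep, convers, fixed = TRUE))

# Same idea as before, but inserting and splitting on sep instead of "\n"
pattern    <- paste0("(", paste0(speakers, ":", collapse = "|"), ")")
marked     <- gsub(pattern, paste0(sep, "\\1"), convers)
split_conv <- strsplit(marked, sep, fixed = TRUE)[[1]][-1]
```

Each element of split_conv is again one "speaker: text" piece, and the newline inside Peter's first utterance is left intact.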

For more on "r - Split a string into utterances and assign same-speaker utterances to columns in a data frame", see the similar question on Stack Overflow: https://stackoverflow.com/questions/64892309/
