
regex - R regular expression: extracting speaker in a script

Reposted. Author: 行者123. Last updated: 2023-12-04 20:45:41

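I would like to use R to extract the speakers from a script formatted as in the following example:

"Scene 6: Second Lord: Nay, good my lord, put him to't; let him have his way. First Lord: If your lordship find him not a hilding, hold me no more in your respect. Second Lord: On my life, my lord, a bubble. BERTRAM: Do you think I am so far deceived in him? Second Lord: Believe it, my lord, in mine own direct knowledge, without any malice, but to speak of him as my kinsman, he's a most notable coward, an infinite and endless liar, an hourly promise-breaker, the owner of no one good quality worthy your lordship's entertainment."

In this example, I would like to extract: ("Second Lord", "First Lord", "Second Lord", "BERTRAM", "Second Lord"). The rule is simple: a speaker is the group of words located between the end of a sentence and a colon.

How can I write this in R?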

Best answer

Maybe something like this:

library(stringr)

body <- "Scene 6: Second Lord: Nay, good my lord, put him to't; let him have his way. First Lord: If your lordship find him not a hilding, hold me no more in your respect. Second Lord: On my life, my lord, a bubble. BERTRAM: Do you think I am so far deceived in him? Second Lord: Believe it, my lord, in mine own direct knowledge, without any malice, but to speak of him as my kinsman, he's a most notable coward, an infinite and endless liar, an hourly promise-breaker, the owner of no one good quality worthy your lordship's entertainment."

# match ":", "." or "?", then a space, then letters and spaces up to the next ":"
# ([A-Za-z ] rather than [A-z ], which would also match a few punctuation characters)
p <- str_extract_all(body, "[:.?] [A-Za-z ]*:")

# and get rid of extra signs
p <- str_replace_all(p[[1]], "[?:.]", "")
# strip white spaces
p <- str_trim(p)
p
# [1] "Second Lord" "First Lord" "Second Lord" "BERTRAM" "Second Lord"

# unique players
unique(p)
# [1] "Second Lord" "First Lord" "BERTRAM"
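
As a possible variation, the two cleanup calls (str_replace_all and str_trim) can be folded into the pattern by wrapping the speaker name in a capture group and reading it back with str_match_all. This is only a sketch: the shortened text in txt and the variable names m and speakers are assumptions made for illustration, not part of the original answer.

```r
library(stringr)

# Shortened sample text (illustrative assumption, not the full script)
txt <- "Scene 6: Second Lord: Nay, my lord. First Lord: Hold me no more in your respect. BERTRAM: Am I deceived? Second Lord: Believe it."

# Group 1 captures only the letters and spaces between the
# delimiter-plus-space prefix and the closing colon
m <- str_match_all(txt, "[:.?] ([A-Za-z ]+):")[[1]]

# Column 1 holds the full match, column 2 holds the captured group
speakers <- m[, 2]
speakers
unique(speakers)
```

Because the delimiters stay outside the group, nothing has to be stripped afterwards.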

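Explanation of the regex (not perfect):

str_extract_all(body, "[:.?] [A-Za-z ]*:") starts a match at a ":", "." or "?" ([:.?]) followed by a space. It then matches any letters and spaces up to the next ":". The range A-z is sometimes written as shorthand for "all letters", but it also covers the punctuation characters that sit between Z and a in ASCII, so A-Za-z is the safer spelling.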
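
To make the difference between the two character ranges concrete, here is a small check. It is a sketch only: the test vector x and the result names wide and strict are made up for this illustration.

```r
library(stringr)

# The six characters between "Z" and "a" in ASCII, plus two real letters
x <- c("[", "\\", "]", "^", "_", "`", "Q", "q")

# A-z spans every code point from "A" to "z", punctuation included
wide <- str_detect(x, "^[A-z]$")

# A-Za-z accepts letters only
strict <- str_detect(x, "^[A-Za-z]$")

data.frame(char = x, wide = wide, strict = strict)
```

With the wide range, a token such as stage_direction followed by a colon would be picked up as a speaker. The strict range rejects it.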

Getting the positions

You can use str_locate_all with the same regex:

str_locate_all(body, "[:.?] [A-Za-z ]*:")
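
The returned positions span the whole match, including the leading delimiter, the space, and the trailing colon. A minimal sketch of turning them back into speaker names with str_sub follows. The shortened text txt and the offsets +2 and -1 are assumptions that hold only for this exact pattern.

```r
library(stringr)

# Shortened sample text (illustrative assumption)
txt <- "Scene 6: Second Lord: Nay, my lord. First Lord: Hold me. BERTRAM: Am I deceived?"

# One integer matrix per input string, with columns "start" and "end"
loc <- str_locate_all(txt, "[:.?] [A-Za-z ]*:")[[1]]
loc

# Skip the 2-character prefix (delimiter + space) and drop the trailing colon
speakers <- str_sub(txt, loc[, "start"] + 2, loc[, "end"] - 1)
speakers
```

Keeping loc around is useful when the spoken text itself is needed, since each speech runs from one match's end to the next match's start.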

Regarding "regex - R regular expression: extracting speaker in a script", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11358100/
