r - How to extract the line above a match using regular expressions and R?

Repost. Author: 行者123. Updated: 2023-12-04 14:38:28

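I want to use R to match some specific strings and keep only the line above each match. Here is some sample data; the real file contains hundreds of similar cases: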

first_case<- data.frame(line = 

c("#John Wayne: Su, 11.01.2013 08:24:42#
He is present / I guess, Does great job
--------------------------------------------------
#Michal Thorn: Fr, 12.09.2015 17:23:01#
Works quite frequently with people
--------------------------------------------------
#Sandra Nunes: Mo, 20.05.2011 09:00:29#
She has some new clients"))



second_case<- data.frame(line =

c("#Boris Jonson: Mo, 30.09.2017 09:20:42#
He is present
--------------------------------------------------
#Jacky Fine: Th, 02.02.2013 18:23:01#
Does great job
--------------------------------------------------
#Michael Bissping: Mo, 25.03.2012 10:00:29#
Hard to count on"))



third_case<- data.frame(line =

c("#Isabelle Warren: Sa, 02.12.2013 02:24:42#
Not around / anymore
--------------------------------------------------
#Tobias Maker: Mo, 02.03.2013 10:23:01#
Works quite frequently with people
--------------------------------------------------
#Toe Michael : Mo, 20.05.2011 09:00:29#
She has some new clients & Does great job"))

all_cases <- rbind(first_case,second_case,third_case)

Here I tried to filter for the line one above this one:
Does great job
by checking whether Does great job is preceded by a newline and taking the first line above it:
dplyr::filter(all_cases, grepl("((.*\n){1})Does great job",line))

Expected result:
first_case<- data.frame(line = 
c("#John Wayne: Su, 11.01.2013 08:24:42#"))
second_case<- data.frame(line =
c("#Jacky Fine: Th, 02.02.2013 18:23:01#"))
third_case<- data.frame(line =
c("#Toe Michael : Mo, 20.05.2011 09:00:29#"))

expected_result <- rbind(first_case,second_case,third_case)

1 #John Wayne: Su, 11.01.2013 08:24:42#
2 #Jacky Fine: Th, 02.02.2013 18:23:01#
3 #Toe Michael : Mo, 20.05.2011 09:00:29#

Unfortunately, this returns zero rows. Any insight is appreciated!

Best answer

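Here is a base R approach using strsplit. We can form a vector of lines, then use grep directly to find the index of the line that matches Does great job. Then, just return the line immediately before that index.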

line <- "#Boris Jonson: Mo, 30.09.2017 09:20:42#
He is present
--------------------------------------------------
#Jacky Fine: Th, 02.02.2013 18:23:01#
Does great job
--------------------------------------------------
#Michael Bissping: Mo, 25.03.2012 10:00:29#
Hard to count on"

terms <- unlist(strsplit(line, "\n"))
terms[grep("Does great job", terms) - 1]

[1] "#Jacky Fine: Th, 02.02.2013 18:23:01#"
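
The same idea can be applied to every row of a data frame like the one in the question. The following is only a sketch: the helper name line_above is hypothetical, and the data are shortened stand-ins for the three cases (separator lines dropped), not the full file.

```r
# Shortened stand-ins for the three cases in the question (assumed data).
all_cases <- data.frame(
  line = c(
    "#John Wayne: Su, 11.01.2013 08:24:42#\nHe is present / I guess, Does great job\n#Michal Thorn: Fr, 12.09.2015 17:23:01#\nWorks quite frequently with people",
    "#Boris Jonson: Mo, 30.09.2017 09:20:42#\nHe is present\n#Jacky Fine: Th, 02.02.2013 18:23:01#\nDoes great job",
    "#Tobias Maker: Mo, 02.03.2013 10:23:01#\nWorks quite frequently with people\n#Toe Michael : Mo, 20.05.2011 09:00:29#\nShe has some new clients & Does great job"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one text block into lines and return the
# line directly above each line that contains `pattern`.
line_above <- function(text, pattern = "Does great job") {
  terms <- unlist(strsplit(text, "\n", fixed = TRUE))
  terms[grep(pattern, terms, fixed = TRUE) - 1]
}

# Apply the helper to every row and collect the results in one column.
result <- data.frame(
  line = unlist(lapply(all_cases$line, line_above)),
  stringsAsFactors = FALSE
)
print(result)
```

Because grep looks for the text anywhere in a line, this also picks up lines where Does great job appears after other words, as in the first and third cases.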


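My answer does not cover many edge cases, the first being the matching logic. What should happen if the search term matches more than once, or not at all? Also, how specific should the pattern used in grep be?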
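
One possible way to handle these edge cases is sketched below. The function name line_above_safe and the choice to return NA when nothing usable is found are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: like the strsplit/grep approach, but explicit about
# edge cases. `fixed = TRUE` treats the pattern as literal text; pass
# `fixed = FALSE` to use a regular expression instead.
line_above_safe <- function(text, pattern, fixed = TRUE) {
  terms <- unlist(strsplit(text, "\n", fixed = TRUE))
  hits <- grep(pattern, terms, fixed = fixed)
  hits <- hits[hits > 1]              # a match on line 1 has no line above it
  if (length(hits) == 0) {
    return(NA_character_)             # assumed convention for "no result"
  }
  terms[hits - 1]                     # one result per remaining match
}

# No match at all.
no_match <- line_above_safe("#A#\nHe is present", "Does great job")

# Match on the very first line.
first_line <- line_above_safe("Does great job\n#A#", "Does great job")

# Several matches: every line above a match is returned.
several <- line_above_safe(
  "#A#\nDoes great job\n#B#\nDoes great job too",
  "Does great job"
)

# A stricter pattern: only lines that consist of exactly this text.
strict <- line_above_safe(
  "#A#\nDoes great job\n#B#\nHe Does great job, mostly",
  "^Does great job$",
  fixed = FALSE
)
```

Whether a loose or an anchored pattern is the right choice depends on the data: in the question's sample, two of the three expected matches have other text on the same line, so the anchored pattern would miss them.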

Regarding "r - How to extract the line above a match using regular expressions and R?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51651953/
