r - Extract multiple sentences around a keyword from a text cell


Reposted by: 行者123  Updated: 2023-12-02 18:53:36


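I am trying to search a large text for keywords in R. Once a keyword is found, I want to extract 1 sentence before and 1 sentence after it (including the sentence that contains the keyword itself). Ideally, I would like to be able to change this code to extract up to 3 sentences around the keyword. Sample data is below.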

text <- "This is an article about random things. Usually, there are a few sentences that are irrelevant to what I am interested in. Then in the middle, there is a sentence that I want to extract. Water quality is a serious concern in Akron, Ohio. It can impact ecological systems and human health. Jon Doe is a key player in this realm. Then the article goes on talking about something else that I don't care about."

keywords <- c("water quality", "health")

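So, for the text above, I want to search for "water quality" and "health", and when there is a match, extract everything from "Then in the middle, there is..." through "Jon Doe is a key player in this realm."

Eventually, I want to repeat this over multiple rows, each with its own text.

I have looked into using stringr/regex, but it is not giving me what I want: I cannot extract complete sentences. Any ideas?

Code I have tried: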

str_extract_all(text, paste0("([^\\s]+\\s){5}", keywords, "(\\s[^\\s]+){5}"))

-> This gives me a few words on either side

gsub(".*?([^\\.]*('water quality'|health)[^\\.]*).*","\\1", text, ignore.case = TRUE)

-> Also close, but not what I want

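Best answer

Build the pattern to search for from the keywords, put the data in a tibble, split it into sentences (splitting on periods), and for every row n where the pattern is found, select rows n-1, n and n+1.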

library(dplyr)
library(tidyr)

keywords <- c("water quality", "health")
pat <- paste0(keywords, collapse = '|')
pat
#[1] "water quality|health"

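# One sentence per row, then keep each matching row and its two neighbours
tibble(text) %>%
  separate_rows(text, sep = '\\.\\s*') %>%
  slice({
    tmp <- grep(pat, text, ignore.case = TRUE)  # rows that contain a keyword
    sort(unique(c(tmp-1, tmp, tmp + 1)))        # add neighbours, drop duplicates
  })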

#  text                                                          
#  <chr>                                                         
#1 Then in the middle, there is a sentence that I want to extract
#2 Water quality is a serious concern in Akron, Ohio             
#3 It can impact ecological systems and human health             
#4 Jon Doe is a key player in this realm
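
The question also asks for an adjustable window (up to 3 sentences) and for the same operation over several rows. The following is a minimal base R sketch of one way to do that, not part of the original answer. The helper name `extract_context`, its parameter `n`, and the second sample text in `docs` are all made up for illustration. It keeps the answer's assumption that sentences end with a period, so abbreviations such as "Dr." or decimal numbers would be split incorrectly.

```r
text <- "This is an article about random things. Usually, there are a few sentences that are irrelevant to what I am interested in. Then in the middle, there is a sentence that I want to extract. Water quality is a serious concern in Akron, Ohio. It can impact ecological systems and human health. Jon Doe is a key player in this realm. Then the article goes on talking about something else that I don't care about."
keywords <- c("water quality", "health")

# Hypothetical helper: return every sentence that lies within `n` sentences
# of a keyword match, as a character vector (trailing periods are dropped).
extract_context <- function(text, keywords, n = 1) {
  pat <- paste0(keywords, collapse = "|")
  # Split on a period followed by optional whitespace, as in the answer
  sentences <- strsplit(text, "\\.\\s*")[[1]]
  hits <- grep(pat, sentences, ignore.case = TRUE)
  # All positions from n before to n after each hit, without duplicates
  idx <- sort(unique(as.vector(outer(hits, -n:n, "+"))))
  # Discard positions before the first or after the last sentence
  idx <- idx[idx >= 1 & idx <= length(sentences)]
  sentences[idx]
}

# Single text, one sentence of context on each side
extract_context(text, keywords, n = 1)

# Several rows: one collapsed context string per row
# (the second text is made-up sample data with no keyword match)
docs <- data.frame(
  id = 1:2,
  text = c(text, "Nothing relevant here. Still nothing."),
  stringsAsFactors = FALSE
)
docs$context <- vapply(
  docs$text,
  function(x) paste(extract_context(x, keywords, n = 1), collapse = ". "),
  character(1),
  USE.NAMES = FALSE
)
```

With `n = 1` the first call selects the same four sentences as the answer above; a row without any match gets an empty string in `context`. Setting `n = 3` widens the window to three sentences on each side, clipped at the start and end of the text.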

Regarding r - extract multiple sentences around a keyword from a text cell, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66449007/
