r - How to identify repeated words, and the position and number of repetitions in a sentence


Reposted by: 行者123 | Updated: 2023-12-04 12:37:01

I have a dataset of sentences that contain consecutive word repetitions:

Data:

df <- data.frame(
  Turn = c("oh is that that steak i got the other night",       # that that
           "no no no i 'm dave and you 're alan",               # no no no
           "yeah i mean the the film was quite long though",    # the the
           "it had steve martin in it it 's a comedy"))         # it it

Goal:

What I'd like to get is three additional columns added to this data frame:

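df$rep_Word: a column specifying the repeated word
df$rep_Pos: a column specifying the position in the sentence at which the word is first repeated
df$rep_Numb: a column specifying how many times the word is repeated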

So the expected data frame looks like this:

Expected result:

df
                                            Turn rep_Word rep_Pos rep_Numb
1    oh is that that steak i got the other night     that       4        1
2            no no no i 'm dave and you 're alan       no       2        2
3 yeah i mean the the film was quite long though      the       5        1
4       it had steve martin in it it 's a comedy       it       7        1
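For a single sentence, the three target values can be read off base R's rle(), which collapses consecutive identical words into runs. A minimal sketch, assuming words are separated by whitespace; the variable names are illustrative only:

```r
# Split one example sentence into words
words <- strsplit("no no no i 'm dave and you 're alan", "\\s")[[1]]

# rle() stores each run of consecutive identical words once, with its length
r <- rle(words)

# Index (in the sentence) of the first word of every run
start <- cumsum(r$lengths) - r$lengths + 1

# Runs longer than 1 are consecutive repetitions
hit <- which(r$lengths > 1)

rep_Word <- r$values[hit]        # "no"
rep_Pos  <- start[hit] + 1       # 2: position of the first repeat
rep_Numb <- r$lengths[hit] - 1   # 2: repeats after the first occurrence
```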

Solutions tried so far:

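My hunch is that the information about the repeated word, its position, and the number of repetitions can be obtained with strsplit and the function duplicated, for example like this: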

df_split <- apply(df, 2, function(x) strsplit(x, "\\s"))

df_split
$Turn
$Turn[[1]]
 [1] "oh"    "is"    "that"  "that"  "steak" "i"     "got"   "the"   "other" "night"
$Turn[[2]]
 [1] "no"   "no"   "no"   "i"    "'m"   "dave" "and"  "you"  "'re"  "alan"
$Turn[[3]]
 [1] "yeah"   "i"      "mean"   "the"    "the"    "film"   "was"    "quite"  "long"   "though"
$Turn[[4]]
 [1] "it"     "had"    "steve"  "martin" "in"     "it"     "it"     "'s"     "a"      "comedy"

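For example, for the first sentence in df, duplicated shows which word is repeated (i.e., the word for which duplicated evaluates to TRUE), and the number and position of the repetitions can also be read from that output: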

duplicated(df_split$Turn[[1]])
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

The problem is that I don't know how to work with duplicated so as to get the desired additional columns in df. Any help with this is much appreciated.
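One thing to watch with this approach: duplicated() flags any word that already occurred earlier in the sentence, not only a word that immediately repeats its neighbour. In the fourth sentence, the "it" at position 1 makes position 6 count as a duplicate too. A small sketch comparing it with a neighbour-to-neighbour check:

```r
words <- strsplit("it had steve martin in it it 's a comedy", "\\s")[[1]]

# duplicated(): TRUE for any word already seen earlier, adjacent or not
dup_any <- duplicated(words)
which(dup_any)   # 6 7

# Compare each word with the one just before it: only consecutive repeats
dup_adj <- c(FALSE, words[-1] == words[-length(words)])
which(dup_adj)   # 7
```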

Best answer

Here is another way to solve the problem.

df <- data.frame(
  Turn = c("oh is that that steak i got the other night",  # that that
           "no no no i 'm dave and you 're alan",               # no no no
           "yeah i mean the the film was quite long though",    # the the
           "it had steve martin in it it 's a comedy",         # it it
           "it had steve martin in in it it 's a comedy",
           "yeah i mean the film was quite long though", 
           "hi hi then other words and hi hi again",
           "no no no i 'm dave yes yes and you 're alan no no no no"))  # no no no and no no no no

library(data.table)
cols <- c("rep_Word", "rep_Pos", "rep_Numb")
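setDT(df)[, (cols) := {
  words <- strsplit(as.character(Turn), " ")[[1]]
  idx <- rleid(words)                        # run id: consecutive identical words share an id
  check <- duplicated(idx)                   # TRUE for every word that repeats the one before it
  chg <- check - shift(check, fill = FALSE)  # 1 where a repeat run starts, -1 just after it ends
  starts <- which(chg == 1)                  # position of the first repetition in each run
  # if the last run reaches the end of the sentence, close it at length + 1
  aend <- if(sum(chg) == 0L) which(chg == -1) else c(which(chg == -1), length(chg) + 1L)
  freq <- aend - starts                      # number of repetitions in each run
  wrd <- words[starts]
  # .() is data.table's alias for list(); each new column gets one list element per row
  no_dup_default <- .(.(NA_character_), .(NA_integer_), .(NA_integer_))
  if(length(wrd)) .(.(wrd), .(starts), .(freq)) else no_dup_default
}, seq.int(nrow(df))]                        # group by row number: Turn holds one sentence per group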


df
#                                                       Turn   rep_Word  rep_Pos rep_Numb
# 1:             oh is that that steak i got the other night       that        4        1
# 2:                     no no no i 'm dave and you 're alan         no        2        2
# 3:          yeah i mean the the film was quite long though        the        5        1
# 4:                it had steve martin in it it 's a comedy         it        7        1
# 5:             it had steve martin in in it it 's a comedy      in,it      6,8      1,1
# 6:              yeah i mean the film was quite long though         NA       NA       NA
# 7:                  hi hi then other words and hi hi again      hi,hi      2,8      1,1
# 8: no no no i 'm dave yes yes and you 're alan no no no no  no,yes,no  2, 8,14    2,1,3

# or
df[, lapply(.SD, unlist), seq.int(nrow(df))][, -1]
#                                                        Turn rep_Word rep_Pos rep_Numb
#  1:             oh is that that steak i got the other night     that       4        1
#  2:                     no no no i 'm dave and you 're alan       no       2        2
#  3:          yeah i mean the the film was quite long though      the       5        1
#  4:                it had steve martin in it it 's a comedy       it       7        1
#  5:             it had steve martin in in it it 's a comedy       in       6        1
#  6:             it had steve martin in in it it 's a comedy       it       8        1
#  7:              yeah i mean the film was quite long though     <NA>      NA       NA
#  8:                  hi hi then other words and hi hi again       hi       2        1
#  9:                  hi hi then other words and hi hi again       hi       8        1
# 10: no no no i 'm dave yes yes and you 're alan no no no no       no       2        2
# 11: no no no i 'm dave yes yes and you 're alan no no no no      yes       8        1
# 12: no no no i 'm dave yes yes and you 're alan no no no no       no      14        3
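If data.table isn't available, a long-format result like the one above can be sketched in base R with rle(). This assumes, as the answer's strsplit(..., " ") does, that words are separated by single spaces; rep_runs is a hypothetical helper name, not part of the answer:

```r
# Hypothetical helper: one row per run of consecutive repeats in `sentence`
rep_runs <- function(sentence) {
  words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  r <- rle(words)                              # runs of identical consecutive words
  start <- cumsum(r$lengths) - r$lengths + 1   # first index of each run
  hit <- r$lengths > 1                         # runs that contain repetitions
  if (!any(hit)) {
    return(data.frame(Turn = sentence, rep_Word = NA_character_,
                      rep_Pos = NA_integer_, rep_Numb = NA_integer_,
                      stringsAsFactors = FALSE))
  }
  data.frame(Turn = sentence,
             rep_Word = r$values[hit],
             rep_Pos = as.integer(start[hit] + 1),  # position of the first repeat
             rep_Numb = r$lengths[hit] - 1L,        # repeats after the first word
             stringsAsFactors = FALSE)
}

turns <- c("oh is that that steak i got the other night",
           "no no no i 'm dave and you 're alan",
           "yeah i mean the the film was quite long though",
           "it had steve martin in it it 's a comedy",
           "it had steve martin in in it it 's a comedy",
           "yeah i mean the film was quite long though",
           "hi hi then other words and hi hi again",
           "no no no i 'm dave yes yes and you 're alan no no no no")

# Stack the per-sentence results into one data frame
long <- do.call(rbind, lapply(turns, rep_runs))
long
```

The rows, positions and counts should line up with the data.table output above; sentences without a repetition get a single NA row.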

A similar question, "r - How to identify repeated words, and the position and number of repetitions in a sentence", can be found on Stack Overflow: https://stackoverflow.com/questions/60463993/
