
r - How to split a string into regular intervals in R?

Reposted. Author: 行者123. Updated: 2023-12-03 15:27:18

I have a very long string that I want to split into regular intervals, say 10 words each:

x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."
Using strsplit, I can split the sentence into individual words:
x1 <- unlist(strsplit(x, " "))
Using paste, I can join the words back together 10 at a time:
paste(x1[1:10], collapse = " ")
paste(x1[11:20], collapse = " ")
...
paste(x1[101:110], collapse = " ")
But this is tedious, so I tried sapply and seq:
lapply(x1, function(x) paste(x[seq(1,100,10)], collapse = " "))
But the result is not what I want. What I want is something like this:
[1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
[2] "mead-hall, or palace, in which he hopes to feast his"
[3] "liegemen and to give them presents. The joy of king"
[4] "and retainers is, however, of short duration. Grendel, the monster,"
[5] "is seized with hateful jealousy. He cannot brook the sounds"
...
[10] "twelve years he persecutes Hrothgar and his vassals. NA NA"
I am open to any solution, but would especially appreciate a base R one.

Best answer

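Another base R only option: use a regex to capture (\\1) groups of 10 words (alphanumeric characters, possibly containing hyphens, ending at a word boundary \b) together with the punctuation that follows them, and append a "distinctive" string ("XXX" here) after each group, so that the result can then be split on that string. Putting a space before the string in the strsplit pattern avoids a trailing space at the end of each piece: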

unlist(strsplit(gsub("(((\\w|-)+\\b[ ,.]*){10})", "\\1XXX", x), " XXX"))

# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
# [2] "mead-hall, or palace, in which he hopes to feast his"
# [3] "liegemen and to give them presents. The joy of king"
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"
# [6] "of joyance that reach him down in his fen-dwelling near"
# [7] "the hall. Oft and anon he goes to the joyous"
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"
#[10] "enough and bold enough to cope with the monster. For"
#[11] "twelve years he persecutes Hrothgar and his vassals."
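
To make the two steps of the regex approach easier to follow, here is a minimal sketch that wraps the same idea in a function with the group size as a parameter. The function name split_every_n_words, its arguments n and marker, and the short test string y are assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): same regex idea, with the
# group size n as a parameter instead of a hard-coded 10.
split_every_n_words <- function(x, n = 10, marker = "XXX") {
  # A "word" is a run of alphanumerics/hyphens ending at a word boundary,
  # followed by any spaces, commas or periods; n such words form one group.
  pattern <- sprintf("(((\\w|-)+\\b[ ,.]*){%d})", n)
  # Step 1: append the marker after every complete group of n words.
  marked <- gsub(pattern, paste0("\\1", marker), x)
  # Step 2: split on " <marker>" so the space before the marker is dropped too.
  unlist(strsplit(marked, paste0(" ", marker), fixed = TRUE))
}

# Short illustrative input (an assumption, chosen so the steps are easy to see).
y <- "one two three, four five. six seven"

# Intermediate result of step 1 with groups of 3 words:
gsub("(((\\w|-)+\\b[ ,.]*){3})", "\\1XXX", y)
# [1] "one two three, XXXfour five. six XXXseven"

split_every_n_words(y, n = 3)
# [1] "one two three," "four five. six" "seven"
```

One limitation of this sketch: if the number of words is an exact multiple of n and the text does not end in a space, the final marker has no space before it, so it is left attached to the last piece. The marker also has to be a string that cannot occur in the text itself.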

Regarding "r - How to split a string into regular intervals in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64022743/
