
r - Shorten (limit) the length of a sentence

Reposted. Author: 行者123. Updated: 2023-12-02 03:24:42

I have a column of very long names, and I want to shorten them to at most 40 characters.

Sample data:

x <- c("This is the longest sentence in world, so now just make it longer",
"No in fact, this is the longest sentence in entire world, world, world, world, the whole world")

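I want to cut the sentences down to roughly 40 characters (±3 nchar) without cutting in the middle of a word, so the cut point should be chosen at the whitespace between words.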

I would also like to append three dots to each shortened sentence.

The desired output would be something like this:

c("This is the longest sentence...","No in fact, this is the longest...")

This function just cuts blindly at 40 characters:

strtrim(x, 40)
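For comparison, here is a minimal base-R sketch of word-boundary truncation that does not need stringi. The function name `truncate_words` and its `width`/`dots` arguments are hypothetical; the sketch assumes PCRE regexes (`perl = TRUE`) and falls back to a hard cut when the first word alone is longer than the limit.

```r
# Hypothetical helper: cut each string at the last space within `width`
# characters and append `dots`; strings that already fit are left unchanged.
truncate_words <- function(x, width = 40L, dots = "...") {
  width <- as.integer(width)
  out <- x
  long <- !is.na(x) & nchar(x) > width
  # Greedy prefix of at most `width` characters that is followed by a space
  pattern <- sprintf("^(.{0,%d}) .*$", width)
  fits <- grepl(pattern, x[long], perl = TRUE)
  head <- ifelse(fits,
                 sub(pattern, "\\1", x[long], perl = TRUE),
                 substr(x[long], 1L, width))  # no space in range: hard cut
  out[long] <- paste0(head, dots)
  out
}

x <- c("This is the longest sentence in world, so now just make it longer",
       "No in fact, this is the longest sentence in entire world, world, world, world, the whole world")
truncate_words(x)
# returns c("This is the longest sentence in world,...",
#           "No in fact, this is the longest sentence...")
```

The cut is always at or before 40 characters (the ±3 tolerance from the question is not used), and the space before the dots is dropped.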

Best answer

OK, I now have a better solution :)

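require(stringi)

x <- c("This is the longest sentence in world, so now just make it longer",
       "No in fact, this is the longest sentence in entire world, world, world, world, the whole world")

extract <- function(x){
  # Longest prefix of at most 40 characters that ends at a space or at the end of the string
  # (NA if the first word alone is longer than 40 characters)
  result <- stri_extract_first_regex(x, "^.{0,40}( |$)")
  # Append "..." only to strings that were actually shortened
  longer <- stri_length(x) > 40
  result[longer] <- stri_paste(result[longer], "...")
  result
}
extract(x)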
## [1] "This is the longest sentence in world, ..." "No in fact, this is the longest sentence ..."
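To see where the space before the dots comes from, and what happens to input that has no usable space, the same pattern can be inspected with base R's `regexpr()`. This is an illustrative sketch; the third input, a single 50-character "word", is made up.

```r
# The pattern used by extract(): up to 40 characters followed by a space or the end
pattern <- "^.{0,40}( |$)"
x <- c("This is the longest sentence in world, so now just make it longer",
       "No in fact, this is the longest sentence in entire world, world, world, world, the whole world",
       strrep("a", 50))  # made-up input: one 50-character "word"
m <- regexpr(pattern, x, perl = TRUE)
lens <- as.integer(attr(m, "match.length"))  # -1 where there is no match
# The matched prefix includes the trailing space consumed by "( |$)"
heads <- ifelse(lens >= 0, substr(x, 1L, lens), NA_character_)
lens   # 39 41 -1
heads
```

So `extract()` keeps the trailing space, which is why the output reads "world, ..."; wrapping the result in `trimws()` removes it if unwanted. For an input whose first word exceeds 40 characters, `stri_extract_first_regex()` returns NA and the result stays NA after `stri_paste()`.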

Benchmark of the new and old versions (32,000 sentences):

microbenchmark(sapply(x, cutAndAddDots, USE.NAMES = FALSE), extract(x), times=5)
Unit: milliseconds
expr min lq median uq max neval
sapply(x, cutAndAddDots, USE.NAMES = FALSE) 3762.51134 3762.92163 3767.87134 3776.03706 3788.139 5
extract(x) 56.01727 57.18771 58.50321 79.55759 97.924 5

Old version

This solution requires the stringi package and always appends three dots (...) to the end of the string:

require(stringi)
sapply(x, function(x) stri_paste(stri_wrap(x, 40)[1],"..."),USE.NAMES = FALSE)
## [1] "This is the longest sentence in world..." "No in fact, this is the longest..."

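This version adds the three dots only to sentences longer than 40 characters, i.e. those that stri_wrap splits into more than one line: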

require(stringi)
cutAndAddDots <- function(x){
w <- stri_wrap(x, 40)
if(length(w) > 1){
stri_paste(w[1],"...")
}else{
w[1]
}
}
sapply(x, cutAndAddDots, USE.NAMES = FALSE)
## [1] "This is the longest sentence in world" "No in fact, this is the longest..."

Performance note: setting normalize = FALSE in stri_wrap can give roughly a 3x speedup (tested on 30,000 sentences).

Test data:

x <- stri_rand_lipsum(3000)
x <- unlist(stri_split_regex(x,"(?<=\\.) "))
head(x)
[1] "Lorem ipsum dolor sit amet, vel commodo in."
[2] "Ultricies mauris sapien lectus dignissim."
[3] "Id pellentesque semper turpis habitasse egestas rutrum ligula vulputate laoreet mollis id."
[4] "Curabitur volutpat efficitur parturient nibh sociosqu, faucibus tellus, eleifend pretium, quis."
[5] "Feugiat vel mollis ultricies ut auctor."
[6] "Massa neque auctor lacus ridiculus."
stri_length(head(x))
[1] 43 41 90 95 39 35
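The test data is built by splitting random lorem-ipsum paragraphs at the pattern `(?<=\\.) `: a lookbehind that matches a space only when it follows a period, so each sentence keeps its period. A base-R illustration of the same split (the input string is made up):

```r
# The lookbehind (?<=\.) is zero-width, so only the space is consumed
# and each piece keeps its trailing period.
txt <- "Lorem ipsum dolor. Sit amet vel. Commodo in."
sentences <- strsplit(txt, "(?<=\\.) ", perl = TRUE)[[1]]
sentences  # "Lorem ipsum dolor." "Sit amet vel." "Commodo in."
```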

cutAndAddDots <- function(x){
w <- stri_wrap(x, 40, normalize = FALSE)
if(length(w) > 1){
stri_paste(w[1],"...")
}else{
w[1]
}
}
cutAndAddDotsNormalize <- function(x){
w <- stri_wrap(x, 40, normalize = TRUE)
if(length(w) > 1){
stri_paste(w[1],"...")
}else{
w[1]
}
}
require(microbenchmark)
microbenchmark(sapply(x, cutAndAddDots, USE.NAMES = FALSE),sapply(x, cutAndAddDotsNormalize, USE.NAMES = FALSE),times=3)
Unit: seconds
expr min lq median uq max
sapply(x, cutAndAddDots, USE.NAMES = FALSE) 3.917858 3.967411 4.016964 4.055571 4.094178
sapply(x, cutAndAddDotsNormalize, USE.NAMES = FALSE) 13.493732 13.651451 13.809170 13.917854 14.026538

A similar question, "r - Shorten (limit) the length of a sentence", can be found on Stack Overflow: https://stackoverflow.com/questions/27757436/
