R - Create a sentence ID from a column of words

Reposted. Author: 行者123. Updated: 2023-12-01 21:30:53

I have a data frame (tibble) of words that looks like this.

   text      confidence type          start_time end_time
   <chr>     <chr>      <chr>         <chr>      <chr>
 1 Angela    0.7482     pronunciation 0.04       0.32
 2 very      1.0        pronunciation 0.32       0.59
 3 powerful  1.0        pronunciation 0.59       1.29
 4 .         0.0        punctuation   NA         NA
 5 And       1.0        pronunciation 1.3        1.65
 6 with      1.0        pronunciation 1.65       1.87
 7 every     1.0        pronunciation 1.88       2.24
 8 hurricane 1.0        pronunciation 2.24       2.75
 9 there's   0.8826     pronunciation 2.75       2.96
10 that      1.0        pronunciation 2.96       3.22
11 one's     0.6438     pronunciation 3.22       3.73
12 own       0.748      pronunciation 3.73       4.02
13 .         0.0        punctuation   NA         NA
14 It's      0.9278     pronunciation 4.02       4.19
15 usually   0.851      pronunciation 4.19       4.51

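I am trying to create a sentence ID value so that I can group the words into sentences. Each ID should run up to and include a row with type = punctuation, and the next ID should start on the following row.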

   text      confidence type          start_time end_time sentence_id
   <chr>     <chr>      <chr>         <chr>      <chr>          <dbl>
 1 Angela    0.7482     pronunciation 0.04       0.32               1
 2 very      1.0        pronunciation 0.32       0.59               1
 3 powerful  1.0        pronunciation 0.59       1.29               1
 4 .         0.0        punctuation   NA         NA                 1
 5 And       1.0        pronunciation 1.3        1.65               2
 6 with      1.0        pronunciation 1.65       1.87               2
 7 every     1.0        pronunciation 1.88       2.24               2
 8 hurricane 1.0        pronunciation 2.24       2.75               2
 9 there's   0.8826     pronunciation 2.75       2.96               2
10 that      1.0        pronunciation 2.96       3.22               2
11 one's     0.6438     pronunciation 3.22       3.73               2
12 own       0.748      pronunciation 3.73       4.02               2
13 .         0.0        punctuation   NA         NA                 2
14 It's      0.9278     pronunciation 4.02       4.19               3
15 usually   0.851      pronunciation 4.19       4.51               3

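I'm sure there is a relatively simple way to do this, but I can't quite work it out. Does anyone have a suggestion? In case it helps, here is the dput() output: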

structure(list(text = c("Angela", "very", "powerful", ".", "And", 
"with", "every", "hurricane", "there's", "that", "one's", "own",
".", "It's", "usually"), confidence = c("0.7482", "1.0", "1.0",
"0.0", "1.0", "1.0", "1.0", "1.0", "0.8826", "1.0", "0.6438",
"0.748", "0.0", "0.9278", "0.851"), type = c("pronunciation",
"pronunciation", "pronunciation", "punctuation", "pronunciation",
"pronunciation", "pronunciation", "pronunciation", "pronunciation",
"pronunciation", "pronunciation", "pronunciation", "punctuation",
"pronunciation", "pronunciation"), start_time = c("0.04", "0.32",
"0.59", NA, "1.3", "1.65", "1.88", "2.24", "2.75", "2.96", "3.22",
"3.73", NA, "4.02", "4.19"), end_time = c("0.32", "0.59", "1.29",
NA, "1.65", "1.87", "2.24", "2.75", "2.96", "3.22", "3.73", "4.02",
NA, "4.19", "4.51")), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))

Best answer

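One dplyr option could be:

library(dplyr)

# Count punctuation rows from the bottom up, then flip so the first sentence is 1
df %>%
  mutate(sentence_id = rev(cumsum(rev(type) == "punctuation")),
         sentence_id = max(sentence_id) - sentence_id + 1)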

   text      confidence type          start_time end_time sentence_id
   <chr>     <chr>      <chr>         <chr>      <chr>          <dbl>
 1 Angela    0.7482     pronunciation 0.04       0.32               1
 2 very      1.0        pronunciation 0.32       0.59               1
 3 powerful  1.0        pronunciation 0.59       1.29               1
 4 .         0.0        punctuation   <NA>       <NA>               1
 5 And       1.0        pronunciation 1.3        1.65               2
 6 with      1.0        pronunciation 1.65       1.87               2
 7 every     1.0        pronunciation 1.88       2.24               2
 8 hurricane 1.0        pronunciation 2.24       2.75               2
 9 there's   0.8826     pronunciation 2.75       2.96               2
10 that      1.0        pronunciation 2.96       3.22               2
11 one's     0.6438     pronunciation 3.22       3.73               2
12 own       0.748      pronunciation 3.73       4.02               2
13 .         0.0        punctuation   <NA>       <NA>               2
14 It's      0.9278     pronunciation 4.02       4.19               3
15 usually   0.851      pronunciation 4.19       4.51               3
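
For comparison, here is a minimal base R sketch of the same idea computed in the forward direction: a row starts a new sentence whenever the previous row is punctuation. It is not part of the original answer. The names `words`, `starts_new`, and `build_sentence` are hypothetical, and the data is cut down to the two columns the logic needs. The sketch also checks that the forward result matches the reversed cumulative sum, then uses the ID to assemble the sentences, which is the stated goal of the question.

```r
# Reduced copy of the question's data: only `text` and `type` are needed here.
words <- data.frame(
  text = c("Angela", "very", "powerful", ".", "And", "with", "every",
           "hurricane", "there's", "that", "one's", "own", ".", "It's",
           "usually"),
  type = c(rep("pronunciation", 3), "punctuation",
           rep("pronunciation", 8), "punctuation",
           rep("pronunciation", 2)),
  stringsAsFactors = FALSE
)

is_punct <- words$type == "punctuation"

# Forward version: shift the punctuation flag down by one row, so the row
# after each punctuation mark opens a new sentence.
starts_new <- c(FALSE, head(is_punct, -1))
words$sentence_id <- cumsum(starts_new) + 1

# Sanity check: identical to the reversed cumulative sum used in the answer.
reversed <- rev(cumsum(rev(is_punct)))
stopifnot(all(words$sentence_id == max(reversed) - reversed + 1))

# Hypothetical helper: join words with spaces, but attach punctuation
# directly to the preceding word.
build_sentence <- function(text, punct) {
  prefix <- ifelse(punct | seq_along(text) == 1, "", " ")
  paste0(prefix, text, collapse = "")
}

# One string per sentence_id.
sentences <- vapply(
  split(seq_len(nrow(words)), words$sentence_id),
  function(idx) build_sentence(words$text[idx], is_punct[idx]),
  character(1)
)
print(sentences)
```

With this data, the three sentences come out as "Angela very powerful.", "And with every hurricane there's that one's own.", and "It's usually". The forward version avoids the double `rev()` and needs no final flip, at the cost of one explicit shift of the punctuation flag.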

Regarding "R - Create a sentence ID from a column of words", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62455991/
