
r - Grouping by an unstructured text column to pivot data

Reposted. Author: 行者123. Last updated: 2023-12-02 07:57:47

I have some text data that looks like this:

   ID                      text
1
2
3
4 HD some text
5 LP some more text
6 AN even more text
7
8
9
10 HD some different text
11 SN some more different text
12 AN even more different text

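Each section is one document, and documents are separated by a few blank rows. A document starts at the row whose ID is HD and ends at the row whose ID is AN. Ultimately I want to pivot_wider the data so that the ID values become the columns and each row is one document. I run the following: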

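library(dplyr)
library(tidyr)

# Pivot without any grouping variable: each ID becomes one column
widerText <- textData %>%
  pivot_wider(names_from = ID, values_from = text)

# Unnest call exactly as originally attempted
finalText <- widerText %>%
  unnest(HD, LP, AN, SN, PP, LO, AN)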

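This does not preserve the correct structure: text from different documents gets mixed together. So I would like to create a grouping variable before running pivot_wider.

Every document starts with HD and ends with AN, so I want to produce something like the following.

Expected output: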

   ID                      text   grp
1 0
2 0
3 0
4 HD some text 1
5 LP some more text 1
6 AN even more text 1
7 0
8 0
9 0
10 HD some different text 2
11 SN some more different text 2
12 AN even more different text 2

Data:

textData <- data.frame(
  ID = c(
    " ", " ", " ", "HD", "LP", "AN",
    " ", " ", " ", "HD", "SN", "AN",
    " ", " ", " ", "HD", "PP", "AN",
    " ", " ", " ", "HD", "LO", "AN"
  ),
  text = c(
    " ", " ", " ", "some text", "some more text", "even more text",
    " ", " ", " ", "some different text", "some more different text", "even more different text",
    " ", " ", " ", "some additional text", "some more additional text", "even more additional text",
    " ", " ", " ", "some extra text", "some more extra text", "even more extra text"
  )
)

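Best answer

You can get the groups by applying cumsum to textData$ID == "HD", which adds one to a running count at every HD row, and then use ifelse to set the blank separator rows to group 0.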
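To see why cumsum yields a document counter, here is a minimal sketch on a toy vector. The `id` vector and the names `running` and `grp` are made up for illustration and are not part of the original data:

```r
# Toy ID column (hypothetical): two documents separated by blank rows
id <- c(" ", "HD", "LP", "AN", " ", "HD", "AN")

# TRUE counts as 1, so the running sum goes up by one at each "HD"
running <- cumsum(id == "HD")
running
# [1] 0 1 1 1 1 2 2

# Blank separator rows are reset to 0; other rows keep their document number
grp <- ifelse(id == " ", 0, running)
grp
# [1] 0 1 1 1 0 2 2
```

Note that the blank row after the first document would carry the count 1 without the ifelse step, which is why the reset to 0 is needed. The same logic is applied to the full data below.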

textData$grp <- ifelse(textData$ID==" ", 0, cumsum(textData$ID == "HD"))
textData
# ID text grp
#1 0
#2 0
#3 0
#4 HD some text 1
#5 LP some more text 1
#6 AN even more text 1
#7 0
#8 0
#9 0
#10 HD some different text 2
#11 SN some more different text 2
#12 AN even more different text 2
#13 0
#14 0
#15 0
#16 HD some additional text 3
#17 PP some more additional text 3
#18 AN even more additional text 3
#19 0
#20 0
#21 0
#22 HD some extra text 4
#23 LO some more extra text 4
#24 AN even more extra text 4
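With grp in place, the pivot the question was aiming for might look like the sketch below. It is not part of the original answer. It uses a shortened stand-in for textData and assumes that each ID code appears at most once per document; if a code repeated within a document, pivot_wider would produce list-columns again.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for textData (two documents only), for illustration
textData <- data.frame(
  ID   = c(" ", "HD", "LP", "AN", " ", "HD", "SN", "AN"),
  text = c(" ", "some text", "some more text", "even more text",
           " ", "some different text", "some more different text",
           "even more different text")
)

# Grouping variable computed as in the answer above
textData$grp <- ifelse(textData$ID == " ", 0, cumsum(textData$ID == "HD"))

finalText <- textData %>%
  filter(grp != 0) %>%   # drop the blank separator rows
  pivot_wider(id_cols = grp, names_from = ID, values_from = text)

finalText
```

Each row of finalText is then one document, with one column per ID code. Codes that a document does not contain (for example SN in the first document) come out as NA.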

Regarding "r - Grouping by an unstructured text column to pivot data", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61365018/
