
r - Keep only the first row of consecutive duplicates

Reposted. Author: 行者123. Updated: 2023-12-04 12:20:47

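In my data frame, if the string Position appears several times in consecutive rows, I want to keep only the first of those rows. Please see my example output below. I am trying the duplicated function, but I am not sure how to keep the first row.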

Time    Pos
2006-01-12 Position
2006-01-16 Position
2006-01-17 Position
2006-02-01
2006-02-01 Position
2006-02-02
2006-02-02 Position
2006-02-02 Position
2006-02-02 Position
2006-04-04 Position
2006-04-06 Position
2006-04-06 Position
2006-10-11
2006-10-17 Position
2006-10-18
2006-10-18 Position
2006-10-18
2006-10-18 Position
2006-10-18
2006-10-18 Position
2006-10-18 Position
2006-10-18 Position
2006-10-18 Position
2006-10-19 Position

Output:

Time    Pos
2006-01-12 Position
2006-02-01
2006-02-01 Position
2006-02-02
2006-02-02 Position
2006-10-11
2006-10-17 Position
2006-10-18
2006-10-18 Position
2006-10-18
2006-10-18 Position
2006-10-18
2006-10-18 Position

Best answer

Here is a solution using dplyr + data.table::rleid:

library(dplyr)

df %>%
  mutate(ID = data.table::rleid(Pos)) %>%  # run ID: increments whenever Pos changes
  group_by(ID) %>%
  slice(1) %>%                             # keep the first row of each run
  ungroup() %>%
  select(-ID)

Result:

# A tibble: 13 x 2
   Time       Pos
   <chr>      <chr>
 1 2006-01-12 Position
 2 2006-02-01
 3 2006-02-01 Position
 4 2006-02-02
 5 2006-02-02 Position
 6 2006-10-11
 7 2006-10-17 Position
 8 2006-10-18
 9 2006-10-18 Position
10 2006-10-18
11 2006-10-18 Position
12 2006-10-18
13 2006-10-18 Position
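
To see why grouping on a run ID works, here is a minimal base R sketch of the same idea. The helper name run_id is hypothetical (it is not part of dplyr or data.table), and it assumes the vector contains no NA values. It numbers each run of consecutive equal values, which is the kind of ID the pipeline above groups on.

```r
# Hypothetical helper: give each run of consecutive equal values its own ID.
# Assumes x has no NA values.
run_id <- function(x) {
  if (length(x) == 0L) return(integer(0))
  # TRUE wherever a value differs from the one directly before it;
  # the first element always starts a new run.
  starts <- c(TRUE, x[-1L] != x[-length(x)])
  cumsum(starts)
}

pos <- c("Position", "Position", "", "Position", "", "Position", "Position")
ids <- run_id(pos)
ids
```

Rows that share an ID belong to the same run, so taking the first row per ID keeps exactly one row per run.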

The data.table equivalent:

library(data.table)
setDT(df)[, .SD[1], by = rleid(Pos), .SDcols = c("Time", "Pos")]

Result:

    rleid       Time      Pos
 1:     1 2006-01-12 Position
 2:     2 2006-02-01
 3:     3 2006-02-01 Position
 4:     4 2006-02-02
 5:     5 2006-02-02 Position
 6:     6 2006-10-11
 7:     7 2006-10-17 Position
 8:     8 2006-10-18
 9:     9 2006-10-18 Position
10:    10 2006-10-18
11:    11 2006-10-18 Position
12:    12 2006-10-18
13:    13 2006-10-18 Position
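
If you would rather not load any packages, the same result can probably be had in base R. The sketch below is an alternative technique, not the answer's method. It uses a small made-up data frame (df_small, a hypothetical name) and assumes Pos has no NA values. It also shows why duplicated on its own is not enough: it flags every later repeat anywhere in the column, not just consecutive ones.

```r
# Small illustrative data frame (a shortened stand-in for the real data).
df_small <- data.frame(
  Time = c("2006-01-12", "2006-01-16", "2006-02-01",
           "2006-02-01", "2006-02-02", "2006-02-02"),
  Pos  = c("Position", "Position", "", "Position", "", "Position"),
  stringsAsFactors = FALSE
)

# duplicated() looks at the whole column, so only the first "Position"
# and the first "" survive; later runs are dropped too.
dedup <- df_small[!duplicated(df_small$Pos), ]

# Instead, keep a row when its Pos differs from the row directly above it.
# Assumes Pos contains no NA values.
n <- nrow(df_small)
keep <- c(TRUE, df_small$Pos[-1L] != df_small$Pos[-n])
first_of_runs <- df_small[keep, ]
first_of_runs
```

Here only the second row is dropped, because it is the only row whose Pos matches the row immediately before it.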

Data:

df = structure(list(Time = c("2006-01-12", "2006-01-16", "2006-01-17", 
"2006-02-01", "2006-02-01", "2006-02-02", "2006-02-02", "2006-02-02",
"2006-02-02", "2006-04-04", "2006-04-06", "2006-04-06", "2006-10-11",
"2006-10-17", "2006-10-18", "2006-10-18", "2006-10-18", "2006-10-18",
"2006-10-18", "2006-10-18", "2006-10-18", "2006-10-18", "2006-10-18",
"2006-10-19"), Pos = c("Position", "Position", "Position", "",
"Position", "", "Position", "Position", "Position", "Position",
"Position", "Position", "", "Position", "", "Position", "", "Position",
"", "Position", "Position", "Position", "Position", "Position"
)), .Names = c("Time", "Pos"), class = "data.frame", row.names = c(NA,
-24L))

For more on r - keep only the first row of consecutive duplicates, see a similar question we found on Stack Overflow: https://stackoverflow.com/questions/47394458/
