
r - Select rows based on whether a column's value is among the top X values in its row

Reposted. Author: 行者123. Updated: 2023-12-04 10:29:26

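I have a data frame that I want to subset based on whether the value in a given column is among the top 5 values across all columns in that row.
Here is a simplified version of my data frame: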

> my.df <- data.frame(a = rnorm(10,5), b= rnorm(10,5), c=rnorm(10,5), d=rnorm(10,5), e=rnorm(10,5))
> my.df
          a        b        c        d        e
1  6.401462 5.318849 5.373496 5.101140 3.710973
2  6.715845 4.786936 3.521965 4.264029 4.525138
3  6.076211 5.356114 5.605134 5.443002 5.296778
4  7.009623 5.275595 4.801874 4.355892 6.752737
5  5.002059 6.163398 6.063694 2.409702 6.172111
6  6.298305 3.291884 5.737053 4.701320 4.752406
7  4.856246 4.674743 5.550828 7.501786 5.466611
8  5.037990 4.129333 4.797334 5.143915 5.558161
9  4.903592 3.135622 5.879798 5.639893 4.368915
10 5.500374 4.400130 3.980433 6.203259 4.498614

Now I only want the rows where the value in column a or column b is among the top 2 values in its row. In this example that removes rows 7-9, giving:
          a        b        c        d        e
1  6.401462 5.318849 5.373496 5.101140 3.710973
2  6.715845 4.786936 3.521965 4.264029 4.525138
3  6.076211 5.356114 5.605134 5.443002 5.296778
4  7.009623 5.275595 4.801874 4.355892 6.752737
5  5.002059 6.163398 6.063694 2.409702 6.172111
6  6.298305 3.291884 5.737053 4.701320 4.752406
10 5.500374 4.400130 3.980433 6.203259 4.498614

Any ideas?

Best answer

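We can loop over the rows with apply (from base R), check whether any of the elements in 'a' or 'b' are %in% the two largest values of the row (obtained with sort in decreasing order) to create a logical index, and subset the rows based on that index.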

i1 <- apply(my.df, 1, function(x) any(x[1:2] %in% sort(x, decreasing = TRUE)[1:2]))
my.df[i1,]
#          a        b        c        d        e
#1  6.401462 5.318849 5.373496 5.101140 3.710973
#2  6.715845 4.786936 3.521965 4.264029 4.525138
#3  6.076211 5.356114 5.605134 5.443002 5.296778
#4  7.009623 5.275595 4.801874 4.355892 6.752737
#5  5.002059 6.163398 6.063694 2.409702 6.172111
#6  6.298305 3.291884 5.737053 4.701320 4.752406
#10 5.500374 4.400130 3.980433 6.203259 4.498614
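
To generalize from the top 2 to an arbitrary top X, one option is to rank the values within each row. The following is only a minimal sketch and not part of the original answer: the helper name `in_top_n` and its arguments are hypothetical, and `small.df` simply reuses rows 1, 5, 7 and 10 of the example data so the block runs on its own.

```r
# Rows 1, 5, 7 and 10 of the example data, so the sketch is self-contained
small.df <- data.frame(
  a = c(6.401462, 5.002059, 4.856246, 5.500374),
  b = c(5.318849, 6.163398, 4.674743, 4.400130),
  c = c(5.373496, 6.063694, 5.550828, 3.980433),
  d = c(5.101140, 2.409702, 7.501786, 6.203259),
  e = c(3.710973, 6.172111, 5.466611, 4.498614),
  row.names = c("1", "5", "7", "10")
)

# Hypothetical helper: TRUE for rows where any of `cols` holds one of the
# `n` largest values in its row. Assumes an all-numeric data frame with at
# least two columns.
in_top_n <- function(df, cols, n) {
  m <- as.matrix(df)
  # Rank within each row so that the largest value gets rank 1;
  # tied values share the smaller rank
  r <- t(apply(-m, 1, rank, ties.method = "min"))
  colnames(r) <- colnames(m)
  rowSums(r[, cols, drop = FALSE] <= n) > 0
}

keep <- in_top_n(small.df, c("a", "b"), 2)
small.df[keep, ]  # keeps original rows 1, 5 and 10; row 7 is dropped
```

Because the ranks are computed once, changing `n` or `cols` does not require rewriting the indexing logic.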

Alternatively, use max.col from base R to create the logical index, which should be faster since it avoids the row-by-row loop:
i1 <- max.col(my.df, "first")
i2 <- max.col(replace(my.df, cbind(seq_len(nrow(my.df)), i1), -Inf), "first")
my.df[(i1 %in% 1:2) | (i2 %in% 1:2), ]
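
The max.col approach works in two passes: the first call finds the column holding each row's largest value, and the second call finds the runner-up after that maximum has been masked with -Inf. As a hedged sanity check, the sketch below walks through both passes on a small subset (rows 1, 5, 7 and 10 of the example data; the names `small.df`, `masked`, `by_maxcol` and `by_apply` are introduced here for illustration) and confirms the result matches the apply version.

```r
# Rows 1, 5, 7 and 10 of the example data, so the sketch is self-contained
small.df <- data.frame(
  a = c(6.401462, 5.002059, 4.856246, 5.500374),
  b = c(5.318849, 6.163398, 4.674743, 4.400130),
  c = c(5.373496, 6.063694, 5.550828, 3.980433),
  d = c(5.101140, 2.409702, 7.501786, 6.203259),
  e = c(3.710973, 6.172111, 5.466611, 4.498614),
  row.names = c("1", "5", "7", "10")
)

# Pass 1: column index of the largest value in each row
i1 <- max.col(small.df, "first")

# Mask each row's maximum with -Inf using (row, column) matrix indexing
masked <- replace(small.df, cbind(seq_len(nrow(small.df)), i1), -Inf)

# Pass 2: column index of the second-largest value in each row
i2 <- max.col(masked, "first")

# Columns 1 and 2 are 'a' and 'b'
by_maxcol <- (i1 %in% 1:2) | (i2 %in% 1:2)

# The apply-based index from the first approach, for comparison
by_apply <- apply(small.df, 1, function(x) {
  any(x[1:2] %in% sort(x, decreasing = TRUE)[1:2])
})

stopifnot(identical(unname(by_apply), by_maxcol))
```

Note that this pattern needs one max.col pass per rank, so extending it to the top X means repeating the masking step X times.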

Data
my.df <- structure(list(a = c(6.401462, 6.715845, 6.076211, 7.009623, 
5.002059, 6.298305, 4.856246, 5.03799, 4.903592, 5.500374), b = c(5.318849,
4.786936, 5.356114, 5.275595, 6.163398, 3.291884, 4.674743, 4.129333,
3.135622, 4.40013), c = c(5.373496, 3.521965, 5.605134, 4.801874,
6.063694, 5.737053, 5.550828, 4.797334, 5.879798, 3.980433),
d = c(5.10114, 4.264029, 5.443002, 4.355892, 2.409702, 4.70132,
7.501786, 5.143915, 5.639893, 6.203259), e = c(3.710973,
4.525138, 5.296778, 6.752737, 6.172111, 4.752406, 5.466611,
5.558161, 4.368915, 4.498614)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

Regarding "r - Select rows based on whether a column's value is among the top X values in its row", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55104099/
