
r - Keep only the largest intervals among rows of coordinates in R

Reposted. Author: 行者123. Updated: 2023-12-04 08:08:07

   Groups Name start end  sum
1      G1    A   451 954 1405
2      G1    B   451 951 1402
3      G1    C   451 969 1420
4      G1    D   463 870 1333
5      G1    E   463 888 1351
6      G1    X   230 450  680
7      G1    Z   229 450  681
8      G2    F   119 841  960
9      G2    G   118 842  960
10     G3    H   460 790 1250
11     G3    I   123 300  177
12     G4    J   343 878 1221
13     G4    K   343 878 1221
14     G4    L   320 862 1182
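Within each group I would like to keep only one representative row per interval, where an interval is a set of rows whose df$start to df$end ranges overlap. Let me explain.
For example, G1 contains 2 intervals:
Interval 1 (with min = 451 and max = 969):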
Name start end  sum
   A   451 954 1405
   B   451 951 1402
   C   451 969 1420
   D   463 870 1333
   E   463 888 1351
Then I keep the row with the largest df$sum (here 1420).

Interval 2 (with min = 229 and max = 450):
Name start end sum
   X   230 450 680
   Z   229 450 681
Then I keep the row with the largest df$sum (here 681).
If I do this for the whole df, I should get:
   Groups Name start end  sum
3      G1    C   451 969 1420
7      G1    Z   229 450  681
9      G2    G   118 842  960
10     G3    H   460 790 1250
11     G3    I   123 300  177
12     G4    J   343 878 1221
Does anyone have an idea?
Here is the data:
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 3L, 3L, 4L, 4L, 4L), .Label = c("G1", "G2", "G3", "G4"
), class = "factor"), Name = structure(c(1L, 2L, 3L, 4L, 5L,
13L, 14L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("A", "B",
"C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "X", "Z"), class = "factor"),
start = c(451L, 451L, 451L, 463L, 463L, 230L, 229L, 119L,
118L, 460L, 123L, 343L, 343L, 320L), end = c(954L, 951L,
969L, 870L, 888L, 450L, 450L, 841L, 842L, 790L, 300L, 878L,
878L, 862L), sum = c(1405L, 1402L, 1420L, 1333L, 1351L, 680L,
681L, 960L, 960L, 1250L, 177L, 1221L, 1221L, 1182L)), class = "data.frame", row.names = c(NA,
-14L))

Best answer

Here is a data.table approach that makes use of the intervals package...

library( intervals )
library( data.table )
#mydata is the data frame from the question (assign the structure(...) call above to mydata)
setDT( mydata )
#factors are annoying, set to character
mydata[, Groups := as.character( Groups )]
mydata[, Name := as.character( Name )]
#find the intervals by group
ans <- mydata[, as.data.table(
  intervals::interval_union(
    intervals::Intervals( as.matrix( .SD ) ),
    check_valid = TRUE ) ),
  by = .(Groups),
  .SDcols = c("start", "end") ]
#set the names right
setnames( ans, old = c("V1", "V2"), new = c("start", "end") )
#create temporary IDs
ans[, id := .I ]
#set a key to perform rowwise operation by EACHI without the formation of groups
setkey(ans, id)
#get max sum and Name by interval (if multiple rows have the same max sum, pick the first)
ans[ans, c("Name", "Sum") := {
  val = mydata[ Groups == i.Groups & start >= i.start & end <= i.end, ]
  list( val[ first( val[, .I[sum == max(sum)] ] ), Name ], max(val$sum) )
}, by = .EACHI ][, id := NULL][]
Output
#    Groups start end Name  Sum
# 1:     G1   229 450    Z  681
# 2:     G1   451 969    C 1420
# 3:     G2   118 842    F  960
# 4:     G3   123 300    I  177
# 5:     G3   460 790    H 1250
# 6:     G4   320 878    J 1221
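
For comparison, the same idea can be sketched in base R without any extra packages. This is not the answer's method, only a minimal sketch under a few assumptions: the helper name keep_max_per_interval is made up for illustration, the data are re-entered as a plain data.frame with character columns, and two rows count as overlapping when one starts at or before the largest end seen so far (so 450 and 451 stay separate, as in the question). Ties on sum are broken by the sorted order (start, then end), so G2 keeps G here rather than F.

```r
# Data from the question, re-entered with character columns instead of factors.
mydata <- data.frame(
  Groups = c(rep("G1", 7), "G2", "G2", "G3", "G3", "G4", "G4", "G4"),
  Name   = c("A", "B", "C", "D", "E", "X", "Z", "F", "G", "H", "I", "J", "K", "L"),
  start  = c(451L, 451L, 451L, 463L, 463L, 230L, 229L, 119L, 118L, 460L, 123L, 343L, 343L, 320L),
  end    = c(954L, 951L, 969L, 870L, 888L, 450L, 450L, 841L, 842L, 790L, 300L, 878L, 878L, 862L),
  sum    = c(1405L, 1402L, 1420L, 1333L, 1351L, 680L, 681L, 960L, 960L, 1250L, 177L, 1221L, 1221L, 1182L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for the rows of one group, merge overlapping
# [start, end] ranges and keep the row with the largest sum in each.
keep_max_per_interval <- function(d) {
  d <- d[order(d$start, d$end), ]
  # Largest end among all earlier rows (-Inf for the first row).
  prev_max_end <- c(-Inf, head(cummax(d$end), -1))
  # A new interval begins whenever a row starts beyond every earlier end.
  interval_id <- cumsum(d$start > prev_max_end)
  # which.max() returns the first row that reaches the maximum sum.
  do.call(rbind, lapply(split(d, interval_id), function(g) g[which.max(g$sum), ]))
}

result <- do.call(rbind, lapply(split(mydata, mydata$Groups), keep_max_per_interval))
rownames(result) <- NULL
print(result)
```

Unlike the data.table answer, this keeps the original start and end of the selected row (for G4 that is 343 to 878 for J, not the merged range 320 to 878), which is what the expected output in the question shows.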

Regarding "r - Keep only the largest intervals among rows of coordinates in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66120075/
