r - Summarise a matrix: get the mean value for each 100,000-unit class

Reposted. Author: 行者123. Updated: 2023-12-04 12:21:17

I have the following data structure.

pos <- c(4532568,4541529,4586529,4591235,4712360,4732504,4740231,10532655,10542365,10564587,45312567,45326354,45369874,124832658,124845829,124869874)
cm <- c(2.21,2.25,2.26,2.29,3.31,3.35,3.36,4.32,4.35,4.39,5.23,5.27,5.29,7.36,7.45,7.49)
data <- cbind(pos,cm)

            pos   cm
 [1,]   4532568 2.21
 [2,]   4541529 2.25
 [3,]   4586529 2.26
 [4,]   4591235 2.29
 [5,]   4712360 3.31
 [6,]   4732504 3.35
 [7,]   4740231 3.36
 [8,]  10532655 4.32
 [9,]  10542365 4.35
[10,]  10564587 4.39
[11,]  45312567 5.23
[12,]  45326354 5.27
[13,]  45369874 5.29
[14,] 124832658 7.36
[15,] 124845829 7.45
[16,] 124869874 7.49

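My aim is to group the rows into classes of 100,000 units based on the "pos" column and get the mean of the "cm" column for each class.
For this example, the result would look like this: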
pos <- c(4500000,4700000,10500000,45300000,124800000)
cm <- c(2.2525,3.34,4.35333,5.26333,7.43333)
newdata <- cbind(pos,cm)

           pos      cm
[1,]   4500000 2.25250
[2,]   4700000 3.34000
[3,]  10500000 4.35333
[4,]  45300000 5.26333
[5,] 124800000 7.43333

I don't know how to automate this process for a huge data frame.

In response to akrun's answer:
If I run the following script on my real dataset:
Ch1 <- ch1 %>%
  as.data.frame %>%
  group_by(Pos = plyr::round_any(Pos, 1e5, f = floor))

then I get the following result (first 10 rows only):
 structure(list(Chr = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "1", class = "factor"), Pos = c(0, 0, 0,
2e+05, 5e+05, 5e+05, 5e+05, 5e+05, 5e+05, 7e+05), CM = c(0, 0.080572,
0.092229, 0.439456, 1.478148, 1.478214, 1.480558, 1.488889, 1.489481,
1.931794)), .Names = c("Chr", "Pos", "CM"), row.names = c(NA,
-10L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = "Pos", drop = TRUE, indices = list(
0:2, 3L, 4:8, 9L), group_sizes = c(3L, 1L, 5L, 1L), biggest_group_size = 5L, labels = structure(list(
Pos = c(0, 2e+05, 5e+05, 7e+05)), row.names = c(NA, -4L), class = "data.frame", vars = "Pos", drop = TRUE, .Names = "Pos"))

However, if I use the whole script to get the mean of Ch1$CM:
Ch1 <- ch1 %>%
  as.data.frame %>%
  group_by(Pos = plyr::round_any(Pos, 1e5, f = floor)) %>%
  summarise(cm = mean(cm))

then I get the following data.frame:
 structure(list(Pos = c(0, 2e+05, 5e+05, 7e+05, 8e+05, 9e+05, 
1e+06, 1100000, 1200000, 1300000), cm = c(4.528498, 4.528498,
4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498,
4.528498)), .Names = c("Pos", "cm"), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))

As you can see, the means are wrong, because they are all equal. I don't know why this happens.
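
A likely cause, assuming the real data has a column named CM (upper case, as in the dput output above) and that a separate vector named cm still exists in the workspace (for example, left over from the toy example): summarise(cm = mean(cm)) finds no cm column in the data, so dplyr falls back to the cm object in the calling environment and returns its overall mean for every group. The sketch below reproduces this with made-up data. The names ch1_demo and its values are hypothetical, and floor(Pos / 1e5) * 1e5 is used in place of plyr::round_any(Pos, 1e5, f = floor) to keep the example short.

```r
library(dplyr)

# Hypothetical stand-in for the real data: note the upper-case column name CM.
ch1_demo <- data.frame(
  Pos = c(10, 20, 150000, 160000),
  CM  = c(1, 3, 10, 20)
)

# A leftover workspace vector with the lower-case name cm (assumption).
cm <- c(100, 200, 300)

# Problem: there is no column called cm, so mean(cm) uses the workspace
# vector and every group receives the same value.
wrong <- ch1_demo %>%
  group_by(Pos = floor(Pos / 1e5) * 1e5) %>%
  summarise(cm = mean(cm))

# Fix: refer to the column by its actual name, CM.
right <- ch1_demo %>%
  group_by(Pos = floor(Pos / 1e5) * 1e5) %>%
  summarise(cm = mean(CM))

print(wrong)
print(right)
```

If this is what happened, changing mean(cm) to mean(CM) in the real script should give a distinct mean per group.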

Best answer

We can use round_any

library(dplyr)
data %>%
  as.data.frame %>%
  # floor each pos to the start of its 100,000-unit bin and group by it
  group_by(grp = plyr::round_any(pos, 1e5, f = floor)) %>%
  # mean of cm within each bin
  summarise(cm = mean(cm))
# A tibble: 5 x 2
#        grp       cm
#      <dbl>    <dbl>
#1   4500000 2.252500
#2   4700000 3.340000
#3  10500000 4.353333
#4  45300000 5.263333
#5 124800000 7.433333
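
For reference, plyr::round_any(x, 1e5, f = floor) computes floor(x / 1e5) * 1e5, so the same binning can be sketched in base R without extra packages. This is an alternative sketch under that assumption, not part of the original answer; the names df, bin and result are illustrative.

```r
pos <- c(4532568, 4541529, 4586529, 4591235, 4712360, 4732504, 4740231,
         10532655, 10542365, 10564587, 45312567, 45326354, 45369874,
         124832658, 124845829, 124869874)
cm <- c(2.21, 2.25, 2.26, 2.29, 3.31, 3.35, 3.36, 4.32, 4.35, 4.39,
        5.23, 5.27, 5.29, 7.36, 7.45, 7.49)
df <- data.frame(pos, cm)

# Floor each position to the start of its 100,000-unit bin.
df$bin <- floor(df$pos / 1e5) * 1e5

# Mean of cm within each bin; one row per bin, ordered by bin.
result <- aggregate(cm ~ bin, data = df, FUN = mean)
print(result)
```

For very large data frames, the grouping step is the same idea; only the aggregation function (aggregate, dplyr, or another tool) changes.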

Regarding "r - Summarise a matrix: get the mean value for each 100,000-unit class", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47978322/
