
r - A faster way to build a data frame of ordered column names from a numeric data frame in R

Reposted. Author: 行者123. Last updated: 2023-12-01 13:40:40

I have a data frame with three columns:

set.seed(123)
df <- data.frame(x = abs(rnorm(10)), y = abs(rnorm(10)), z = abs(rnorm(10)))
df
            x         y         z
1  0.56047565 1.2240818 1.0678237
2  0.23017749 0.3598138 0.2179749
3  1.55870831 0.4007715 1.0260044
4  0.07050839 0.1106827 0.7288912
5  0.12928774 0.5558411 0.6250393
6  1.71506499 1.7869131 1.6866933
7  0.46091621 0.4978505 0.8377870
8  1.26506123 1.9666172 0.1533731
9  0.68685285 0.7013559 1.1381369
10 0.44566197 0.4727914 1.2538149

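I want to build a data frame with the same number of rows, where each row contains the column names of df ordered by the values in the corresponding row of df, largest first. I have a for-loop approach that works, but it is too slow for large data frames, so I am looking for a faster, vectorized approach. Here is the for-loop approach: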

df_names <- df
df_names[,] <- NA
df_names
    x  y  z
1  NA NA NA
2  NA NA NA
3  NA NA NA
4  NA NA NA
5  NA NA NA
6  NA NA NA
7  NA NA NA
8  NA NA NA
9  NA NA NA
10 NA NA NA
for (r in seq_len(nrow(df))) {
  # Sort the one-row data frame by value, largest first, then keep only the names
  sorted_row <- sort(df[r, ], decreasing = TRUE)
  df_names[r, ] <- colnames(sorted_row)
}
df_names
   x y z
1  y z x
2  y x z
3  x z y
4  z y x
5  z y x
6  y x z
7  z y x
8  y x z
9  z y x
10 z y x

How can I do this faster using the apply family or vectorization?

Best answer

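Revision: I have merged all of the attempts, including @rawr's corrections. @rawr's approach is the best so far, at roughly a 30x speedup over the for loop. @989 has since added a faster approach; see @989's accepted answer.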
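The benchmark that follows compares two ways of turning one row into its ordered column names: sorting the named values and reading off the names, or indexing the names with order(). As a minimal sketch, here are both on a single row, using the values of the first row from the question. The variable names are chosen here for illustration only.

```r
# One row of the question's data as a named numeric vector
row <- c(x = 0.56047565, y = 1.2240818, z = 1.0678237)

# Idiom 1: sort the named values, then read off the names
by_sort <- names(sort(row, decreasing = TRUE))

# Idiom 2: compute the ordering permutation and index the names with it
by_order <- names(row)[order(row, decreasing = TRUE)]

# Both idioms give the same ordering for this row
stopifnot(identical(by_sort, by_order))
print(by_order)
```

The second idiom only permutes the names and never builds a sorted copy of the values, which may be part of why it comes out ahead in the timings.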

library(microbenchmark)
set.seed(123)
df <- data.frame(x = abs(rnorm(1000)), y = abs(rnorm(1000)), z = abs(rnorm(1000)))

# Baseline: fill the result one row at a time
get_name_df_with_for <- function(df) {
  df_names <- df
  df_names[, ] <- NA
  for (r in seq_len(nrow(df))) {
    df_names[r, ] <- colnames(sort(df[r, ], decreasing = TRUE))
  }
  return(df_names)
}

# apply over rows: sort each row, keep the names
get_name_df_with_apply <- function(df) {
  df_names <- data.frame(t(apply(df, 1, function(row) names(sort(row, decreasing = TRUE)))))
  return(df_names)
}

# apply over rows: index the names with order() instead of sorting the values
get_name_df_with_apply_names <- function(df) {
  df_names <- data.frame(t(apply(df, 1, function(row) names(row)[order(row, decreasing = TRUE)])))
  return(df_names)
}

# Transpose first, apply over columns, then transpose back
get_name_df_double_t <- function(df) {
  df_names <- data.frame(t(apply(t(df), 2, function(col) names(sort(col, decreasing = TRUE)))))
  return(df_names)
}

microbenchmark(
  "for" = get_name_df_with_for(df),
  "double_transpose" = get_name_df_double_t(df),
  "apply" = get_name_df_with_apply(df),
  "apply_with_names" = get_name_df_with_apply_names(df),
  times = 10
)
Unit: milliseconds
             expr       min        lq      mean    median        uq       max neval
              for 417.08341 424.37019 446.00655 451.67451 459.64900 480.33351    10
 double_transpose  28.46577  29.96637  32.44685  33.02763  33.51309  36.77468    10
            apply  27.54800  28.27331  38.02239  30.36667  37.29727  71.46596    10
 apply_with_names  12.35264  12.59502  14.16868  13.92946  15.80656  17.22005    10
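
All of the variants above still call an R function once per row. As a rough sketch of what a fully vectorized version could look like, the code below orders every cell in a single order() call, keyed first by row index and then by decreasing value. This is not @989's answer, which is not reproduced here, and it has not been timed against the functions above. The helper name get_name_df_vectorized and the small test data frame are assumptions made for illustration.

```r
# Hypothetical helper (not from the original post): order all cells at once.
get_name_df_vectorized <- function(df) {
  m <- as.matrix(df)
  # Sort cell positions by row index first, then by decreasing value within each row
  o <- order(row(m), -m)
  # col(m)[o] is the column index of each cell in that order; map it to column names
  ordered_names <- colnames(m)[col(m)[o]]
  # The first ncol(m) names belong to row 1, the next ncol(m) to row 2, and so on
  as.data.frame(matrix(ordered_names, nrow = nrow(m), byrow = TRUE),
                stringsAsFactors = FALSE)
}

# First three rows of the question's data, typed in explicitly
df_small <- data.frame(
  x = c(0.56047565, 0.23017749, 1.55870831),
  y = c(1.2240818, 0.3598138, 0.4007715),
  z = c(1.0678237, 0.2179749, 1.0260044)
)

res <- get_name_df_vectorized(df_small)

# Cross-check against the order()-based apply idiom from the benchmark
ref <- t(apply(df_small, 1, function(row) names(row)[order(row, decreasing = TRUE)]))
stopifnot(identical(unname(as.matrix(res)), unname(ref)))
print(res)
```

The result columns get default names (V1, V2, V3) rather than x, y, z, since each column holds a rank position and not an original variable. Negating the matrix to get decreasing order assumes the data is numeric, as it is in the question.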

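This post is based on a similar question found on Stack Overflow, "r - A faster way to build a data frame of ordered column names from a numeric data frame in R": https://stackoverflow.com/questions/40565152/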
