
R: Assigning data to their percentiles

Reposted. Author: 行者123. Updated: 2023-12-05 09:29:42

I am working with the R programming language. Suppose I have the following data frame:

var_1 = rnorm(100,10,10)
var_2 = rnorm(100,10,10)
var_3 = rnorm(100,10,10)

d = data.frame(var_1, var_2, var_3)

head(d)


var_1 var_2 var_3
1 14.251923 14.877801 22.636207
2 7.325137 8.513718 21.021522
3 3.400001 -3.400397 11.274797
4 16.400597 8.623980 9.366115
5 7.065583 13.155570 17.891432
6 21.297912 4.341385 -11.337330

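My question: for every element of every variable, I want to replace the element with the percentile it belongs to (e.g. 5th, 10th, 15th, and so on).

For example: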

a = quantile(d$var_1, c(0.05, 0.10, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1))
b = quantile(d$var_2, c(0.05, 0.10, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1))
c = quantile(d$var_3, c(0.05, 0.10, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1))

new = data.frame(a,b,c)

a b c
5% -0.8806901 -7.40560488 -4.7353920
10% 0.3595086 -3.77910527 -0.6874766
15% 1.1201300 -2.91946322 0.9584040
20% 3.0581928 0.05127097 2.1457693
25% 5.0901641 1.91719913 4.6997966
30% 7.0056228 2.56215345 6.2691894
35% 7.6089831 3.58688942 7.1900823
40% 8.9853805 5.00957881 7.8488446
45% 9.9264540 5.73653135 8.6135093
50% 10.2235212 7.43425669 9.6063344
55% 11.5707533 8.54160196 10.9239040
60% 13.2422940 9.65006232 11.7036647
65% 15.1076889 11.07081528 13.2440004
70% 16.5354881 12.38804922 15.2585324
75% 17.9336020 13.16121940 17.6656208
80% 19.5312682 15.31472178 18.4820207
85% 21.9264905 17.99689941 19.3347983
90% 24.4511364 20.47478783 22.0647173
95% 26.6820271 25.27082341 24.4473033
100% 41.4419744 39.75848302 34.5105183

Now, whenever a value falls between two consecutive percentile cut-offs, I want to make the following replacement:

  • If d$var_1 < -0.8806901, then d$var_1 = as.factor("5th percentile")
  • If d$var_1 > -0.8806901 and d$var_1 < 0.3595086, then d$var_1 = as.factor("10th percentile")

...

  • If d$var_1 > 15.1076889 and d$var_1 < 16.5354881, then d$var_1 = as.factor("70th percentile")

and so on.

  • If d$var_2 < -7.40560488, then d$var_2 = as.factor("5th percentile")

and so on.

  • If d$var_3 < -4.7353920, then d$var_3 = as.factor("5th percentile")

and so on.

Can someone show me how to do this?

Best answer

This may be what you want:

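library(dplyr)  # provides ntile()

# Split each column into 20 rank-based groups and label each value with its group's percentile
apply(d, 2, function(x) paste0(ntile(x, n = 20L) / 20 * 100, "th percentile"))

Output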

       var_1              var_2              var_3             
[1,] "60th percentile" "100th percentile" "25th percentile"
[2,] "80th percentile" "60th percentile" "100th percentile"
[3,] "45th percentile" "90th percentile" "75th percentile"
[4,] "70th percentile" "85th percentile" "35th percentile"
[5,] "30th percentile" "5th percentile" "55th percentile"
...
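The call above assigns groups by rank. If the cut-offs returned by quantile() should be applied literally, as the question describes, a minimal base R sketch could look like the following. The helper name to_percentile_label, its step argument, and the set.seed() call are assumptions made for illustration; they are not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible
d <- data.frame(
  var_1 = rnorm(100, 10, 10),
  var_2 = rnorm(100, 10, 10),
  var_3 = rnorm(100, 10, 10)
)

# Hypothetical helper: label each value with the upper percentile of its bin.
to_percentile_label <- function(x, step = 5) {
  pct <- seq(step, 100, by = step)  # 5, 10, ..., 100
  cutoffs <- quantile(x, probs = pct / 100, names = FALSE)
  # -Inf opens the first bin, so values below the 5% cut-off get "5th percentile";
  # cut() closes each bin on the right, so the maximum lands in the last bin.
  cut(x, breaks = c(-Inf, cutoffs), labels = paste0(pct, "th percentile"))
}

# Replace every column by its factor of percentile labels, keeping a data frame
d_labeled <- d
d_labeled[] <- lapply(d, to_percentile_label)
head(d_labeled)
```

Because cut() requires unique breaks, this sketch assumes the quantile cut-offs are all distinct, which holds for continuous data such as the rnorm() draws here but may fail for heavily tied data.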

Addendum (updating only selected columns in place with data.table):

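library(data.table)
library(dplyr)  # ntile() comes from dplyr, not data.table

# Relabel only the selected columns, modifying d by reference
cols = c("var_1", "var_3")
setDT(d)[, (cols) := lapply(.SD, function(x) paste0(ntile(x, n = 20L) / 20 * 100, "th percentile")), .SDcols = cols]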
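The snippets in this answer produce plain character strings, whereas the question asks for factors. As a rough sketch using only base R, the selected columns could be converted to ordered factors as below. The helper rank_percentile is hypothetical; it groups values by rank in a way similar in spirit to ntile(), but it is not claimed to reproduce ntile() exactly for every input length, and it does not handle missing values.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible
d <- data.frame(
  var_1 = rnorm(100, 10, 10),
  var_2 = rnorm(100, 10, 10),
  var_3 = rnorm(100, 10, 10)
)

# Hypothetical helper: split x into n groups by rank and return the
# group labels as an ordered factor ("5th percentile" < "10th percentile" < ...).
rank_percentile <- function(x, n = 20L) {
  bucket <- ceiling(rank(x, ties.method = "first") * n / length(x))
  lv <- paste0(seq_len(n) * 100 / n, "th percentile")
  factor(lv[bucket], levels = lv, ordered = TRUE)
}

# Relabel only the selected columns; var_2 stays numeric
cols <- c("var_1", "var_3")
d_new <- d
d_new[cols] <- lapply(d[cols], rank_percentile)
str(d_new)
```

Keeping the levels ordered means that sorting or comparing the labels follows percentile order rather than alphabetical order.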

Regarding "R: Assigning data to their percentiles", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70503039/
