
r - How to recode a continuous variable into ranges

Reposted. Author: 行者123. Updated: 2023-12-02 03:19:23

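I need to recode a continuous variable into categories. I normally use the cut() function, but cut() requires me to specify the breaks. I am looking for a way to use a different set of breaks depending on other categorical variables in the data frame.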
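For context, here is a minimal sketch of the usual case, where cut() is given one fixed set of breaks. The values in `cost` and `breaks` are illustrative only, and the variable names are assumptions made for this sketch.

```r
# Illustrative cost values (assumed for this sketch)
cost <- c(731, 147, 35)

# A single, fixed set of break points; 4 breaks define 3 intervals
breaks <- c(10, 50, 200, 1000)

# cut() returns a factor with one level per interval (right-closed by default)
cost_range <- cut(cost, breaks = breaks)

# Integer codes show which interval each value fell into
as.integer(cost_range)
```

Because the breaks are a single vector, every row is binned the same way, which is exactly the limitation described above.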

The variable in my example is Cost, and the breaks are in a second table, "cost.range", where I have defined a different set of breaks for each "Region" and each "Category".

Example:

Region     Product    Category  Cost
Country A  Product 1  CAT A      731
Country B  Product 1  CAT A      659
Country C  Product 1  CAT A      385
Country D  Product 1  CAT A      763
Country A  Product 2  CAT A      701
Country B  Product 2  CAT A      759
Country C  Product 2  CAT A      580
Country D  Product 2  CAT A      147
Country A  Product 3  CAT B      645
Country B  Product 3  CAT B      657
Country C  Product 3  CAT B      424


Region     Category  Cost.Range  Range
Country A  CAT A             10  R1
Country A  CAT A             50  R2
Country A  CAT A            200  R3
Country A  CAT A           1000  R4
Country A  CAT B             20  R1
Country A  CAT B            100  R2
Country A  CAT B            400  R3
Country A  CAT B           1500  R4

Code to generate the example:

Region <- c("Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D")
Product <- c("Product 1","Product 1","Product 1","Product 1","Product 2","Product 2","Product 2","Product 2","Product 3","Product 3","Product 3","Product 3","Product 4","Product 4","Product 4","Product 4")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B")
Cost <- c(731,659,385,763,701,759,580,147,645,657,424,34,850,463,160,550)

Table1 <- data.frame(Region, Product, Category, Cost)

Region <- c("Country A","Country A","Country A","Country A","Country A","Country A","Country A","Country A")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B")
Cost.range <- c(10,50,200,1000,20,100,400,1500)
Range <- c("R1","R2","R3","R4","R1","R2","R3","R4")

Table2 <- data.frame(Region, Category, Cost.range, Range)

Best answer

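This is not the most elegant solution (I would be interested to see a better approach), but it should produce the result you are looking for.

The select() and distinct() functions from the dplyr package find the possible combinations of Region and Category. Those combinations are then used to subset both tables, and cut() is applied to each subset.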

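library(dplyr)
library(data.table)

dt1 <- data.table(Table1)
dt2 <- data.table(Table2)

# Distinct Region/Category combinations that have breaks defined
t2d <- Table2 %>% select(Region, Category) %>% distinct()

for (i in seq_len(nrow(t2d))) {
  this_region   <- as.character(t2d$Region[i])
  this_category <- as.character(t2d$Category[i])

  # Breaks for this Region/Category combination
  dt2_range_subset <- dt2[Region == this_region & Category == this_category,
                          Cost.range]

  # Apply cut() only to the matching rows of dt1
  dt1[Region == this_region & Category == this_category,
      Cost_factor := cut(Cost, dt2_range_subset)]
}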
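
As a possible alternative, the same idea can be sketched in base R without dplyr or data.table. This is only a sketch under stated assumptions: the small `Table1` and `Table2` below are cut-down stand-ins for the example data, and `breaks_by_group`, `key`, and `Cost_range` are names introduced here for illustration. Results are stored as character rather than factor so that intervals from different break sets can share one column, and groups with no breaks defined are left as NA.

```r
# Cut-down stand-ins for the example tables (assumed for this sketch)
Table1 <- data.frame(
  Region   = c("Country A", "Country A", "Country A", "Country B"),
  Category = c("CAT A", "CAT A", "CAT B", "CAT A"),
  Cost     = c(731, 35, 645, 659)
)
Table2 <- data.frame(
  Region     = rep("Country A", 8),
  Category   = rep(c("CAT A", "CAT B"), each = 4),
  Cost.range = c(10, 50, 200, 1000, 20, 100, 400, 1500)
)

# One vector of breaks per Region/Category pair, named "Region|Category"
breaks_by_group <- split(
  Table2$Cost.range,
  paste(Table2$Region, Table2$Category, sep = "|")
)

# The same composite key for every row of Table1
key <- paste(Table1$Region, Table1$Category, sep = "|")

# Start with NA so rows without a matching break set stay missing
Table1$Cost_range <- NA_character_

# Only loop over groups that actually have breaks defined
for (k in intersect(unique(key), names(breaks_by_group))) {
  rows <- key == k
  Table1$Cost_range[rows] <- as.character(
    cut(Table1$Cost[rows], breaks = breaks_by_group[[k]])
  )
}

Table1
```

This sketch uses the default interval labels produced by cut(). Mapping them onto the R1 to R4 names would require deciding how four labels correspond to the three intervals that four break points define, which the question does not specify.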

Regarding "r - How to recode a continuous variable into ranges", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34490027/
