
Preserve variable level names when melting

Reposted. Author: 行者123. Updated: 2023-12-04 10:27:46

This question already has an answer here:

Melt using patterns when variable names contain string information - avoid coercion to numeric (1 answer)

Closed 3 years ago.




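Is there a way to preserve the original level names of the variables being melted? For instance, in the example below, is there a way to get "alpha", "beta", and "gamma" instead of "1", "2", "3"?

I could of course rename them by hand, but the dataset I am working with has a large number of levels, so renaming them would be time-consuming and error-prone.

Thanks.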

library(data.table)
#> Warning: package 'data.table' was built under R version 3.4.2
set.seed(2334)

# define the dataframe
df <-
  as.data.frame(
    cbind(
      a_alpha = rnorm(10),
      a_beta = rnorm(10),
      a_gamma = rnorm(10),
      b_alpha = rnorm(10),
      b_beta = rnorm(10),
      b_gamma = rnorm(10),
      id = c(1:10)
    )
  )

# check the structure of the wide format data
str(df)
#> 'data.frame': 10 obs. of 7 variables:
#> $ a_alpha: num -0.118 1.237 0.809 -0.766 -0.592 ...
#> $ a_beta : num 0.0019 1.0639 2.336 0.9056 0.6449 ...
#> $ a_gamma: num 0.5485 0.8345 -0.5977 0.0827 0.2754 ...
#> $ b_alpha: num 0.209 -0.305 0.434 -0.362 0.412 ...
#> $ b_beta : num -1.6404 2.8382 0.0661 0.7249 -0.4421 ...
#> $ b_gamma: num -0.144 0.964 -0.763 -1.356 0.995 ...
#> $ id : num 1 2 3 4 5 6 7 8 9 10

# convert to long format
df_long <- data.table::melt(
  data.table::setDT(df),
  measure = patterns("^a_", "^b_"),
  value.name = c("a", "b"),
  variable.name = "item"
)

# check the structure of the long format data
str(df_long)
#> Classes 'data.table' and 'data.frame': 30 obs. of 4 variables:
#> $ id : num 1 2 3 4 5 6 7 8 9 10 ...
#> $ item: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
#> $ a : num -0.118 1.237 0.809 -0.766 -0.592 ...
#> $ b : num 0.209 -0.305 0.434 -0.362 0.412 ...
#> - attr(*, ".internal.selfref")=<externalptr>

# structure of item
levels(df_long$item)
#> [1] "1" "2" "3"

# Question: instead of "1" "2" "3", how to get the "item" factor levels to be: "alpha" "beta" "gamma"

Created on 2018-01-12 by the reprex package (v0.1.1.9000).

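Best answer

The way I have dealt with this in the past is to `factor` the `melt`ed data. However, you may need to run a few checks to make sure the data and the levels are in the correct order.

Here is an example: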

library(data.table)

set.seed(2334)
df <- data.table(a_alpha = rnorm(10), a_beta = rnorm(10), a_gamma = rnorm(10),
                 b_alpha = rnorm(10), b_beta = rnorm(10), b_gamma = rnorm(10),
                 id = c(1:10))
# shuffled copy: the b_ columns no longer appear in the same order as the a_ columns
df_mess <- copy(df)
setcolorder(df_mess, c(1, 7, 6, 4, 2, 5, 3))
names(df_mess)
# [1] "a_alpha" "id" "b_gamma" "b_alpha" "a_beta" "b_beta" "a_gamma"

stubs <- c("^a_", "^b_")
## assumes all stubs have same number of cols. Easy to modify
labs <- grep(stubs[1], names(df_mess), value = TRUE)
labs <- gsub(paste(stubs, collapse = "|"), "", labs[order(labs)])

out1 <- melt(df, measure.vars = patterns(stubs), value.name = c("a", "b"),
             variable.name = "item")[
  , item := factor(item, labels = labs)][]

out2a <- melt(df_mess, measure.vars = patterns(stubs), value.name = c("a", "b"),
              variable.name = "item")[
  , item := factor(item, labels = labs)][]

out2b <- melt(setcolorder(df_mess, names(df_mess)[order(names(df_mess))]),
              measure.vars = patterns(stubs), value.name = c("a", "b"),
              variable.name = "item")[
  , item := factor(item, labels = labs)][]

library(compare)
compare(out1, out2a)
# FALSE [TRUE, TRUE, TRUE, FALSE]
compare(out1, out2b)
# TRUE
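
The comparison above comes down to whether each stub's columns appear with their suffixes in the same order. A minimal base R sketch of such a pre-melt check follows; the helper name `suffix_order_matches` and the two column-name vectors are hypothetical and only serve as illustration.

```r
# Hypothetical helper: TRUE when every stub's columns list their suffixes
# in the same order, which is what melt(..., patterns()) relies on to pair
# the value columns row by row.
suffix_order_matches <- function(col_names, stubs) {
  suffixes <- lapply(stubs, function(s) {
    sub(s, "", grep(s, col_names, value = TRUE))
  })
  all(vapply(suffixes, identical, logical(1), suffixes[[1]]))
}

stubs <- c("^a_", "^b_")

# Two illustrative column orders (assumed, not taken from a real dataset)
tidy_cols  <- c("a_alpha", "a_beta", "a_gamma", "b_alpha", "b_beta", "b_gamma", "id")
messy_cols <- c("a_alpha", "id", "b_gamma", "b_alpha", "a_beta", "b_beta", "a_gamma")

suffix_order_matches(tidy_cols, stubs)   # TRUE
suffix_order_matches(messy_cols, stubs)  # FALSE
```

When the check returns FALSE, sorting the columns by name before melting, as done for out2b, is one way to restore the alignment.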

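I have not run enough test cases to say confidently whether using `order` on the `names` and `levs` (the label vector, called `labs` in the code above) is sufficient in every case, but so far I have not found an exception when the data are unbalanced.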
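
If relying on column order feels fragile, one alternative is to take the labels directly from the column names. This is a sketch, not the answer's method: it melts every measure column into a single long column, splits each name into stub and item with `tstrsplit`, and spreads the stubs back out with `dcast`. It assumes every measure column is named `<stub>_<item>` with exactly one underscore; the intermediate names `long`, `column`, and `stub` are chosen here for illustration.

```r
library(data.table)

set.seed(2334)
df <- data.table(
  a_alpha = rnorm(10), a_beta = rnorm(10), a_gamma = rnorm(10),
  b_alpha = rnorm(10), b_beta = rnorm(10), b_gamma = rnorm(10),
  id = 1:10
)

# Step 1: melt every measure column into one long column of names and values.
long <- melt(df, id.vars = "id", variable.name = "column",
             value.name = "value", variable.factor = FALSE)

# Step 2: split "a_alpha" into stub "a" and item "alpha"
# (assumes exactly one "_" per measure column name).
long[, c("stub", "item") := tstrsplit(column, "_", fixed = TRUE)]

# Step 3: spread the stubs back out so each stub becomes a value column.
out <- dcast(long, id + item ~ stub, value.var = "value")

unique(out$item)
# [1] "alpha" "beta"  "gamma"
```

Because each value is matched to its label by name rather than by position, the result does not depend on the order of the columns in the wide table. The trade-off is an extra reshape step, and `item` comes back as a character column rather than a factor.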

Regarding preserving variable level names when melting, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48236460/
