r - data.table: referencing a set value within each group, applying a function across groups, and passing the resulting columns to a function

Reposted. Author: 行者123. Updated: 2023-12-01 15:39:37

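I have data in long format that will be grouped by geography. Within each group, I want to compute the difference between one variable of interest and every other variable of interest. I couldn't work out how to do this efficiently in a single data.table statement, so I used a workaround, which introduced some new errors along the way (I fixed those with further workarounds, but help with them would also be appreciated!).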

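I then want to pass the resulting columns to a ggplot function, but I couldn't get the recommended approach to work, so I fell back on a deprecated one.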

library(data.table)
library(ggplot2)

set.seed(1)
results <- data.table(geography = rep(1:4, each = 4),
                      variable = rep(c("alpha", "bravo", "charlie", "delta"), 4),
                      statistic = rnorm(16))

> results[c(1:4,13:16)]
   geography variable   statistic
1:         1    alpha -0.62645381
2:         1    bravo  0.18364332
3:         1  charlie -0.83562861
4:         1    delta  1.59528080
5:         4    alpha -0.62124058
6:         4    bravo -2.21469989
7:         4  charlie  1.12493092
8:         4    delta -0.04493361

base_variable <- "alpha"

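From here, ideally I'd like to write a simple piece of code that groups by geography and returns this table in the same format, but with each variable's statistic in each group replaced by (base_variable - variable).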

I don't know how to do that, so my workaround is below. Any suggestions for a better approach would be appreciated.

# Convert to a wide table so we can do the subtraction by rows
results_wide <- dcast(results, geography ~ variable, value.var = "statistic")

   geography      alpha      bravo    charlie       delta
1:         1 -0.6264538  0.1836433 -0.8356286  1.59528080
2:         2  0.3295078 -0.8204684  0.4874291  0.73832471
3:         3  0.5757814 -0.3053884  1.5117812  0.38984324
4:         4 -0.6212406 -2.2146999  1.1249309 -0.04493361

this_is_a_hack <- as.data.table(lapply(results_wide[,-1], function(x) results_wide[, ..base_variable] - x))

   alpha.alpha bravo.alpha charlie.alpha delta.alpha
1:           0  -0.8100971     0.2091748  -2.2217346
2:           0   1.1499762    -0.1579213  -0.4088169
3:           0   0.8811697    -0.9359998   0.1859381
4:           0   1.5934593    -1.7461715  -0.5763070

The names are now a mess, and geography is gone. Why do the names look like this? Geography also needs to be added back.

this_is_a_hack[, geography := results_wide[, geography] ]

normalise_these_names <- colnames(this_is_a_hack)
#Regex approach. Hacky and situational.
new_names <- sub("\\.(.*)", "", normalise_these_names[normalise_these_names != "geography"] )
normalise_these_names[normalise_these_names != "geography"] <- new_names
#Makes use of the fact that geographies will appear last in the data.table, not generalisable approach.
colnames(this_is_a_hack) <- normalise_these_names
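
On the naming question above: a likely explanation is that results_wide[, ..base_variable] returns a one-column data.table named "alpha" rather than a plain vector, so each list element produced by lapply() is itself a data.table, and as.data.table() combines the list name with the inner column name (hence "bravo.alpha"). The sketch below assumes that explanation and uses [[ to pull the base column out as a vector, which avoids the regex clean-up. The names value_cols, base_values and diffs_wide are illustrative only.

```r
library(data.table)

set.seed(1)
results <- data.table(geography = rep(1:4, each = 4),
                      variable = rep(c("alpha", "bravo", "charlie", "delta"), 4),
                      statistic = rnorm(16))
base_variable <- "alpha"
results_wide <- dcast(results, geography ~ variable, value.var = "statistic")

# Columns to difference: everything except the grouping column
value_cols <- setdiff(names(results_wide), "geography")

# `[[` returns a numeric vector, not a one-column data.table
base_values <- results_wide[[base_variable]]

# Each list element is now a plain vector, so the list names survive unchanged
diffs_wide <- as.data.table(lapply(results_wide[, ..value_cols],
                                   function(x) base_values - x))
diffs_wide[, geography := results_wide$geography]

names(diffs_wide)
```

With vectors in the list, the columns should keep the names alpha, bravo, charlie and delta, followed by geography.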

I no longer need the base variable, since all of its values are zero, so I tried to drop it, but I can't seem to do that the usual way:

this_is_a_hack[, ..base_variable := NULL] 
Warning message:
In `[.data.table`(this_is_a_hack, , `:=`(..base_variable, NULL)) :
Column '..base_variable' does not exist to remove

library(dplyr)
this_is_a_hack <- select(this_is_a_hack, -base_variable)
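
For reference, the `..` prefix is meant for selecting columns in j, not for the left-hand side of `:=`, which is why data.table looks for a column literally called "..base_variable". A minimal sketch of the data.table idiom, wrapping the variable in parentheses so it is evaluated to the column name it holds (the small table dt and its values are made up for illustration):

```r
library(data.table)

# Toy table standing in for this_is_a_hack after the names were cleaned up
dt <- data.table(geography = 1:4,
                 alpha = 0,
                 bravo = c(-0.81, 1.15, 0.88, 1.59))
base_variable <- "alpha"

# Parentheses tell data.table to use the value of base_variable ("alpha")
# as the column name, rather than a column literally named base_variable
dt[, (base_variable) := NULL]

names(dt)
```

This removes the column by reference, so no reassignment and no dplyr dependency is needed.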

final_result <- melt(this_is_a_hack, id.vars = "geography")

> final_result[c(1:4,9:12)]
   geography variable      value
1:         1    bravo -0.8100971
2:         2    bravo  1.1499762
3:         3    bravo  0.8811697
4:         4    bravo  1.5934593
5:         1    delta -2.2217346
6:         2    delta -0.4088169
7:         3    delta  0.1859381
8:         4    delta -0.5763070

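The data is now ready to visualise. I'm trying to pass these variables into a plotting function, but referencing data.table columns seems harder than with data frames. Apparently you are supposed to use quosures to pass data.table variables to a function, but that just errored, so I used the deprecated "aes_string" function instead. Help with this would also be appreciated.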

plott <- function(dataset, varx, vary, fillby) {
  # varx <- ensym(varx)
  # vary <- ensym(vary)
  # fillby <- ensym(fillby)
  ggplot(dataset,
         aes_string(x = varx, y = vary, color = fillby)) +
    geom_point()
}

plott(dataset = final_result,
varx = "geography",
vary = "value",
fillby = "variable")

# Error I get when I try the ensym(...) method in the function:
Don't know how to automatically pick scale for object of type name. Defaulting to continuous. (this message happens 3 times)
Error: Aesthetics must be valid data columns. Problematic aesthetic(s): x = varx, y = vary, colour = fillby.
Did you mistype the name of a data column or forget to add stat()?
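
One way to avoid aes_string() when the column names arrive as strings is the .data pronoun inside aes(), which recent ggplot2 versions support. The ensym() attempt probably failed because the captured symbols were still handed to aes_string() instead of being unquoted with !! inside aes(), though that is a guess from the error text. The sketch below is an assumption-laden alternative, not the asker's code: plot_points, colorby and the toy table are hypothetical names and values.

```r
library(data.table)
library(ggplot2)

# Column names are passed as strings and looked up with the .data pronoun
plot_points <- function(dataset, varx, vary, colorby) {
  ggplot(dataset,
         aes(x = .data[[varx]], y = .data[[vary]], color = .data[[colorby]])) +
    geom_point()
}

# Small made-up table in the same shape as final_result
toy <- data.table(geography = rep(1:2, times = 2),
                  variable = rep(c("bravo", "delta"), each = 2),
                  value = c(-0.81, 1.15, -2.22, -0.41))

p <- plot_points(toy, varx = "geography", vary = "value", colorby = "variable")
```

Because a data.table is also a data.frame, nothing data.table-specific is needed here. Axis and legend titles will show the .data[[...]] expressions unless labs() is added.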

Best answer

One option is to subset "statistic" with a logical condition on "variable" that picks out the "base_variable" element, grouped by "geography":

results[, .(variable, diff = statistic - statistic[variable == base_variable]), 
by = geography][variable != base_variable]
# geography variable diff
# 1: 1 bravo 0.8100971
# 2: 1 charlie -0.2091748
# 3: 1 delta 2.2217346
# 4: 2 bravo -1.1499762
# 5: 2 charlie 0.1579213
# 6: 2 delta 0.4088169
# 7: 3 bravo -0.8811697
# 8: 3 charlie 0.9359998
# 9: 3 delta -0.1859381
#10: 4 bravo -1.5934593
#11: 4 charlie 1.7461715
#12: 4 delta 0.5763070
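
Note that the statement above computes statistic minus the base value, so its signs are the opposite of the (base_variable - variable) convention used in the question's workaround output. A minimal sketch of the same grouped approach with the operands swapped to match the question (diffs is an illustrative name):

```r
library(data.table)

set.seed(1)
results <- data.table(geography = rep(1:4, each = 4),
                      variable = rep(c("alpha", "bravo", "charlie", "delta"), 4),
                      statistic = rnorm(16))
base_variable <- "alpha"

# Within each geography, subtract every statistic from the base statistic,
# then drop the base rows, which are all zero
diffs <- results[, .(variable,
                     diff = statistic[variable == base_variable] - statistic),
                 by = geography][variable != base_variable]

diffs[geography == 1]
```

This relies on each geography containing exactly one row for base_variable, as in the example data; with zero or several matches the subtraction would not recycle as intended.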

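Regarding r - data.table: referencing a set value within each group, applying a function across groups, and passing the resulting columns to a function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57794449/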
