作者热门文章
- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
base-R 函数 all.vars()
返回表达式中的所有名称。例如:
> all.vars( ~ e == M * c^2 )
[1] "e" "M" "c"
有一个 R 运算符不合适。在许多不使用非标准评估和 rlang 等函数的人编写的表达式中,名称将是变量的名称。但是如果这些表达式包含对
$
的调用,右侧的名称将不是变量,而是索引或列名称。 (我知道巧妙地使用环境和数据屏蔽会模糊变量和列名之间的区别,但这不是这里的重点。)
all.vars()
没有忽略
$
右侧的选项.有没有类似的功能,或者我必须编写自己的表达式步行器?基本上,我想要一个函数,如果通过表达式
a $ b + c $ d
将返回 "a"和 "c"。
cond <- ¢ A logical vector of 500,000 elements ¢
v1 ¢ (and v2 etc. ) ¢ <- ¢ Numerical or string vectors of the same length ¢
result_size <- length( cond )
result <- rep( NA, result_size )
result[ cond ] <- f( v1[ cond ], v3[ cond ]
, v4[ cond ], v7[ cond ]
, v9[ cond ], v10[ cond ]
)
result[ ! cond ] <- g( v2[ ! cond ], v3[ ! cond ]
, v4[ ! cond ], v5[ ! cond ]
, v6[ ! cond ], v8[ ! cond ]
, v10[ ! cond ]
)
我认为,这就是 R 专家所说的拆分工作流。按条件拆分数据,分别处理每组,合并结果。
cond <- ¢ A logical vector of 500,000 elements ¢
v1 ¢ (and v2 etc. ) ¢ <- ¢ Numerical or string vectors of the same length ¢
result <- splivif( cond
, f( v1, v3, v4, v7, v9, v10 )
, g( v2, v3, v4, v5, v6, v8, v10 )
)
splivif()
这里应该解释为另一种条件函数,就像
if()
一样。 ,
if_else()
,
ifelse()
和
fifelse()
,可能还有六种我还不知道的其他人。它将以与这些函数相同的方式隐藏一些巧妙的细节:即评估其条件,根据该条件将“then”和“else”分支中提到的任何变量拆分为子向量,调用这些子向量上的每个分支,然后结合结果。
cond
'th 个元素,并将它们在新环境中绑定(bind)到原始变量的名称。
E
其中名称“v1”绑定(bind)到
V1[ cond ]
,其中
V1
是原始值
v1
.也在
E
是名称“v2”到“v10”的类似绑定(bind)。
splivif()
然后在
E
中计算它的“then”和“else”表达式,并将结果组合成一个结果向量。
all.vars()
正在扫描变量的“then”和“else”表达式。我使用嵌套命名列表来存储控制我的计算的值。因此,表达式对这些列表的元素有大量引用,例如
Taxogellation $ IgnoreRepeats
和
Taxogellation $ DoInnerSplines
.我的问题的核心是
all.vars()
然后会错误地返回“IgnoreRepeats”和“DoInnerSplines”作为变量的名称,而实际上它们是索引。
pension()
,计算每个家庭的养老金。这与任何现有政府都会提供的不同,但说明了典型养老金计算的特征。例如,结果通常取决于年龄和健康状况,并且可能取决于收入。这些相关性为任何此类计算设置了最小复杂度,因此也设置了最小时间。
pension()
给所有 50,000 个家庭。它们是: 通过 Tidyverse 分组;通过 data.table 分组;和矢量化。后者使用的事实是运算符和函数,例如
+
,
|
,
>
和
pmax()
可以应用于多个元素的向量,然后对相应元素进行元素处理。
library( tidyverse )
library( data.table )
library( assertthat )
library( microbenchmark )
library( purrr )
#1) Create sample data
#=====================
# The code in this section makes a table
# of no_of_groups families. Each family has
# one or two adults. Adults are randomly assigned
# an income, between 0 and 30,000 pounds;
# an age, between 18 and 99, and a health
# indicator. Each family also gets an integer
# ID. Each adult also gets a number indicating
# whether they are adult 1 or adult 2.
#
# The sections following this will define
# a function for calculating families'
# pensions. My code will apply it in three
# ways, and time each one. These are: by
# grouping using the Tidyverse; by grouping
# using data.table; and by vectorisation.
# This shows that the Tidyverse and data.table
# are both woefully inefficient compared with
# vectorisation. For 5,000 families, the
# Tidyverse takes 2.5 seconds and data.table
# 2 seconds. Vectorisation takes a mere 50
# milliseconds, 40 times as fast.
#
no_of_groups <- 5000
group_sizes <- sample( c(1,2), no_of_groups, replace=TRUE )
ids <- 1:no_of_groups
data <- tibble( fam_id=map2( ids, group_sizes, rep ) %>% unlist() )
data <-
data %>%
group_by( fam_id ) %>%
mutate( ad_no = seq_along( fam_id )
, two_people = length( ad_no ) == 2
) %>%
ungroup()
data $ income <- runif( nrow( data ), 0, 1 ) * 30000
data $ age <- sample( 18:99, nrow( data ), replace=TRUE )
data $ bad_health <- sample( c(T,F), nrow( data ), replace=TRUE, prob=c(0.1,0.9) )
#2) Function to calculate pension on single family
#=================================================
# two_people is true if the family has two
# people, otherwise false.
# ad1_inc and ad2_inc are the incomes, in
# pounds per year. ad2_inc is NA if there is
# only one person.
# Similarly, ad1_age and ad2_age are ages.
# And ad1_bad_health and ad2_bad_health are
# Booleans indicating whether the person
# has bad health.
# The result is the pension the Government
# gives the family, in pounds per week.
# This is NOT meant to be the same as in any
# existing country's social-security system,
# but exemplifies the kinds of calculation
# such a function needs to do. On our data,
# these will be called several hundred
# thousand times.
#
pension <- function( two_people
, ad1_inc, ad2_inc
, ad1_age, ad2_age
, ad1_bad_health, ad2_bad_health
)
{
max_age <-
ifelse( two_people
, pmax( ad1_age, ad2_age )
, ad1_age
)
income <-
ifelse( two_people
, ad1_inc + ad2_inc
, ad1_inc
)
bad_health <-
ifelse( two_people
, ad1_bad_health | ad2_bad_health
, ad1_bad_health
)
pension_level <-
case_when( income > 50000 | max_age < 65 ~ "None"
, max_age > 80 | bad_health ~ "High"
, max_age >= 65 ~ "Normal"
)
pension <-
case_when( pension_level == "High" ~ 200.00
, pension_level == "Normal" ~ 150.00
, pension_level == "None" ~ 0
)
pension
}
#3) Check it works
#=================
pension( F, 40000, NA, 75, NA, F, NA )
# 150.
pension( T, 20000, 20000, 75, 75, F, F )
# 150.
pension( F, 60000, NA, 75, NA, F, NA )
# 0, because of high income.
pension( T, 30000, 30000, 75, 75, F, F )
# 0, because of high income.
pension( F, 60000, NA, 50, NA, F, NA )
# 0, because of low age.
pension( T, 20000, 20000, 75, 75, F, T )
# 200, because of bad health.
#4) Function to calculate all pensions using Tidyverse group-by
#==============================================================
pension_over_all_TV <- function( data )
{
results <-
data %>%
group_by( fam_id ) %>%
group_map( ~ {
assert_that( nrow( .x ) %in% c( 1, 2 ) )
two_people <- .x $ two_people[[ 1 ]]
pension( two_people
, .x $ income[[ 1 ]]
, ifelse( two_people, .x $ income[[ 2 ]], NA )
, .x $ age [[ 1 ]]
, ifelse( two_people, .x $ age[[ 2 ]], NA )
, .x $ bad_health[[ 1 ]]
, ifelse( two_people, .x $ bad_health[[ 2 ]], NA )
)
}
)
#
# A vector of pension values, one per family.
results
}
#5) Try it and time it
#=====================
pensions_TV <- pension_over_all_TV( data )
#
# Pensions as calculated by Tidyverse grouping.
res <- microbenchmark( pension_over_all_TV( data ), times=3 )
print( res )
#
# Time it. Mean is 2.5 seconds:
# Unit: seconds
# expr min lq mean median uq max neval
# pension_over_all_TV(data) 2.533073 2.565714 2.584183 2.598356 2.609738 2.621121 3
#6) Function to calculate all pensions using data.table group-by
#===============================================================
pension_over_all_DT <- function( data )
{
# The function that data.table must apply
# to each group.
#
f <- function( group )
{
assert_that( nrow( group ) %in% c( 1, 2 ) )
two_people <- group $ two_people[[ 1 ]]
pension( two_people
, group $ income[[ 1 ]]
, ifelse( two_people, group $ income[[ 2 ]], NA )
, group $ age [[ 1 ]]
, ifelse( two_people, group $ age[[ 2 ]], NA )
, group $ bad_health[[ 1 ]]
, ifelse( two_people, group $ bad_health[[ 2 ]], NA )
)
}
data <- as.data.table( data )
results <-
data[
, f( .SD )
, by=c( "fam_id" )
]
#
# A table with a V1 column containing one
# pension value per family.
results
}
#7) Try it and time it
#=====================
pensions_DT <- pension_over_all_DT( data )
#
# Pensions as calculated by data.table grouping.
assert_that( are_equal( unlist( pensions_TV ), pensions_DT $ V1 ) )
#
# Making allowance for the slightly different
# formats of the results returned by group_map()
# and data.table's grouped operations, check
# that the numbers are the same.
res <- microbenchmark( pension_over_all_DT( data ), times=3 )
print( res )
#
# Time it. Mean is 2 seconds:
# Unit: seconds
# expr min lq mean median uq max neval
# pension_over_all_DT(data) 1.824391 1.950273 2.155805 2.076154 2.321512 2.56687 3
#8) Function to calculate all pensions using vectorisation
#=========================================================
# This applies pension() to data by using vectorisation.
# It widens data into a table wherein each column is
# a vector corresponding to one of pension()'s arguments.
# It then calls exec() to apply pension() to these
# vectors. I had deliberately written pension() so that
# it would work on vector arguments with more than one
# element.
#
pension_over_all_Vect <- function( data )
{
data_widened <-
pivot_wider( data
, names_from = "ad_no"
, names_prefix = "ad"
, values_from = all_of( c("income","age","bad_health") )
) %>%
rename( ad1_inc="income_ad1", ad2_inc="income_ad2",
, ad1_age="age_ad1", ad2_age="age_ad2"
, ad1_bad_health="bad_health_ad1", ad2_bad_health="bad_health_ad2"
) %>%
select( -fam_id )
#
# A table with one row for each family, and one
# column for each of pension()'s arguments.
results <- exec( pension, !!! as.list( data_widened ) )
#
# A vector of results: one pension value for
# each family.
results
}
#9) Try it and time it
#=====================
pensions_Vect <- pension_over_all_Vect( data )
#
# Returns a list of plausible-looking results.
assert_that( are_equal( unlist( pensions_TV ), pensions_Vect ) )
assert_that( are_equal( unlist( pensions_DT $ V1 ), pensions_Vect ) )
#
# Check that this is equal to the previously-
# calculated results.
res <- microbenchmark( pension_over_all_Vect( data ), times=3 )
print( res )
#
# Time it. The mean is 50 milliseconds.
# Unit: milliseconds
# expr min lq mean median uq max neval
# pension_over_all_Vect(data) 35.7834 45.23245 50.8431 54.6815 58.37295 62.0644 3
最佳答案
你应该解释为什么你需要这个,以及其他形式的非标准评估会发生什么。您的实际问题可能有更好的解决方案。
我会迅速更换 $
与 [[
:
replace_dollar <- function(expr) {
if (!is.language(expr) || length(expr) == 1L) return(expr)
if (expr[[1]] == quote(`$`)) {
expr[[1]] <- quote(`[[`)
expr[[3]] <- as.character(expr[[3]])
} else {
for (i in seq_along(expr)[-1])
expr[[i]] <- replace_dollar(expr[[i]])
}
expr
}
expr <- quote(a $ b + c $ d)
replace_dollar(expr)
# a[["b"]] + c[["d"]]
all.vars(replace_dollar(expr))
#[1] "a" "c"
请注意,根据文档:
x$name
is equivalent tox[["name", exact = FALSE]]
.
all.vars
.
关于r - 有没有办法阻止 all.vars() 从 $ 的右侧返回名称?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63580260/
我是一名优秀的程序员,十分优秀!