
r - In R, use dplyr's mutate() to create a new variable conditional on the contents of another variable

Reposted. Author: 行者123. Updated: 2023-12-04 11:24:59

I want to search the contents of a variable, placement, and create a new variable, term, based on the pattern found. A minimal example follows.

First I create a function that searches for the pattern:

calcterm <- function(x){    # calcterm takes a column argument to read
print(x)
if (x %in% '_fa_') {
return ('fall')
} else if (x %in% '_wi_') {
return('winter')
} else if (x %in% '_sp_') {
return('spring')
} else {return('summer')
}
}

I create a small data frame and then pass it to dplyr's tbl_df:

placement <- c('pn_ds_ms_fa_th_hrs','pn_ds_ms_wi_th_hrs' ,'pn_ds_ms_wi_th_hrs')
hours <- c(1230, NA, 34)

d <- data.frame(placement, hours)

library(dplyr)

d <- tbl_df(d)

The table d now looks like this:

>d
Source: local data frame [3 x 2]

placement hours
(fctr) (dbl)
1 pn_ds_ms_fa_th_hrs 1230
2 pn_ds_ms_wi_th_hrs NA
3 pn_ds_ms_wi_th_hrs 34

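Next, I use mutate to apply my function. The goal is to read the contents of placement and create a new variable that takes the value fall, winter, spring, or summer depending on the pattern found in the placement column.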

d %>% mutate(term=calcterm(placement))

The output I get is:

[1] pn_ds_ms_fa_th_hrs pn_ds_ms_wi_th_hrs pn_ds_ms_wi_th_hrs
Levels: pn_ds_ms_fa_th_hrs pn_ds_ms_wi_th_hrs
Source: local data frame [3 x 3]

placement hours term
(fctr) (dbl) (chr)
1 pn_ds_ms_fa_th_hrs 1230 summer
2 pn_ds_ms_wi_th_hrs NA summer
3 pn_ds_ms_wi_th_hrs 34 summer

Warning messages:
1: In if (x %in% "_fa_") { :
the condition has length > 1 and only the first element will be used
2: In if (x %in% "_wi_") { :
the condition has length > 1 and only the first element will be used
3: In if (x %in% "_sp_") { :
the condition has length > 1 and only the first element will be used

So clearly I wrote something wrong from the start... perhaps %in% should be replaced with a grep pattern? I'm not sure how to proceed.

Thanks.
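For readers following along, here is a minimal base R sketch of why the function above falls through to 'summer' for every row. It assumes only the three placement values from the example. `%in%` tests whether a whole element equals the given value, so it never looks inside the string, and `if` accepts a single condition rather than one per row. `grepl()` with `fixed = TRUE` and nested `ifelse()` are shown as one possible vectorized replacement, not as the only fix.

```r
x <- c("pn_ds_ms_fa_th_hrs", "pn_ds_ms_wi_th_hrs", "pn_ds_ms_wi_th_hrs")

# %in% asks whether each whole element equals "_fa_"; no substring search happens
in_result <- x %in% "_fa_"
print(in_result)     # FALSE FALSE FALSE

# grepl() with fixed = TRUE searches for a literal substring, one result per element
grepl_result <- grepl("_fa_", x, fixed = TRUE)
print(grepl_result)  # TRUE FALSE FALSE

# ifelse() is vectorized, so nested calls can stand in for the if/else chain
term <- ifelse(grepl("_fa_", x, fixed = TRUE), "fall",
        ifelse(grepl("_wi_", x, fixed = TRUE), "winter",
        ifelse(grepl("_sp_", x, fixed = TRUE), "spring", "summer")))
print(term)          # "fall" "winter" "winter"
```

Because every step returns one value per element, the same expression can be used directly inside mutate().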

Update

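Based on the replies below, I am updating this with my full pipeline to show how I implemented it. The data I am working with is "wide", so I start by flipping its axes and extracting the useful information from the colnames. This example works, but with my own data I get the following message when I reach the mutate() step: Error: invalid subscript type 'list'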

It is worth noting that after summarise() I get a warning:

Warning message:
attributes are not identical across measure variables; they will be dropped

Perhaps this is related to the failure in the next step, since the warning does not appear in my example?

set.seed(1) 

dfmaker <- function() {
setNames(
data.frame(
replicate(5, sample(c(NA, 300:500), 4, TRUE), FALSE)),
c('pn_ds_ms_fa_th_hrs','rn_ds_ms_wi_th_stu' ,'adn_ds_ms_wi_th_hrs','pn_ds_ms_wi_th_hrs' ,'rn_bsn_ds_ms_wi_th_hrs'))
}


d <- dfmaker()

library(dplyr)
library(tidyr)  # gather() used in the pipeline below comes from tidyr

d <- tbl_df(d)

grepl_vec_pattern = Vectorize(grepl, 'pattern')

calcterm = function(s) {
require(pryr)
s = as.character(s)
grepped_patterns = grepl_vec_pattern(s, pattern = c('_sp', '_su', '_fa', '_wi'))
stopifnot(all(rowSums(grepped_patterns) == 1)) # Ensure that every row has exactly one match
reduce_to_colname_with_true = apply(grepped_patterns, 1, compose(names, which))
lut_table = c('_sp' = 'spring', '_su' = 'summer', '_fa' = 'fall', '_wi' = 'winter')
lut_table[reduce_to_colname_with_true]
}

select(d, matches("^pn_|^adn_|^bsn_"), -starts_with("rn_bsn")) %>% # all the pn, adn, bsn programs, for all information
select(contains("_hrs") ) %>% # takes out just the hours
gather(placement, hours) %>% # flip it!
group_by(placement) %>% # gather all the schools into a single observation (replicated placement values at this point)
summarise(sumHours = sum(hours, na.rm=T)) %>%
mutate(term = calcterm(placement))
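One plausible source of the `invalid subscript type 'list'` error, offered as a guess because the real data is not shown: if some column name matches none (or more than one) of the term patterns, `apply()` returns results of unequal length as a list, and indexing a character vector with a list raises exactly this kind of error. The base R sketch below reproduces that situation. The second name in `s` is made up for illustration, and `vapply()` with `grepl()` stands in for the `Vectorize()` helper.

```r
patterns <- c("_sp", "_su", "_fa", "_wi")
lut <- c("_sp" = "spring", "_su" = "summer", "_fa" = "fall", "_wi" = "winter")

# Hypothetical names: the second one contains no term code at all
s <- c("pn_ds_ms_fa_th_hrs", "pn_ds_ms_th_hrs")

# One row per string, one named column per pattern
hits <- vapply(patterns, function(p) grepl(p, s, fixed = TRUE), logical(length(s)))

# The rows yield results of different lengths (1 and 0), so apply() returns a list
matched <- apply(hits, 1, function(row) names(which(row)))
print(is.list(matched))  # TRUE

# Indexing a character vector with a list fails; capture the message instead of stopping
err <- tryCatch(lut[matched], error = function(e) conditionMessage(e))
print(err)
```

If this is the cause, inspecting `rowSums(hits)` on the real column names would show which names match zero or several patterns.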

Best answer

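A simple and very efficient approach is to create a simple lookup/pattern vector and then combine the (very efficient) stringi::stri_detect_fixed with data.table. This solution should scale well even to very large datasets.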

library(stringi)
library(data.table)
Lookup <- c("fall", "winter", "spring")
Patterns <- c("fa", "wi", "sp")
setDT(d)[, term := Lookup[stri_detect_fixed(placement, Patterns)], by = placement]
d[is.na(term), term := "summer"]
d
# placement hours term
# 1: pn_ds_ms_fa_th_hrs 1230 fall
# 2: pn_ds_ms_wi_th_hrs NA winter
# 3: pn_ds_ms_wi_th_hrs 34 winter
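The lookup step can be seen in isolation with base R. In this sketch `grepl(..., fixed = TRUE)` is swapped in for `stri_detect_fixed` so that it runs without extra packages, and `lookup_term` is a hypothetical helper name, not part of the answer. It tests one string against every pattern and uses the resulting logical vector to index `Lookup`.

```r
Lookup   <- c("fall", "winter", "spring")
Patterns <- c("fa", "wi", "sp")

# Hypothetical helper: one logical per pattern, then pick the matching label
lookup_term <- function(s) {
  hits <- vapply(Patterns, function(p) grepl(p, s, fixed = TRUE), logical(1))
  Lookup[hits]
}

winter_term <- lookup_term("pn_ds_ms_wi_th_hrs")
print(winter_term)  # "winter"

# A name with none of the three codes gives an empty result
no_match <- lookup_term("pn_ds_ms_su_th_hrs")
print(length(no_match))  # 0
```

The empty result for unmatched names is why the answer fills the remaining NA values of term with "summer" in a second step.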

If we insist on using dplyr, we need to create a helper function to handle the case where no match is found (data.table handles this automatically):

f <- function(x, Lookup, Patterns) {
temp <- Lookup[stri_detect_fixed(x[1L], Patterns)]
if(!length(temp)) return("summer")
temp
}

d %>%
group_by(placement) %>%
mutate(term = f(placement, Lookup, Patterns))

# Source: local data frame [3 x 3]
# Groups: placement [2]
#
# placement hours term
# (fctr) (dbl) (chr)
# 1 pn_ds_ms_fa_th_hrs 1230 fall
# 2 pn_ds_ms_wi_th_hrs NA winter
# 3 pn_ds_ms_wi_th_hrs 34 winter
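As an alternative sketch that avoids grouping altogether, the same result can be obtained with a fully vectorized condition inside mutate(). This uses `dplyr::case_when()` with base `grepl()`, which is a different technique from the answer above and assumes a dplyr version that provides `case_when()`. The data frame is rebuilt here from the question's example so the block runs on its own.

```r
library(dplyr)

d <- data.frame(
  placement = c("pn_ds_ms_fa_th_hrs", "pn_ds_ms_wi_th_hrs", "pn_ds_ms_wi_th_hrs"),
  hours = c(1230, NA, 34),
  stringsAsFactors = FALSE
)

# Each condition is evaluated for every row; the first TRUE one decides the value
result <- d %>%
  mutate(term = case_when(
    grepl("_fa_", placement, fixed = TRUE) ~ "fall",
    grepl("_wi_", placement, fixed = TRUE) ~ "winter",
    grepl("_sp_", placement, fixed = TRUE) ~ "spring",
    TRUE ~ "summer"  # fallback when no pattern matches
  ))

print(result$term)  # "fall" "winter" "winter"
```

The trailing `TRUE ~ "summer"` branch plays the role of the helper's no-match case.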

Regarding "r - In R, use dplyr's mutate() to create a new variable conditional on the contents of another variable", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35547092/
