r - Why is dplyr removing values that don't meet the condition?

Reposted. Author: 行者123. Updated: 2023-12-02 10:38:37

I am using dplyr to replace value with NA when a condition is met, but it is putting NA in places where it shouldn't.

dput output:

df <- structure(list(id = c("USC00231275", "USC00231275", "USC00231275", 
"USC00231275", "USC00231275", "USC00231275", "USC00231275", "USC00231275",
"USC00231275", "USC00231275"), element = c("TMAX", "TMIN", "TMAX",
"TMIN", "TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN"), year = c(1937,
1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937), month = c(5,
5, 5, 5, 5, 5, 5, 5, 5, 5), day = c(1, 1, 2, 2, 3, 3, 4, 4, 5,
5), date = structure(c(-11933, -11933, -11932, -11932, -11931,
-11931, -11930, -11930, -11929, -11929), class = "Date"), value = c(0,
53.96, 68, 44.96, 62.06, 53.96, 73.04, 53.96, 69.08, 50)), .Names = c("id",
"element", "year", "month", "day", "date", "value"), row.names = c(NA,
10L), class = "data.frame")

data.frame (note: only rows 1 and 2 meet the condition)

            id element year month day       date value
1  USC00231275    TMAX 1937     5   1 1937-05-01  0.00
2  USC00231275    TMIN 1937     5   1 1937-05-01 53.96
3  USC00231275    TMAX 1937     5   2 1937-05-02 68.00
4  USC00231275    TMIN 1937     5   2 1937-05-02 44.96
5  USC00231275    TMAX 1937     5   3 1937-05-03 62.06
6  USC00231275    TMIN 1937     5   3 1937-05-03 53.96
7  USC00231275    TMAX 1937     5   4 1937-05-04 73.04
8  USC00231275    TMIN 1937     5   4 1937-05-04 53.96
9  USC00231275    TMAX 1937     5   5 1937-05-05 69.08
10 USC00231275    TMIN 1937     5   5 1937-05-05 50.00

dplyr

library(dplyr)

df %>%
  group_by(date) %>%
  mutate(
    value = if (value[element == 'TMIN'] >= value[element == 'TMAX'])
      as.numeric(NA) else value
  )

            id element  year month   day       date value
         (chr)   (chr) (dbl) (dbl) (dbl)     (date) (dbl)
1  USC00231275    TMAX  1937     5     1 1937-05-01    NA
2  USC00231275    TMIN  1937     5     1 1937-05-01    NA
3  USC00231275    TMAX  1937     5     2 1937-05-02 68.00
4  USC00231275    TMIN  1937     5     2 1937-05-02 44.96
5  USC00231275    TMAX  1937     5     3 1937-05-03    NA
6  USC00231275    TMIN  1937     5     3 1937-05-03    NA
7  USC00231275    TMAX  1937     5     4 1937-05-04 73.04
8  USC00231275    TMIN  1937     5     4 1937-05-04 53.96
9  USC00231275    TMAX  1937     5     5 1937-05-05 69.08
10 USC00231275    TMIN  1937     5     5 1937-05-05 50.00

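Notice that the only rows that should change are 1 and 2, yet dplyr also changes rows 5 and 6 even though the condition is not met there.

Best answer

The code below should do what you are trying to do: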

df %>%
  group_by(date) %>%
  mutate(new_value = ifelse(
    (value[element == 'TMIN'] >= value[element == 'TMAX']) & element == 'TMIN',
    NA, value
  )) %>%
  ungroup()

As to whether this is a bug, I don't think it is. Looking at the data for just one date where TMIN >= TMAX, you get the following result:

df %>%
  filter(date == '1937-05-01') %>%
  mutate(res = (value[element == 'TMIN'] >= value[element == 'TMAX'])) %>%
  mutate(new_value = ifelse((res & element == 'TMIN'), NA, value))

           id element year month day       date value  res new_value
1 USC00231275    TMAX 1937     5   1 1937-05-01  0.00 TRUE         0
2 USC00231275    TMIN 1937     5   1 1937-05-01 53.96 TRUE        NA

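The expression value[element == 'TMIN'] >= value[element == 'TMAX'] yields a single value for the group, so it is TRUE on every row here, as the res column shows. The code below breaks this down a little, which I hope makes it clearer.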

### Just looking at one date
> df2 <- df %>% filter(date == '1937-05-01')
> df2
           id element year month day       date value
1 USC00231275    TMAX 1937     5   1 1937-05-01  0.00
2 USC00231275    TMIN 1937     5   1 1937-05-01 53.96

### This comparison will be recycled for every element in the group,
### so it will always be TRUE or always FALSE.
> c(df2$value[df2$element == 'TMIN'], df2$value[df2$element == 'TMAX'])
[1] 53.96 0.00

Because the comparison is made once for the whole group, every row in that group sees the same result: all TRUE or all FALSE.
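
To make the recycling concrete, here is a minimal base R sketch that uses only the two readings for 1937-05-01 from the question. The names res, flag, and new_value are illustrative and not part of the original answer.

```r
# The two readings for 1937-05-01, copied from the question's data
value   <- c(0, 53.96)
element <- c("TMAX", "TMIN")

# One comparison for the whole group: the result has length 1
res <- value[element == "TMIN"] >= value[element == "TMAX"]
print(res)
print(length(res))

# Combining it with a per-row test recycles res across both rows,
# giving one flag per row instead of one flag per group
flag <- res & element == "TMIN"
print(flag)

# ifelse() is element-wise, so only the flagged row becomes NA
new_value <- ifelse(flag, NA, value)
print(new_value)
```

The last vector corresponds to the new_value column in the single-date table above: the TMAX reading stays at 0 and the TMIN reading becomes NA.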

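The code that gives the correct result shows how to do the comparison: it combines the group-level comparison with the row-level test element == 'TMIN'.

One possible final solution could be: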

df %>%
  group_by(date) %>%
  mutate(value = ifelse(
    (value[element == 'TMIN'] >= value[element == 'TMAX']) & element == 'TMIN',
    NA, value
  )) %>%
  ungroup()
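
As a rough check, the sketch below rebuilds a compact copy of the question's data (only the columns the comparison needs) and records which rows end up as NA. It assumes each date has exactly one TMIN and one TMAX row, so the comparison has length 1 per group. The names tmin_only, both_rows, and bad are hypothetical. Variant B rests on an assumption about the asker's intent (rows 1 and 2 both changing) and is not something the answer proposes.

```r
library(dplyr)

# Compact reconstruction of the question's data frame
df <- data.frame(
  id      = "USC00231275",
  element = rep(c("TMAX", "TMIN"), times = 5),
  date    = rep(as.Date("1937-05-01") + 0:4, each = 2),
  value   = c(0, 53.96, 68, 44.96, 62.06, 53.96, 73.04, 53.96, 69.08, 50),
  stringsAsFactors = FALSE
)

# Variant A: the answer's approach, which blanks only the TMIN reading
tmin_only <- df %>%
  group_by(date) %>%
  mutate(value = ifelse(
    (value[element == "TMIN"] >= value[element == "TMAX"]) & element == "TMIN",
    NA, value
  )) %>%
  ungroup()

# Variant B (assumed intent): store the per-date flag as a column first,
# so it is recycled to every row of the group, then blank both readings
both_rows <- df %>%
  group_by(date) %>%
  mutate(
    bad   = value[element == "TMIN"] >= value[element == "TMAX"],
    value = ifelse(bad, NA, value)
  ) %>%
  ungroup() %>%
  select(-bad)

# Row numbers that were set to NA by each variant
print(which(is.na(tmin_only$value)))
print(which(is.na(both_rows$value)))
```

With a recent dplyr release I would expect Variant A to flag only row 2 and Variant B to flag rows 1 and 2. Storing the flag as a column before calling ifelse() matters because ifelse() returns a result the same length as its test, so a length-1 test would return a single value for the whole group.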

Regarding "r - Why is dplyr removing values that don't meet the condition?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34485798/
