gpt4 book ai didi

r - 使用 na.rm = TRUE 汇总数据

转载 作者:行者123 更新时间:2023-12-04 09:18:56 24 4
gpt4 key购买 nike

考虑以下使用 dplyr 汇总数据框的示例。的 summarise管道识别min imum DATE与一些 CHAR 相关联:

library('tidyverse')
library('lubridate')

temp <- data.frame(
CHAR = c(
'A',
'B',
'C'
),
DATE = c(
'20090101',
'20100101',
NA
) %>% ymd(), # Turn character strings to dates
stringsAsFactors = FALSE
) %>% group_by(
CHAR
) %>% summarise(
DATE = min(DATE, na.rm = TRUE) # Extract minimum date
) %>% ungroup()

识别是否 min imum 是 NA是否使用 is.na 进行测试:
temp %>% mutate(
DATE_lgl = DATE %>% is.na() # Identify dates that are missing/NA
)

输出
# A tibble: 3 x 3
CHAR DATE DATE_lgl
<chr> <date> <lgl>
1 A 2009-01-01 FALSE
2 B 2010-01-01 FALSE
3 C NA FALSE

错误地有 DATE_lgl显示 FALSE哪里 DATENA .这是为什么?

删除 na.rm = TRUE修复了该问题,但不适用于以下配置,其中 na.rm = TRUE需要消除丢失的条目:
temp <- data.frame(
CHAR = c(
'A',
'B',
'C',
'C'
),
DATE = c(
'20090101',
'20100101',
NA,
'20110101'
) %>% ymd(), # Turn character strings to dates
stringsAsFactors = FALSE
) %>% group_by(
CHAR
) %>% summarise(
DATE = min(DATE, na.rm = TRUE) # Extract minimum date
) %>% ungroup()
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bindrcpp_0.2.2 lubridate_1.7.4 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.5 purrr_0.2.5
[7] readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 cellranger_1.1.0 pillar_1.2.3 compiler_3.5.0 plyr_1.8.4 bindr_0.1.1
[7] tools_3.5.0 jsonlite_1.5 nlme_3.1-137 gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1
[13] rlang_0.2.1 psych_1.8.4 cli_1.0.0 rstudioapi_0.7 yaml_2.1.19 parallel_3.5.0
[19] haven_1.1.1 xml2_1.2.0 httr_1.3.1 hms_0.4.2 grid_3.5.0 tidyselect_0.2.4
[25] glue_1.2.0 R6_2.2.2 readxl_1.1.0 foreign_0.8-70 modelr_0.1.2 reshape2_1.4.3
[31] magrittr_1.5 scales_0.5.0 rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[37] utf8_1.1.4 stringi_1.1.7 lazyeval_0.2.1 munsell_0.4.3 broom_0.4.4 crayon_1.3.4

最佳答案

问题是你正在评估

min(NA, na.rm=TRUE)
# Inf

对于第 3 行,这导致它是
dput(temp$DATE[3])
# structure(Inf, class = "Date")

添加 is.finite给您的 mutate
temp %>% 
mutate(DATE_lgl = is.finite(DATE) | is.na(DATE) # Identify dates that are missing/NA)

# A tibble: 3 x 3
# CHAR DATE DATE_lgl
# <chr> <date> <lgl>
# 1 A 2009-01-01 TRUE
# 2 B 2010-01-01 TRUE
# 3 C NA FALSE

打印 NA可能是 Date 类的打印限制
as.Date(Inf, origin="1970-01-01")
# NA
dput(as.Date(Inf, origin="1970-01-01"))
# structure(Inf, class = "Date")

关于r - 使用 na.rm = TRUE 汇总数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50766089/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com