
r - How to recode <NULL> cells as nested NA (<lgl [1]>) in a tibble list-column?

Reposted. Author: 行者123. Updated: 2023-12-03 22:54:35

In a tibble with list-columns, how can I replace <NULL> entries with a nested NA (which shows up as <lgl [1]>)?

library(tibble)

tbl_with_null <-
tibble(letter = letters[1:10],
value_1 = list(1, 2, 4, data.frame(a = 1, 2, 3), NULL, 6, 7, c(8, 11, 25), NULL, 10),
value_2 = list("A", "B", "C", "D", NULL, NULL, NULL, list("H", "B", list(data.frame(id = 1:3))), "I", "J"))

> tbl_with_null

## # A tibble: 10 x 3
## letter value_1 value_2
## <chr> <list> <list>
## 1 a <dbl [1]> <chr [1]>
## 2 b <dbl [1]> <chr [1]>
## 3 c <dbl [1]> <chr [1]>
## 4 d <df[,3] [1 x 3]> <chr [1]>
## 5 e <NULL> <NULL>
## 6 f <dbl [1]> <NULL>
## 7 g <dbl [1]> <NULL>
## 8 h <dbl [3]> <list [3]>
## 9 i <NULL> <chr [1]>
## 10 j <dbl [1]> <chr [1]>
Is there a way to act on the whole tbl_with_null and replace every <NULL> with NA, to get:
## # A tibble: 10 x 3
## letter value_1 value_2
## <chr> <list> <list>
## 1 a <dbl [1]> <chr [1]>
## 2 b <dbl [1]> <chr [1]>
## 3 c <dbl [1]> <chr [1]>
## 4 d <df[,3] [1 x 3]> <chr [1]>
## 5 e <lgl [1]> <- NA <lgl [1]> # <- NA
## 6 f <dbl [1]> <lgl [1]> # <- NA
## 7 g <dbl [1]> <lgl [1]> # <- NA
## 8 h <dbl [3]> <list [3]>
## 9 i <lgl [1]> <- NA <chr [1]>
## 10 j <dbl [1]> <chr [1]>

Update

I made some progress based on this solution:
library(dplyr)  # provides %>%, mutate() and across()

tbl_with_null %>%
  mutate(across(c(value_1, value_2), ~replace(., !lengths(.), list(NA))))

## # A tibble: 10 x 3
## letter value_1 value_2
## <chr> <list> <list>
## 1 a <dbl [1]> <chr [1]>
## 2 b <dbl [1]> <chr [1]>
## 3 c <dbl [1]> <chr [1]>
## 4 d <df[,3] [1 x 3]> <chr [1]>
## 5 e <lgl [1]> <lgl [1]>
## 6 f <dbl [1]> <lgl [1]>
## 7 g <dbl [1]> <lgl [1]>
## 8 h <dbl [3]> <list [3]>
## 9 i <lgl [1]> <chr [1]>
## 10 j <dbl [1]> <chr [1]>
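However, this is not enough, because I am looking for a solution that blindly replaces NULL with NA across the entire data frame. If we use mutate(across(everything(), ~replace(., !lengths(.), list(NA)))), the letter column is also turned into a list-column, which is unintended.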
## # A tibble: 10 x 3
## letter value_1 value_2
## <list> <list> <list>
## 1 <chr [1]> <dbl [1]> <chr [1]>
## 2 <chr [1]> <dbl [1]> <chr [1]>
## 3 <chr [1]> <dbl [1]> <chr [1]>
## 4 <chr [1]> <df[,3] [1 x 3]> <chr [1]>
## 5 <chr [1]> <lgl [1]> <lgl [1]>
## 6 <chr [1]> <dbl [1]> <lgl [1]>
## 7 <chr [1]> <dbl [1]> <lgl [1]>
## 8 <chr [1]> <dbl [3]> <list [3]>
## 9 <chr [1]> <lgl [1]> <chr [1]>
## 10 <chr [1]> <dbl [1]> <chr [1]>
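
One way to avoid touching letter is to restrict across() to list-columns with where(is.list), and to test each cell with is.null() instead of relying on lengths(). The following is a minimal sketch rather than a definitive solution. It assumes dplyr (1.0.0 or later) and purrr are installed, the helper name null_to_na_cols is made up for illustration, and it only handles top-level NULL cells, not NULLs buried deeper inside nested lists.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical helper: replace top-level NULL cells in every list-column with NA.
# Atomic columns are never selected, so they keep their type.
null_to_na_cols <- function(tbl) {
  mutate(tbl, across(where(is.list), ~ replace(.x, map_lgl(.x, is.null), list(NA))))
}

demo <- tibble(
  letter = c("a", "b", "c"),
  value  = list(1, NULL, c(8, 11, 25))
)

res <- null_to_na_cols(demo)
is.character(res$letter)  # checks that the atomic column was left alone
is.na(res$value[[2]])     # checks that the former NULL cell now holds NA
```

Because the replacement index is all FALSE when a column contains no NULL, such a column passes through with its cells unchanged.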

Update 2

I thought I had nailed it with
mutate(across(everything(), ~simplify(replace(., !lengths(.), list(NA)))))
but unfortunately this fails in some cases, for example with the following data:
tbl_with_no_null <-
tbl_with_null %>%
slice(8) %>%
select(letter, value_1)

## # A tibble: 1 x 2
## letter value_1
## <chr> <list>
## 1 h <dbl [3]>
While I expected
tbl_with_no_null %>%
mutate(across(everything(), ~simplify(replace(., !lengths(.), list(NA)))))
to return the same tbl_with_no_null (since there is no <NULL> to replace):
## # A tibble: 1 x 2
## letter value_1
## <chr> <list>
## 1 h <dbl [3]>
I got an error instead:
Error: Problem with `mutate()` input `..1`.
x Input `..1` can't be recycled to size 1.
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
i Input `..1` must be size 1, not 3.
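
The message suggests that the one-cell list-column was flattened into a length-3 vector, which no longer matches the single row. The base R sketch below illustrates that size mismatch with unlist(). That simplify() behaves like unlist() in this case is an assumption inferred from the error, not something stated in the question.

```r
# Row 8 of value_1: a list-column with one cell holding three numbers.
value_1 <- list(c(8, 11, 25))

n_rows      <- length(value_1)          # size mutate() expects for the column
n_flattened <- length(unlist(value_1))  # size after flattening the cell

c(rows = n_rows, flattened = n_flattened)
```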
Bottom line
I am looking for a way to replace <NULL> with NA in list-columns and, of course, if there is no <NULL> to replace, to return the input as is.
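
The requirement can be sketched in base R as a small recursive function that walks the structure and swaps every NULL for NA, leaving everything else untouched. The name null_to_na is hypothetical. The example uses plain lists and a base data frame; applying it to a tibble relies on tibble supporting the x[] <- lapply(x, ...) idiom, which is assumed here.

```r
# Hypothetical helper: recursively replace NULL with NA in lists and data frames.
null_to_na <- function(x) {
  if (is.null(x)) {
    return(NA)
  }
  if (is.list(x)) {
    # x[] <- ... fills the existing container, keeping its names and class
    x[] <- lapply(x, null_to_na)
  }
  x
}

nested  <- list(a = 1, b = NULL, c = list("H", NULL, data.frame(id = 1:3)))
cleaned <- null_to_na(nested)
str(cleaned)

# Input without any NULL is compared against the result
no_null <- list(letter = "h", value_1 = list(c(8, 11, 25)))
identical(null_to_na(no_null), no_null)
```

Unlike a column-wise replace(), this version also reaches NULLs nested several levels deep, at the cost of visiting every element.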

Best answer

base::rapply does not recurse through NULL, but you can use rrapply, which allows this and is very efficient:

library(rrapply)
rrapply::rrapply(tbl_with_null, function(x) NA, how = "replace", condition = is.null)

# A tibble: 10 x 3
letter value_1 value_2
<chr> <list> <list>
1 a <dbl [1]> <chr [1]>
2 b <dbl [1]> <chr [1]>
3 c <dbl [1]> <chr [1]>
4 d <df[,3] [1 x 3]> <chr [1]>
5 e <lgl [1]> <lgl [1]>
6 f <dbl [1]> <lgl [1]>
7 g <dbl [1]> <lgl [1]>
8 h <dbl [3]> <list [3]>
9 i <lgl [1]> <chr [1]>
10 j <dbl [1]> <chr [1]>
Or, as suggested by @JorisC in the comments, use the classes argument, which seems to be about 25% faster on large lists:
rrapply(tbl_with_null, classes = "NULL", how = "replace", f = function(x) NA)
Just for fun:
eval(parse(text=gsub("NULL","NA",capture.output(dput(tbl_with_null)))))

# A tibble: 10 x 3
letter value_1 value_2
<chr> <list> <list>
1 a <dbl [1]> <chr [1]>
2 b <dbl [1]> <chr [1]>
3 c <dbl [1]> <chr [1]>
4 d <df[,3] [1 x 3]> <chr [1]>
5 e <lgl [1]> <lgl [1]>
6 f <dbl [1]> <lgl [1]>
7 g <dbl [1]> <lgl [1]>
8 h <dbl [3]> <list [3]>
9 i <lgl [1]> <chr [1]>
10 j <dbl [1]> <chr [1]>

fortunes::fortune(106)

# If the answer is parse() you should usually rethink the question.
# -- Thomas Lumley
# R-help (February 2005)
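
The quote is apt for a concrete reason: the gsub() call rewrites the text NULL wherever it appears in the deparsed code, including inside string values. A small sketch with a plain list (the data is made up for illustration):

```r
x <- list(place = "NULL Island", value = NULL)

# Same trick as above: deparse, substitute text, parse, and evaluate.
y <- eval(parse(text = gsub("NULL", "NA", capture.output(dput(x)))))

y$place  # the string itself has been altered by the substitution
y$value  # the NULL entry has been replaced by NA
```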
The speed comparison was surprising; I had expected parse to be the slowest solution:
microbenchmark::microbenchmark(
rrapply = rrapply::rrapply(tbl_with_null, function(x) NA, how = "replace", condition = is.null),
parse = eval(parse(text=gsub("NULL","NA",capture.output(dput(tbl_with_null))))),
dplyr = mutate(tbl_with_null,across(where(is.list), .fns = map_if, .p = is.null, .f = function(x) NA)))
Unit: microseconds
expr min lq mean median uq max neval cld
rrapply 25.401 31.801 60.92102 51.2510 58.3010 1053.502 100 a
parse 225.001 269.701 327.31600 329.1005 362.4505 687.800 100 b
dplyr 2942.501 3207.301 3604.63105 3500.0005 3766.1510 6541.402 100 c

Source: the original question "r - How to recode <NULL> cells as nested NA (<lgl [1]>) in a tibble list-column?" on Stack Overflow: https://stackoverflow.com/questions/66583913/
