
r - Combining rows of a list in R

Reposted. Author: 行者123. Updated: 2023-12-04 12:27:47

I have a list in the following format:

[[1]]
[1] "10" "719" "99"

[[2]]
[1] "10" "624" "85" "888" "624"

[[3]]
[1] "1" "894" "110" "344" "634"

I want to merge the vectors by the unique values of their first element, i.e.:
[[1]]
[1] "10" "719" "99" "624" "85" "888" "624"

[[2]]
[1] "1" "894" "110" "344" "634"

Is there a way to do this with minimal memory usage?

Best answer

I would approach it like this:

x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
first_elems <- sapply(x, "[", 1) # get 1st elem of each vector
(first_elems <- as.factor(first_elems)) # factorize (i.a. find all unique elems)
## [1] 10 10 1
## Levels: 1 10
(group <- split(x, first_elems)) # split by 1st elem (divide into groups)
## $`1`
## $`1`[[1]]
## [1] "1" "894" "110" "344" "634"
##
##
## $`10`
## $`10`[[1]]
## [1] "10" "719" "99"
##
## $`10`[[2]]
## [1] "10" "624" "85" "888" "624"
##
(result <- lapply(group, unlist)) # combine vectors in each group (list of vectors -> an atomic vector)
## $`1`
## [1] "1" "894" "110" "344" "634"
##
## $`10`
## [1] "10" "719" "99" "10" "624" "85" "888" "624"

Edit: to avoid repeating the key, use:
(result <- lapply(group, function(x) {
  c(x[[1]][1], unlist(lapply(x, "[", -1)))
}))
## $`1`
## [1] "1" "894" "110" "344" "634"
##
## $`10`
## [1] "10" "719" "99" "624" "85" "888" "624"
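Note that split() orders the groups by the factor's levels, and as.factor() sorts those levels (so "1" comes before "10"), whereas the question lists the groups in order of first appearance. A minimal sketch, assuming first-appearance order is what you want, sets the levels explicitly with unique():

```r
x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
first_elems <- sapply(x, "[", 1)
# levels in order of first appearance instead of sorted order
f <- factor(first_elems, levels = unique(first_elems))
group <- split(x, f)
result <- lapply(group, function(g) {
  # keep the key once, then append the tails of every vector in the group
  c(g[[1]][1], unlist(lapply(g, "[", -1)))
})
names(result)  # group names in first-appearance order: "10", "1"
```

The grouping and combining steps are unchanged; only the level order of the factor differs.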

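This doesn't need much extra memory. Apart from the result list, we only need to store the result of as.factor (number of levels + number of elements of x). split needs very little extra memory: the vectors in x are not deep-copied.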
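To make that accounting concrete: in R a factor is stored as an integer vector of codes (one per element of x) carrying a levels attribute (one string per distinct key). A quick check on the small example from the question:

```r
x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
f <- as.factor(sapply(x, "[", 1))
# strip the factor class to expose the underlying integer codes
codes <- unclass(f)
typeof(codes)        # "integer"
as.integer(codes)    # 2 2 1  (levels are sorted, so "1" -> 1 and "10" -> 2)
levels(f)            # "1" "10"
```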

As for performance, with a fairly large list:
set.seed(1L)
n <- 100000
x <- vector('list', n)
for (i in 1:n)
  x[[i]] <- as.character(sample(1:1000, ceiling(runif(1, 1, 1000)), replace=TRUE))
object.size(x) # 2GB
## 2175165880 bytes

I got the following run time on my old Linux laptop:
system.time(local({
  first_elems <- as.factor(sapply(x, "[", 1))
  group <- split(x, first_elems)
  result <- lapply(group, function(x) {
    c(x[[1]][1], unlist(lapply(x, "[", -1)))
  })
}))

## user system elapsed
## 4.119 0.001 4.149

Seems reasonable, I think.
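The same steps can be wrapped into a reusable function. The sketch below uses a hypothetical name, combine_by_first, and swaps sapply() for vapply(), which guarantees a character vector of keys and fails loudly if a vector is empty or not character. It is a variant of the answer's approach, not a benchmarked replacement; no timings are claimed for it.

```r
# Hypothetical helper: merge the vectors in `x` that share the same first
# element, keeping that first element only once per group.
combine_by_first <- function(x) {
  # "[[" with index 1 extracts each key; vapply() enforces one string per vector
  keys <- vapply(x, "[[", character(1), 1L)
  # as.factor() sorts the levels, matching the behaviour of the answer above
  group <- split(x, as.factor(keys))
  lapply(group, function(g) c(g[[1]][1], unlist(lapply(g, "[", -1))))
}

x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
res <- combine_by_first(x)
res[["10"]]  # "10" "719" "99" "624" "85" "888" "624"
```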

Regarding "r - Combining rows of a list in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23597552/
