
r - R: `split` preserving the natural order of factors

Reposted. Author: 行者123. Updated: 2023-12-03 22:08:21

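split always orders the resulting groups lexicographically. In some cases one would rather preserve the natural order, that is, the order in which the values first appear. One can always hand-roll a function for this, but is there a base R solution that does it?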

Reproducible example:

Input:

  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
1        2013-04-01          INDUSINDBK             SIEMENS  4 2013
2        2013-04-01                NMDC               WIPRO  4 2013
3        2012-09-28               LUPIN                SAIL  9 2012
4        2012-09-28          ULTRACEMCO                STER  9 2012
5        2012-04-27          ASIANPAINT                RCOM  4 2012
6        2012-04-27          BANKBARODA              RPOWER  4 2012

Output of split:
R> split(nifty.dat, nifty.dat$yearmon)
$`4 2012`
Date.of.Inclusion Securities.Included Securities.Excluded yearmon
5 2012-04-27 ASIANPAINT RCOM 4 2012
6 2012-04-27 BANKBARODA RPOWER 4 2012

$`4 2013`
Date.of.Inclusion Securities.Included Securities.Excluded yearmon
1 2013-04-01 INDUSINDBK SIEMENS 4 2013
2 2013-04-01 NMDC WIPRO 4 2013

$`9 2012`
Date.of.Inclusion Securities.Included Securities.Excluded yearmon
3 2012-09-28 LUPIN SAIL 9 2012
4 2012-09-28 ULTRACEMCO STER 9 2012

Note that yearmon already appears in the specific order I want. This can be taken as given, because if it did not hold, the problem would be ill-specified.

Desired output:
$`4 2013`
Date.of.Inclusion Securities.Included Securities.Excluded yearmon
1 2013-04-01 INDUSINDBK SIEMENS 4 2013
2 2013-04-01 NMDC WIPRO 4 2013

$`9 2012`
Date.of.Inclusion Securities.Included Securities.Excluded yearmon
3 2012-09-28 LUPIN SAIL 9 2012
4 2012-09-28 ULTRACEMCO STER 9 2012

$`4 2012`
Date.of.Inclusion Securities.Included Securities.Excluded yearmon
5 2012-04-27 ASIANPAINT RCOM 4 2012
6 2012-04-27 BANKBARODA RPOWER 4 2012

Thanks.

PS: I know there are better ways to create yearmon so that this order is preserved, but I am looking for a general solution.

Best answer

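split converts its f (second) argument to a factor if it is not one already, and the groups come out in the order of the factor levels. For a character vector those levels are sorted, which is why the result looks lexicographic. So, if you want the order to be retained, convert the column to a factor yourself with the desired levels. That is: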

df$yearmon <- factor(df$yearmon, levels=unique(df$yearmon))
# now split
split(df, df$yearmon)
# $`4_2013`
# Date.of.Inclusion Securities.Included Securities.Excluded yearmon
# 1 2013-04-01 INDUSINDBK SIEMENS 4_2013
# 2 2013-04-01 NMDC WIPRO 4_2013

# $`9_2012`
# Date.of.Inclusion Securities.Included Securities.Excluded yearmon
# 3 2012-09-28 LUPIN SAIL 9_2012
# 4 2012-09-28 ULTRACEMCO STER 9_2012

# $`4_2012`
# Date.of.Inclusion Securities.Included Securities.Excluded yearmon
# 5 2012-04-27 ASIANPAINT RCOM 4_2012
# 6 2012-04-27 BANKBARODA RPOWER 4_2012
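
To check the idea end to end, here is a minimal base R sketch that rebuilds the question's six rows and wraps the refactoring step in a small helper. The helper name `split_keep_order` is hypothetical (it is not part of base R), and the data frame is retyped from the question's printout with the dates kept as plain strings.

```r
# Data retyped from the question's printout (assumption: dates kept as character).
nifty.dat <- data.frame(
  Date.of.Inclusion   = c("2013-04-01", "2013-04-01", "2012-09-28",
                          "2012-09-28", "2012-04-27", "2012-04-27"),
  Securities.Included = c("INDUSINDBK", "NMDC", "LUPIN",
                          "ULTRACEMCO", "ASIANPAINT", "BANKBARODA"),
  Securities.Excluded = c("SIEMENS", "WIPRO", "SAIL", "STER", "RCOM", "RPOWER"),
  yearmon             = c("4 2013", "4 2013", "9 2012",
                          "9 2012", "4 2012", "4 2012"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: split `x` by `f`, keeping the groups in the order
# in which each value of `f` first appears.
split_keep_order <- function(x, f) {
  f <- factor(f, levels = unique(f))
  split(x, f)
}

default_names <- names(split(nifty.dat, nifty.dat$yearmon))
ordered_names <- names(split_keep_order(nifty.dat, nifty.dat$yearmon))

print(default_names)  # groups in sorted level order
print(ordered_names)  # groups in order of first appearance
```

Wrapping the step in a function leaves the original column untouched, which may be preferable when the data frame is reused elsewhere.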

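But don't use split. Use data.table instead:

In general, split tends to become very slow as the number of levels increases. So I would suggest using data.table to subset the data into a list. I would expect that to be much faster: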
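require(data.table)
dt <- data.table(df)
# number each yearmon group in order of first appearance
dt[, grp := .GRP, by = yearmon]
# sort the rows by that group number
setkey(dt, grp)
# one list element per group, in group-number order
o2 <- dt[, list(list(.SD)), by = grp]$V1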

Benchmarking on a large data set:
set.seed(45)
dates <- seq(as.Date("1900-01-01"), as.Date("2013-12-31"), by = "days")
ym <- do.call(paste, c(expand.grid(1:500, 1900:2013), sep="_"))

df <- data.frame(x1 = sample(dates, 1e4, TRUE),
                 x2 = sample(letters, 1e4, TRUE),
                 x3 = sample(10, 1e4, TRUE),
                 yearmon = sample(ym, 1e4, TRUE),
                 stringsAsFactors = FALSE)

require(data.table)
dt <- data.table(df)

f1 <- function(dt) {
  dt[, grp := .GRP, by = yearmon]
  setkey(dt, grp)

  o1 <- dt[, list(list(.SD)), by = grp]$V1
}

f2 <- function(df) {
  df$yearmon <- factor(df$yearmon, levels = unique(df$yearmon))
  o2 <- split(df, df$yearmon)
}

require(microbenchmark)
microbenchmark(o1 <- f1(dt), o2 <- f2(df), times = 10)

# Unit: milliseconds
#          expr        min         lq     median        uq      max neval
# o1 <- f1(dt)   43.72995   43.85035   45.20087  715.1292 1071.976    10
# o2 <- f2(df) 4485.34205 4916.13633 5210.88376 5763.1667 6912.741    10

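Note that the result in o1 is an unnamed list. But you can simply set the names with names(o1) <- unique(dt$yearmon). This works because setkey has already sorted dt by grp, so the unique values come out in the same order as the list elements.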
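
The naming step can be sketched on a toy table as follows. This assumes the data.table package is installed. The toy data and the name `groups` are made up for illustration, and the `copy()` around `.SD` is a defensive addition that is not in the answer's code.

```r
library(data.table)  # assumption: data.table is installed

# Toy data for illustration only.
dt <- data.table(
  value   = 1:6,
  yearmon = c("4_2013", "4_2013", "9_2012", "9_2012", "4_2012", "4_2012")
)

dt[, grp := .GRP, by = yearmon]  # number groups in order of first appearance
setkey(dt, grp)                  # sort rows by that group number

# One list element per group; copy() is a defensive addition so that each
# element holds its own copy of the group's rows.
groups <- dt[, list(list(copy(.SD))), by = grp]$V1

# After setkey, unique(dt$yearmon) is in the same order as the list elements.
names(groups) <- unique(dt$yearmon)

print(names(groups))
```

Each element of `groups` is itself a data.table, so it can be indexed by name, for example `groups[["9_2012"]]`.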

This post is based on a similar question on Stack Overflow about r - R: `split` preserving the natural order of factors: https://stackoverflow.com/questions/17611734/
