
r - data.table join with index yields unexpected results if the indexed column name is a prefix of the join column name

Reposted. Author: 行者123. Updated: 2023-12-04 12:51:37

For a particular setup of two data.tables, a join does not give the result I expect. Did I make a mistake in my code, or could this be a data.table issue?

Please see the example below.

library(data.table)

# In the code below the join does not deliver the result I would expect
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# PLEASE NOTE: same result with slightly different syntax: DT1[DT2, lookup_result := i.lookup_result, on=c(colname="lookup")][]
#    colname  colname_with_suffix lookup_result
# 1:   test1                other            NA
# 2:   test2                 test            NA
# 3:   test2 includes test within            NA
# 4:   test3                other             3


# Expected result:
#    colname  colname_with_suffix lookup_result
# 1:   test1                other             1
# 2:   test2                 test             2
# 3:   test2 includes test within             2
# 4:   test3                other             3
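
As a cross-check of the expected result, the same lookup can be computed without a data.table join at all. The following is a minimal sketch using base R `match()` on the tables from the example; the variable name `expected` is introduced here only for illustration.

```r
library(data.table)

# Same tables as in the question
DT1 <- data.table(
  colname = c("test1", "test2", "test2", "test3"),
  colname_with_suffix = c("other", "test", "includes test within", "other")
)
DT2 <- data.table(lookup = c("test1", "test2", "test3"), lookup_result = c(1, 2, 3))

# For each row of DT1, find the position of its key in DT2$lookup
# and pick the corresponding lookup_result. No index is involved here.
expected <- DT2$lookup_result[match(DT1$colname, DT2$lookup)]
print(expected)  # 1 2 2 3
```

Because `match()` does not touch any secondary index, this gives a reference value to compare the join output against.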

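For the following variants, the join works as expected. The unexpected behavior above seems to occur only when there is an index on a column whose name shares a prefix with the join column's name (in the example, the join column colname is a prefix of the indexed column colname_with_suffix) and both columns contain similar text.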
# For all following alternatives the join delivers the correct result

# (a) Same data tables as above, but no index
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]

# (b) Index on DT1, but completely different values in indexed column than in join column
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","other","other","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]

# (c) Index on DT1, similar values in indexed column, but indexed column name and join column name do not share a prefix
DT1 <- data.table(colname=c("test1","test2","test2","test3"), x.colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[x.colname_with_suffix == "not found", ] # automatically creates index on x.colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
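
Since variant (a) suggests the index is what triggers the problem, one possible workaround on affected versions is to inspect and drop the index before joining. This is a sketch only, assuming the automatic index is the cause; it uses data.table's `indices()` and `setindex()` and has not been verified against 1.10.0.

```r
library(data.table)

DT1 <- data.table(
  colname = c("test1", "test2", "test2", "test3"),
  colname_with_suffix = c("other", "test", "includes test within", "other")
)
DT2 <- data.table(lookup = c("test1", "test2", "test3"), lookup_result = c(1, 2, 3))

# The subset below may create an automatic secondary index
DT1[colname_with_suffix == "not found", ]
print(indices(DT1))   # lists the secondary indices currently set on DT1

# Drop all secondary indices on DT1 before the join
setindex(DT1, NULL)
print(indices(DT1))   # NULL once the indices are removed

DT1[DT2, lookup_result := i.lookup_result, on = c("colname" = "lookup")][]
```

Alternatively, `options(datatable.auto.index = FALSE)` prevents automatic index creation for the session, at the cost of losing the speed-up that auto-indexing gives repeated subsets.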

Session info:
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# locale:
# [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C LC_TIME=German_Germany.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] data.table_1.10.0
#
# loaded via a namespace (and not attached):
# [1] tools_3.3.2

Note that the same behavior also occurs with data.table 1.10.4 and R version 3.4.2 on Windows, as well as on Ubuntu Linux 14.04.

Best answer

This has been fixed in v1.11.0 @MarkusBonsch (answering this question so that it does not remain on the unanswered list)
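
To see whether a given installation predates that fix, the installed version can be compared against 1.11.0. A minimal sketch; the names `fixed_in` and `needs_workaround` are hypothetical and used only for illustration.

```r
library(data.table)

fixed_in <- "1.11.0"                       # version named in the answer above
installed <- packageVersion("data.table")

# TRUE if the installed data.table is older than the version with the fix
needs_workaround <- installed < fixed_in

if (needs_workaround) {
  message("data.table ", installed, " predates ", fixed_in,
          "; consider upgrading or removing the index before the join.")
} else {
  message("data.table ", installed, " should already contain the fix.")
}
```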

The original question, "r - data.table join with index yields unexpected results if the indexed column name is a prefix of the join column name", can be found on Stack Overflow: https://stackoverflow.com/questions/46846122/
