- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我试图从文档中了解data.table
中的逻辑,并且有点不清楚。我知道我可以尝试一下,看看会发生什么,但我想确保没有病理情况,因此想知道逻辑的实际编码方式。当两个data.table
对象的键列数不同时,例如a
的键数为2,而b
的键数为3,并且您运行c <- a[b]
,a
和b
会简单地合并在前两个键列上,还是会自动合并a中的第三列到b
中的第三个键列?一个例子:
require(data.table)
a <- data.table(id=1:10, t=1:20, v=1:40, key=c("id", "t"))
b <- data.table(id=1:10, v2=1:20, key="id")
c <- a[b]
a
中的
id
键列匹配的
b
行。例如,对于
id==1
中的
b
,
b
中有2行,
a
中有4行,应该在
c
中生成8行。确实确实是这样:
> head(c,10)
id t v v2
1: 1 1 1 1
2: 1 1 21 1
3: 1 11 11 1
4: 1 11 31 1
5: 1 1 1 11
6: 1 1 21 11
7: 1 11 11 11
8: 1 11 31 11
9: 2 2 2 2
10: 2 2 22 2
d <-b[a]
a
中的每一行,都应该选择
b
中的匹配行:由于
a
包含一个额外的键列
t
,因此该列不应用于匹配,而只能基于第一个键列进行联接,
id
应该完成。似乎是这种情况:
> head(d,10)
id v2 t v
1: 1 1 1 1
2: 1 11 1 1
3: 1 1 1 21
4: 1 11 1 21
5: 1 1 11 11
6: 1 11 11 11
7: 1 1 11 31
8: 1 11 11 31
9: 2 2 2 2
10: 2 12 2 2
a
的第三键列,还是
data.table
仅使用两个表的
min(length(key(DT)))
。
最佳答案
好问题。首先,正确的术语是(来自?data.table
):
[A data.table] may have one key of one or more columns. This key can be used for row indexing instead of rownames.
?data.table
:
When i is a data.table, x must have a key. i is joined to x using x's key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key; i.e., column 1 of i is matched to the 1st column of x's key, column 2 to the second, etc. The match is a binary search in compiled C in O(log n) time. If i has fewer columns than x's key then many rows of x will ordinarily match to each row of i since not all of x's key columns will be joined to (a common use case). If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns (column 1 of i's key is joined to column 1 of x's key, column 2 to column 2, and so on) and a binary merge of the two tables is carried out. In all joins the names of the columns are irrelevant. The columns of x's key are joined to in order, either from column 1 onwards of i when i is unkeyed, or from column 1 onwards of i's key.
When i is a data.table, x must have a key. i is joined to x using x's key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key; i.e., column 1 of i is matched to the 1st column of x's key, column 2 to the second, etc. The match is a binary search in compiled C in O(log n) time. If i has fewer columns than x's key then not all of x's key columns will be joined to (a common use case) and many rows of x will (ordinarily) match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns (column 1 of i's key is joined to column 1 of x's key, column 2 of i's key to column 2 of x's key, and so on for as long as the shorter key) and a binary merge of the two tables is carried out. In all joins the names of the columns are irrelevant; the columns of x's key are joined to in order, either from column 1 onwards of i when i is unkeyed, or from column 1 onwards of i's key. In code, the number of join columns is determined by min(length(key(x)),if (haskey(i)) length(key(i)) else ncol(i)).
关于r - 当键列数不同时合并data.table,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/12920803/
我有几个长度不等的 vector ,我想对其进行cbind。我将 vector 放入列表中,并尝试结合使用do.call(cbind, ...): nm <- list(1:8, 3:8, 1:5)
合并(合并)两个 JSONObjects 的最佳方式是什么? JSONObject o1 = { "one": "1", "two": "2", "three": "3" }
我在一个表中有许多空间实体,其中有一个名为 Boundaries 的 geometry 字段。我想生成一个具有简化形状/几何图形的 GeoJson 文件。 这是我的第一次尝试: var entitie
谁能说出为什么这个选择返回 3.0 而不是 3.5: SELECT coalesce(1.0*(7/2),0) as foo 这个返回 3: SELECT coalesce(7/2,0) as foo
首先抱歉,也许这个问题已经提出,但我找不到任何可以帮助我的东西,可能是因为我对 XSLT 缺乏了解。 我有以下 XML: 0 OK
有时用户会使用 Windows 资源管理器复制文件并在他们应该执行 svn 存储库级别的复制或合并时提交它们。因此,SVN 没有正确跟踪这些变化。一旦我发现这一点,损坏显然已经完成,并且可能已经对相关
我想组合/堆叠 2 个不同列的值并获得唯一值。 如果范围相邻,则可以正常工作。例如: =UNIQUE(FILTERXML(""&SUBSTITUTE(TEXTJOIN(",",TRUE,TRANSPO
使用iTextSharp,如何将多个PDF合并为一个PDF,而又不丢失每个PDF中的“表单字段”及其属性? (我希望有一个使用来自数据库的流的示例,但文件系统也可以) 我发现this code可以正常
是否有一个合并函数可以优先考虑公共(public)变量中的非缺失值? 考虑以下示例。 首先,我们生成两个 data.frames,它们具有相同的 ID,但在特定变量上有互补的缺失值: set.seed
我们正在尝试实现 ALM Rangers 在最新的 Visual Studio TFS Branching and Merging Guide 中描述的“基本双分支计划”。 .从指导: The bas
我在不同目录(3个不同名称)中有很多(3个只是一个例子)文本文件,如下所示: 目录:A,文件名:run.txt 格式:txt制表符分隔 ; file one 10 0.2 0.5 0.
我有一张包含学生等级关系的表: Student Grade StartDate EndDate 1 1 09/01/2009 NULL 2
我在学习 https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/reference/working-with-associatio
我觉得我有世界上最简单的 SVN 用例: 我有一个文件,Test.java在 trunk SVN的。 我分行trunk至 dev-branch . 我搬家Test.java进入 com/mycompa
我有两个数据框,其中一些列名称相同,而另一些列名称不同。数据框看起来像这样: df1 ID hello world hockey soccer 1 1 NA NA
Elasticsearch 中是否缺少以扁平化形式(多个子/子aggs)返回结果的方法? 例如,当前我正在尝试获取所有产品类型及其状态(在线/离线)。 这就是我最终得到的: aggs [ { key:
如何合并如下所示的 map : Map1 = Map(1 -> Class1(1), 2 -> Class1(2)) Map2 = Map(2 -> Class2(1), 3 -> Class2(2)
我试图通过从netezza服务器导入数据来合并两个数据集。 以下是数据集,其数字为,ID为,字母为,名称为: 下表都是使用命令从netezza导入的: sqoop import --connect n
我有两个数组 $array1 = array('first', 'second', 'third', 'fourth'); $array2 = array('first', 'third', 'fou
我正在 SQL Server 中运行合并。在我的更新中,我只想在值发生更改时更新该行。有一个版本行在每次更新时都会递增。下面是一个例子: MERGE Employee as tgt USING (SE
我是一名优秀的程序员,十分优秀!