r - Look up data in a data.table and add it to a new column

Reposted. Author: 行者123. Updated: 2023-12-02 09:27:31

I have two data tables, as follows:
Bigrams

w1w2           freq  w1          w2
common names   1     common      names
department of  4     department  of
family name    6     family      name

bigrams = setDT(structure(list(w1w2 = c("common names", "department of", "family name"
), freq = c(1L, 4L, 6L), w1 = c("common", "department", "family"
), w2 = c("names", "of", "name")), .Names = c("w1w2", "freq",
"w1", "w2"), row.names = c(NA, -3L), class = "data.frame"))

Unigrams

w1          freq
common      2
department  3
family      4
name        5
names       1
of          9

unigrams = setDT(structure(list(w1 = c("common", "department", "family", "name",
"names", "of"), freq = c(2L, 3L, 4L, 5L, 1L, 9L)), .Names = c("w1",
"freq"), row.names = c(NA, -6L), class = "data.frame"))

Desired output

w1w2           freq  w1          w2     w1freq  w2freq
common names   1     common      names  2       1
department of  4     department  of     3       9
family name    6     family      name   4       5

What I have done so far

setkey(bigrams, w1)
setkey(unigrams, w1)
result <- bigrams[unigrams]

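This gives me an i.freq column for w1, but when I try to do the same for w2, the i.freq column is updated to reflect the frequencies of w2.

How can I get the frequencies of w1 and w2 in separate columns?

Note: I have already seen the solutions in "data.table Lookup value and translate" and "Modify column of a data.table based on another column and add the new column".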

Best answer

You can perform two joins. As of data.table v1.9.6, the on= argument lets you join on columns that have different names in the two tables.

library(data.table)

bigrams[unigrams, on=c("w1"), nomatch = 0][unigrams, on=c(w2 = "w1"), nomatch = 0]

            w1w2 freq         w1    w2 i.freq i.freq.1
1:   family name    6     family  name      4        5
2:  common names    1     common names      2        1
3: department of    4 department    of      3        9
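
The chained join names the new columns i.freq and i.freq.1 and returns the rows in the order of unigrams. If you want the w1freq and w2freq names from the desired output, one alternative (not part of the original answer) is an update join, which adds each column to bigrams by reference and keeps the original row order. This is a minimal sketch; it rebuilds the two tables with data.table() only so that it runs on its own.

```r
library(data.table)

# Same data as in the question, rebuilt so the example is self-contained
bigrams <- data.table(
  w1w2 = c("common names", "department of", "family name"),
  freq = c(1L, 4L, 6L),
  w1   = c("common", "department", "family"),
  w2   = c("names", "of", "name")
)
unigrams <- data.table(
  w1   = c("common", "department", "family", "name", "names", "of"),
  freq = c(2L, 3L, 4L, 5L, 1L, 9L)
)

# Update join: for each bigram row whose w1 matches unigrams$w1,
# copy the unigram frequency (i.freq) into a new column w1freq
bigrams[unigrams, on = "w1", w1freq := i.freq]

# Same idea for the second word: match bigrams$w2 against unigrams$w1
bigrams[unigrams, on = c(w2 = "w1"), w2freq := i.freq]

print(bigrams)
```

Because := modifies bigrams in place, no reassignment is needed, and unigrams rows without a match in bigrams are simply ignored.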

Regarding "r - Look up data in a data.table and add it to a new column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36588019/
