
r - Quanteda kwic: appending data to the output

Reposted. Author: 行者123. Updated: 2023-12-04 04:16:04

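I would like to attach some metadata to the kwic output, such as the customer ID (see below), so that I can easily do a lookup against the master file. I tried appending the data with cbind, but the rows do not match up correctly.

If this is possible, an example would be greatly appreciated.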

docname   position  contextPre            keyword  contextPost          CustID
text3790  5         nothing at all looks  good     and sounds great     1
text3801  11        think the offer is a  good     value and has a lot  3
text3874  10        not so sure thats a   good     word to use          5

Original data.frame

CustID  Comment
1       nothing at all looks good and sounds great
2       did not see anything that was very appealing
3       I think the offer is a good value and has a lot of potential
4       these items look terrible how are you still in business
5       not so sure thats a good word to use
6       having a hard time believing some place would sell an item so low
7       it may be worth investing in some additional equipment

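Best answer

At first I thought the ideal solution would be to use docvars, but kwic does not seem to have an option to display them. So I still need to merge a docname-to-CustID mapping table with the kwic result.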

library(data.table)
library(quanteda)

s <- "CustID, Comment
1, nothing at all looks good and sounds great
2, did not see anything that was very appealing
3, I think the offer is a good value and has a lot of potential
4, these items look terrible how are you still in business
5, not so sure thats a good word to use
6, having a hard time believing some place would sell an item so low
7, it may be worth investing in some additional equipment"

# data.table is used here only to read the inline data easily;
# data.table = FALSE makes fread() return a plain data.frame.
df <- fread(s, data.table = FALSE)

# All operations below work on that data frame.
myCorpus <- corpus(df$Comment)
# The corpus texts and CustID come from the same data frame, in the same
# row order, so the mapping between them is correct.
docvars(myCorpus, "CustID") <- df$CustID
summary(myCorpus)

# Build the mapping table of docname and CustID.
# The docnames are stored in row.names, so make them an explicit column.
dv_table <- docvars(myCorpus)
id_table <- data.frame(docname = row.names(dv_table), CustID = dv_table$CustID)

# One row per keyword match; merge on docname to attach CustID to each match.
result <- kwic(myCorpus, "good", window = 3, valuetype = "glob")
id_result <- merge(result, id_table, by = "docname")

Result:
> id_result
  docname position   contextPre keyword      contextPost CustID
1   text1        5 at all looks    good and sounds great      1
2   text3        7   offer is a    good    value and has      3
3   text5        6 sure thats a    good      word to use      5
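
For anyone on a newer quanteda release, here is a minimal sketch of the same docname-based lookup. It rests on a few assumptions: quanteda 3 or later, where kwic() is applied to a tokens object and the result columns are named from, to, pre, keyword and post rather than position, contextPre and contextPost. The object names corp, toks, kw and kw_df are just illustrative. The mapping table is built with docnames() so it does not depend on how docvars() stores its row names.

```r
library(quanteda)

# Same sample data as in the question, built directly as a data.frame.
df <- data.frame(
  CustID = 1:7,
  Comment = c(
    "nothing at all looks good and sounds great",
    "did not see anything that was very appealing",
    "I think the offer is a good value and has a lot of potential",
    "these items look terrible how are you still in business",
    "not so sure thats a good word to use",
    "having a hard time believing some place would sell an item so low",
    "it may be worth investing in some additional equipment"
  ),
  stringsAsFactors = FALSE
)

# Comment becomes the text; the remaining column (CustID) becomes a docvar.
corp <- corpus(df, text_field = "Comment")

# Assumption: quanteda >= 3, where kwic() expects tokens, not a corpus.
toks <- tokens(corp)
kw <- kwic(toks, pattern = "good", window = 3)

# Explicit docname -> CustID mapping table.
id_table <- data.frame(
  docname = docnames(corp),
  CustID = docvars(corp, "CustID"),
  stringsAsFactors = FALSE
)

# Option 1: merge on docname (rows come back sorted by docname).
id_result <- merge(as.data.frame(kw), id_table, by = "docname")

# Option 2: look up CustID with match(), which keeps the kwic row order.
kw_df <- as.data.frame(kw)
kw_df$CustID <- id_table$CustID[match(kw_df$docname, id_table$docname)]
```

Either option avoids the cbind problem from the question: kwic returns one row per keyword match rather than one row per document, so the IDs have to be looked up by docname instead of bound on by position.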

Regarding "r - Quanteda kwic: appending data to the output", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39457841/
