R: How to quickly select the common words or identical numbers in two columns of a very large table?

Reposted. Author: 行者123. Updated: 2023-12-02 02:51:12

I have a very large table (1,000,000 x 20) that I need to process, and it has to be fast.

For example, my table has two columns, X2 and X3:

    X1  X2                                          X3
c1 1 100020003001, 100020003002, 100020003003 100020003001, 100020003002, 100020003004
c2 2 100020003001, 100020004002, 100020004003 100020003001, 100020004007, 100020004009
c3 3 100050006003, 100050006001, 100050006001 100050006011, 100050006013, 100050006021

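Now I want to create 2 new columns containing:

1) The common words or identical numbers

For example: [1] "100020003001" "100020003002"

2) The count of common words or identical numbers

For example: [1] 2

I tried the approach from the thread below, but it is very slow because I run it inside a for loop: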

Count common words in two strings

library(stringi)
Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+"))

Thanks for your help! I am really struggling here...

Best answer

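We can split the 'X2' and 'X3' columns on ", ", take the intersect of the corresponding list elements with map2, and use lengths to "count" the number of elements in each list element.

library(tidyverse)
df1 %>%
  mutate(common_words = map2(strsplit(X2, ", "),
                             strsplit(X3, ", "),
                             intersect),
         count = lengths(common_words))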
# X1 X2 X3
#1 1 100020003001, 100020003002, 100020003003 100020003001, 100020003002, 100020003004
#2 2 100020003001, 100020004002, 100020004003 100020003001, 100020004007, 100020004009
#3 3 100050006003, 100050006001, 100050006001 100050006011, 100050006013, 100050006021
# common_words count
#1 100020003001, 100020003002 2
#2 100020003001 1
#3 0

Or use base R:

df1$common_words <- Map(intersect, strsplit(df1$X2, ", "), strsplit(df1$X3, ", "))
df1$count <- lengths(df1$common_words)
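
If a plain character column is more convenient than a list column (for example, when writing the result to a file), the intersections can be collapsed back into comma-separated strings. The following is a minimal base R sketch, not part of the original answer. It rebuilds the example table so it runs on its own, and it assumes the separator is always exactly ", ". Passing `fixed = TRUE` to `strsplit` is an optional tweak that skips regular-expression matching; whether it helps noticeably on a million rows has not been measured here.

```r
# Toy data mirroring the example table from the question.
df1 <- data.frame(
  X1 = 1:3,
  X2 = c("100020003001, 100020003002, 100020003003",
         "100020003001, 100020004002, 100020004003",
         "100050006003, 100050006001, 100050006001"),
  X3 = c("100020003001, 100020003002, 100020003004",
         "100020003001, 100020004007, 100020004009",
         "100050006011, 100050006013, 100050006021"),
  stringsAsFactors = FALSE
)

# Split both columns once; fixed = TRUE treats ", " as a literal separator.
x2 <- strsplit(df1$X2, ", ", fixed = TRUE)
x3 <- strsplit(df1$X3, ", ", fixed = TRUE)

# Row-wise intersection (intersect also drops duplicates within a row).
common <- Map(intersect, x2, x3)

df1$count <- lengths(common)
# Collapse each intersection to one string; an empty intersection becomes "".
df1$common_words <- vapply(common, paste, character(1), collapse = ", ")
```

Rows with nothing in common end up with an empty string and a count of 0, so they can be filtered afterwards with `df1[df1$count > 0, ]`.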

Data

df1 <- structure(list(X1 = 1:3, X2 = c("100020003001, 100020003002, 100020003003", 
"100020003001, 100020004002, 100020004003", 
"100050006003, 100050006001, 100050006001"
), X3 = c("100020003001, 100020003002, 100020003004", 
"100020003001, 100020004007, 100020004009", 
"100050006011, 100050006013, 100050006021")), class = "data.frame", 
row.names = c("c1", "c2", "c3"))

Regarding "R: How to quickly select the common words or identical numbers in two columns of a very large table?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52142521/
