R: efficiently computing the frequency of three-element combinations

Reposted. Author: 行者123. Updated: 2023-12-04 12:35:35

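I have a data.frame in which every ID has 3 attributes. To keep things simple I only include 100 rows here, although my real dataset has about 1,000,000. There are about 50 different possible attribute values, and they are a mix of numbers and characters.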

library(dplyr)  # provides %>% and as_tibble()

data <- data.frame(id = 1:100,
                   a1 = sample(letters, 100, replace = TRUE),
                   a2 = sample(letters, 100, replace = TRUE),
                   a3 = sample(letters, 100, replace = TRUE),
                   stringsAsFactors = FALSE) %>%
  as_tibble()

I'd like to find out which combinations are the most frequent (order doesn't matter).

So the result should look something like this:

pattern | frequency
a,a,a | 10
A,b,c | 5
a,e,c | 4
... | ....

First, I started by creating a vector of all possible combinations:

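library(tidyr)  # provides unite()

possible_combinations <- combn(c(letters, LETTERS), 3) %>%
  t() %>%
  as_tibble(.name_repair = "unique") %>%  # the transposed matrix has no column names
  unite("combination", sep = "") %>%
  pull()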
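One thing worth checking at this step: `combn()` draws combinations without repetition, so this vector never contains patterns such as `aaa`, which appears in the desired output above. A small base-R check (the variable name `combos` is only illustrative):

```r
# All 3-element combinations of the 52 lower- and upper-case letters,
# pasted into strings such as "abc" (combn passes collapse = "" on to paste)
combos <- combn(c(letters, LETTERS), 3, paste, collapse = "")

length(combos)      # choose(52, 3) = 22100 combinations
"abc" %in% combos   # TRUE: three distinct letters
"aaa" %in% combos   # FALSE: repeated letters are never generated
```

Enumerating combinations with repetition would need a different construction, but as the answer below shows, it is not necessary to enumerate every possible combination at all; counting only the combinations that actually occur in the data is enough.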

Then I wrote this nested loop to count the frequencies:

library(stringr)  # provides str_count()

combination_counter <- vector(mode = "numeric", length = length(possible_combinations))

for (j in seq_along(possible_combinations)) {
  counter <- 0  # number of IDs matching combination j
  for (i in seq_len(nrow(data))) {

    # inner_counter counts how many of this ID's attributes occur in combination j
    inner_counter <- str_count(possible_combinations[j], data[[i, 2]]) +
      str_count(possible_combinations[j], data[[i, 3]]) +
      str_count(possible_combinations[j], data[[i, 4]])

    # if all three attributes are in the combination, the counter increases by one
    if (inner_counter == 3) {
      counter <- counter + 1
    }
  }

  # combination_counter saves, for each combination, the number of IDs
  # in which that combination occurred
  combination_counter[[j]] <- counter
}

I know this isn't very idiomatic R, but I don't know how else to do it. The runtime is already poor for my small toy example and practically infeasible for my real data.
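Apart from speed, matching with `str_count()` against the concatenated combination string can also produce false positives: the three per-attribute counts are summed, so an ID with attributes `a, a, b` reaches a total of 3 against the combination `abc`. A base-R illustration; the helper `count_fixed` is hypothetical and only mimics `str_count()` with a fixed (non-regex) pattern:

```r
# Hypothetical helper: number of fixed-string occurrences of pattern p in s
count_fixed <- function(s, p) {
  lengths(regmatches(s, gregexpr(p, s, fixed = TRUE)))
}

attrs <- c("a", "a", "b")  # one ID whose attributes are a, a, b
combo <- "abc"             # a combination produced by combn()

# Sum of the per-attribute counts, as inner_counter does in the loop
total <- sum(vapply(attrs, function(p) count_fixed(combo, p), integer(1)))
total  # 3, so this ID would be counted as matching "abc"
```

With multi-character attributes such as numbers the same issue appears in another form, since a pattern like `"1"` also matches inside `"12"`.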

Best answer

You can also do this in base R:

table(apply(data[,2:4], 1, function(x) paste0(sort(x), collapse = ",")))
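The `apply()` call above sorts each row with its own R function call, which is simple but may become slow at around 1,000,000 rows. The same idea (sort the three values, paste them into one key, count the keys) can also be written with vectorized base-R functions. A minimal sketch, assuming character columns `a1`, `a2`, `a3` as in the example data; the helper `count_patterns` is hypothetical, and the middle value is computed as the median of three using `pmin()`/`pmax()`:

```r
# Hypothetical helper: count order-independent 3-value patterns per row.
# Assumes a data frame with columns a1, a2, a3 (coerced to character).
count_patterns <- function(df) {
  a  <- as.character(df$a1)
  b  <- as.character(df$a2)
  c3 <- as.character(df$a3)

  # Row-wise sort of three values without a per-row function call
  lo  <- pmin(a, b, c3)
  hi  <- pmax(a, b, c3)
  mid <- pmax(pmin(a, b), pmin(pmax(a, b), c3))  # median of three

  # Canonical key per row, then count keys, most frequent first
  key  <- paste(lo, mid, hi, sep = ",")
  freq <- sort(table(key), decreasing = TRUE)

  data.frame(pattern   = names(freq),
             frequency = as.vector(freq),
             stringsAsFactors = FALSE)
}

# Usage on the question's data:
# head(count_patterns(data), 10)
```

`pmin()`/`pmax()` compare strings using R's collation order, so with mixed numeric and character attributes the order inside a key may not be numeric; it is consistent, though, which is all that grouping requires.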

For a similar question about efficiently computing the frequency of three-element combinations in R, see Stack Overflow: https://stackoverflow.com/questions/53503909/
