
R: K-means clustering vs. community detection algorithms (weighted correlation networks) - am I overcomplicating this problem?

Reposted. Author: 行者123. Updated: 2023-12-04 11:26:25

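My data looks like this: https://imgur.com/a/1hOsFpF
The first data set is in a standard format: a list of people and their financial attributes.
The second data set contains the "relationships" between these people: how much they have paid each other and how much they owe each other.
I am interested in learning more about networks and graph-based clustering, but I am trying to better understand which kinds of situations actually call for network-based clustering. I don't want to use graph clustering where it isn't needed (to avoid a "square peg in a round hole" situation).
Using R, I first created some fake data: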

library(corrr)
library(dplyr)
library(igraph)
library(visNetwork)
library(stats)

# create first data set
# numeric columns are entered as numbers, so no as.numeric() conversion is needed
# (note: "Jack" appears twice in the name column)
Personal_Information <- data.frame(
  "name"   = c("John", "Jack", "Jason", "Jim", "Julian", "Jack", "Jake", "Joseph"),
  "age"    = c(41, 33, 24, 66, 21, 66, 29, 50),
  "salary" = c(50000, 20000, 18000, 66000, 77000, 0, 55000, 40000),
  "debt"   = c(10000, 5000, 4000, 0, 20000, 5000, 0, 1000)
)
# create second data set: one row per (name_a, name_b) pair, amounts entered as numbers
Relationship_Information <- data.frame(
  "name_a" = c("John", "John", "John", "Jack", "Jack", "Jack", "Jason", "Jason",
               "Jim", "Jim", "Jim", "Julian", "Jake", "Joseph", "Joseph"),
  "name_b" = c("Jack", "Jason", "Joseph", "John", "Julian", "Jim", "Jim", "Joseph",
               "Jack", "Julian", "John", "Joseph", "John", "Jim", "John"),
  "how_much_they_owe_each_other" = c(10000, 20000, 60000, 10000, 40000, 8000, 0, 50000,
                                     6000, 2000, 10000, 10000, 50000, 12000, 0),
  "how_much_they_paid_each_other" = c(5000, 40000, 120000, 20000, 20000, 8000, 0, 20000,
                                      12000, 0, 0, 0, 50000, 0, 0)
)
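The second data set is already an edge list (name_a, name_b, amounts), so it can be turned into a graph directly, without going through correlations. The following is a minimal sketch, assuming the igraph package is installed. Summing the owed and paid amounts into a single edge weight, dropping pairs with no money flow, and using Louvain (`cluster_louvain()`) are illustrative choices of mine, not something the question specifies:

```r
library(igraph)

# Relationship data from the question, entered as numbers
rel <- data.frame(
  name_a = c("John", "John", "John", "Jack", "Jack", "Jack", "Jason", "Jason",
             "Jim", "Jim", "Jim", "Julian", "Jake", "Joseph", "Joseph"),
  name_b = c("Jack", "Jason", "Joseph", "John", "Julian", "Jim", "Jim", "Joseph",
             "Jack", "Julian", "John", "Joseph", "John", "Jim", "John"),
  owe  = c(10000, 20000, 60000, 10000, 40000, 8000, 0, 50000,
           6000, 2000, 10000, 10000, 50000, 12000, 0),
  paid = c(5000, 40000, 120000, 20000, 20000, 8000, 0, 20000,
           12000, 0, 0, 0, 50000, 0, 0)
)

# Assumption: total money moved between two people (owed + paid) is the edge weight
rel$weight <- rel$owe + rel$paid

# Drop pairs with no money moving between them; they would only add zero-weight edges
rel <- rel[rel$weight > 0, c("name_a", "name_b", "weight")]

# The first two columns become the edge endpoints; "weight" becomes E(g)$weight
g <- graph_from_data_frame(rel, directed = FALSE)

# Merge reciprocal rows (e.g. John-Jack and Jack-John) into one edge by summing weights
g <- simplify(g, edge.attr.comb = list(weight = "sum"))

# Louvain community detection; it picks up E(g)$weight automatically
comm <- cluster_louvain(g)
membership(comm)
```

Here the graph encodes who actually transacts with whom, which is the kind of structure community detection is designed for; the personal attributes play no part in it.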
Then I ran a standard k-means clustering algorithm on the first data set and plotted the results:
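# Method 1: simple k-means analysis with 2 clusters on the Personal_Information data set
set.seed(123)  # k-means starts from random centers; fix the seed for reproducible clusters
cl <- kmeans(Personal_Information[, c("age", "salary", "debt")], centers = 2)
# plot two of the three clustered variables (age vs. salary) and overlay the cluster centers
plot(Personal_Information[, c("age", "salary")], col = cl$cluster)
points(cl$centers[, c("age", "salary")], col = 1:2, pch = 8, cex = 2)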
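`kmeans()` uses Euclidean distance, so on the raw values the salary column (tens of thousands) outweighs age (tens) in every distance. A minimal sketch of the usual remedy, standardizing the columns with `scale()` before clustering; the seed and `nstart = 25` are arbitrary choices:

```r
# The question's personal data, entered as numbers (the name column is left out)
people <- data.frame(
  age    = c(41, 33, 24, 66, 21, 66, 29, 50),
  salary = c(50000, 20000, 18000, 66000, 77000, 0, 55000, 40000),
  debt   = c(10000, 5000, 4000, 0, 20000, 5000, 0, 1000)
)

# Standardize each column to mean 0 and standard deviation 1,
# so that salary no longer dominates the Euclidean distances
x <- scale(people)

set.seed(1)  # k-means starts from random centers
# nstart = 25: try 25 random starts and keep the one with the lowest within-cluster sum of squares
cl_scaled <- kmeans(x, centers = 2, nstart = 25)

cl_scaled$cluster   # cluster label for each of the 8 people
cl_scaled$centers   # cluster centers, in standardized units
```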
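This is how I would normally approach this problem. Now I want to see whether I could use graph clustering for this kind of problem instead.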
First, I created a weighted correlation network from the first data set, following this tutorial: http://www.sthda.com/english/articles/33-social-network-analysis/136-network-analysis-and-manipulation-using-r/
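# correlate people (not variables): t() makes each person a column, so each
# correlation is computed across that person's age, salary and debt values
res.cor <- Personal_Information[, c("age", "salary", "debt")] %>%
  t() %>%
  correlate() %>%            # corrr: correlation matrix as a data frame
  shave(upper = TRUE) %>%    # keep each pair only once
  stretch(na.rm = TRUE) %>%  # long format with columns x, y, r
  filter(r >= 0.8) %>%       # keep only strong positive correlations
  rename(weight = r)         # igraph treats an edge attribute named "weight" as the edge weight

# each node is one person (row), identified by row position rather than by name
graph <- graph_from_data_frame(res.cor, directed = FALSE)
graph <- simplify(graph)     # remove loops/multi-edges; "weight" is kept (summed)
plot(graph)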
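It may help to see what the corrr pipeline computes: a person-by-person Pearson correlation in which every correlation rests on just three numbers (age, salary, debt) on very different scales. A minimal base-R sketch that should correspond to `correlate()` with its default Pearson method on this data; the `from`/`to`/`r` edge-list layout is my own choice:

```r
people <- data.frame(
  age    = c(41, 33, 24, 66, 21, 66, 29, 50),
  salary = c(50000, 20000, 18000, 66000, 77000, 0, 55000, 40000),
  debt   = c(10000, 5000, 4000, 0, 20000, 5000, 0, 1000)
)

# t() makes each person a column, so cor() correlates people with each other;
# every correlation is computed from only three values (age, salary, debt)
r <- cor(t(as.matrix(people)))

# Keep each pair once (lower triangle, no diagonal) and apply the question's 0.8 cutoff
idx   <- which(lower.tri(r) & r >= 0.8, arr.ind = TRUE)
edges <- data.frame(from = idx[, "row"], to = idx[, "col"], r = r[idx])

# Correlation matrix rounded to two decimals, and the resulting edge list
round(r, 2)
edges
```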
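Then I ran a graph clustering algorithm:
# run graph clustering (also called community detection) on the correlation network;
# cluster_fast_greedy() uses E(graph)$weight automatically when that attribute exists
fc <- cluster_fast_greedy(graph)
V(graph)$community <- membership(fc)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name, group = V(graph)$community)
nodes <- nodes[order(nodes$id, decreasing = FALSE), ]
# igraph:: prefix avoids a clash with dplyr/tibble's as_data_frame()
edges <- igraph::as_data_frame(graph, what = "edges")[, c("from", "to")]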

visNetwork(nodes, edges) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
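One way to check whether the graph step adds anything over plain k-means is to measure how closely the two partitions agree. A sketch, assuming igraph is installed: it rebuilds both partitions from the personal data (base `cor()` and `graph_from_adjacency_matrix()` stand in for the corrr pipeline, and `nstart = 25` is my addition) and scores agreement with igraph's `compare(..., method = "adjusted.rand")`:

```r
library(igraph)

people <- data.frame(
  age    = c(41, 33, 24, 66, 21, 66, 29, 50),
  salary = c(50000, 20000, 18000, 66000, 77000, 0, 55000, 40000),
  debt   = c(10000, 5000, 4000, 0, 20000, 5000, 0, 1000)
)

# Partition 1: k-means on the raw attributes, as in the question
set.seed(1)
km <- kmeans(people, centers = 2, nstart = 25)

# Partition 2: communities of the thresholded person-by-person correlation network
r <- cor(t(as.matrix(people)))
adj <- r * (r >= 0.8)   # keep correlations >= 0.8 as edge weights, set the rest to 0
diag(adj) <- 0           # no self-loops
g <- graph_from_adjacency_matrix(adj, mode = "max", weighted = TRUE)
fc <- cluster_fast_greedy(g)

# Adjusted Rand index: 1 = identical partitions, values near 0 = chance-level agreement
ari <- compare(as.integer(km$cluster), as.integer(membership(fc)),
               method = "adjusted.rand")
ari
```

Unlike `graph_from_data_frame()`, the adjacency-matrix route keeps people with no strong correlation as isolated vertices, so both partitions cover all eight people. High agreement would suggest the graph step is not adding much for this data; low agreement only says the two methods group people differently, not which grouping is better.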
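This seems to work, but I am not sure whether it is the best way to approach this problem.
Can anyone offer some advice? Am I overcomplicating this problem?
Thanks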

Best answer

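You might be interested in reading about "fusion-based community detection methods" (https://link.springer.com/chapter/10.1007/978-3-030-44584-3_24). These fusion-based methods are apparently designed specifically to take node attributes into account.
This may also help: https://www.nature.com/articles/srep30750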

A similar question on Stack Overflow: https://stackoverflow.com/questions/64849921/
