r - Main path analysis in a citation network using igraph in R

Reposted. Author: 行者123. Updated: 2023-12-05 01:07:09

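Is anyone familiar with a way to implement main path analysis (Hummon and Doreian 1989) using igraph in R?

Here is the example from the original Hummon and Doreian article. It traces citations among 40 journal articles about DNA. The arrows point forward in time (information from older articles "flows" to newer ones).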

dna_edges <- data.frame(from=c(1,2,3,3,3,5,6,9,12,12,15,15,10,11,11,13,14,14,14,16,16,17,19,19,19,19,19,20,20,20,20,24,24,21,21,23,22,26,27,29,30,31,31,32,32,32,33,33,35,35,36,36,36),
to=c(8,18,4,5,21,12,9,12,15,29,29,22,17,13,20,20,16,20,31,17,20,34,20,24,25,21,25,31,22,30,22,28,37,22,32,27,27,27,32,32,40,32,40,36,38,33,32,35,38,39,38,39,40))

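library(igraph)

dna_g <- graph_from_data_frame(dna_edges, directed=TRUE)
# Vertex names are stored as character, so convert them to numbers
# before using them as layers (article number = layer)
plot(dna_g,
     layout=layout_with_sugiyama(dna_g,
                                 layers = as.numeric(V(dna_g)$name))$layout)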

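Liu et al. (2019) explain that a node in a citation network can be one of three things:

  1. Source: is cited but cites no one
  2. Sink: cites others but is never cited
  3. Intermediate: both cites and is cited

So in this example, 10 articles are "sources" and another 10 are "sinks":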

dna_sources <- V(dna_g)$name[which(degree(dna_g, mode="in")==0)] # sources
# [1] "1"  "2"  "3"  "6"  "10" "11" "14" "19" "23" "26"
dna_sinks <- V(dna_g)$name[which(degree(dna_g, mode="out")==0)] # sinks
# [1] "8"  "18" "4"  "34" "25" "28" "37" "40" "38" "39"

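The main path is the most heavily used path connecting the sources to the sinks. Search path count (SPC) is one way of measuring that usage.

"A citation link’s SPC is the number of times the link is traversed if one runs through all the possible citation chains from all the sources to all the sinks in a citation network. To find SPC for a specific link, one needs to enumerate all the possible citation chains that emanate from all the sources and terminate at all the sinks" (Liu et al. 2019: 381)

So it seems that, to proceed, one needs to (i) pick a source-sink pair, (ii) find all paths connecting those two nodes, adding +1 to an edge's weight each time it is traversed, and (iii) repeat for the other source-sink pairs.

Any ideas on how to carry out (i) to (iii)?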

Best answer

SPC

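The following function implements SPC.

spc <- function(g) {
  # Each vertex of the line graph corresponds to one edge of g
  linegraph <- make_line_graph(g)
  source_edges <- V(linegraph)[degree(linegraph, mode = "in") == 0]
  sink_edges <- V(linegraph)[degree(linegraph, mode = "out") == 0]
  # Enumerate every path from a source edge to a sink edge and
  # count how many of those paths each edge appears on
  tabulate(
    unlist(
      lapply(
        source_edges,
        all_simple_paths,
        graph = linegraph,
        to = sink_edges,
        mode = "out")),
    # Always return one count per edge of g
    nbins = vcount(linegraph))
}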
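Enumerating every path can get slow on larger networks, because the number of source-to-sink chains grows quickly. As a rough sketch of an alternative (not part of the answer above), the SPC of an edge u -> v can also be computed as the number of paths from any source to u multiplied by the number of paths from v to any sink. The helper name `spc_dp` and the toy graph are made up for illustration, and the sketch assumes the graph is a DAG without parallel edges.

```r
library(igraph)

# Hypothetical helper (assumption, not from the original answer):
# SPC by path counting instead of path enumeration.
spc_dp <- function(g) {
  stopifnot(is_dag(g))
  ord <- as.integer(topo_sort(g, mode = "out"))

  # n_in[v]: number of paths from any source to v (sources count as 1)
  n_in <- as.numeric(degree(g, mode = "in") == 0)
  for (v in ord) {
    preds <- as.integer(neighbors(g, v, mode = "in"))
    if (length(preds) > 0) n_in[v] <- sum(n_in[preds])
  }

  # n_out[v]: number of paths from v to any sink (sinks count as 1)
  n_out <- as.numeric(degree(g, mode = "out") == 0)
  for (v in rev(ord)) {
    succs <- as.integer(neighbors(g, v, mode = "out"))
    if (length(succs) > 0) n_out[v] <- sum(n_out[succs])
  }

  # SPC of edge u -> v is n_in[u] * n_out[v], in edge order
  el <- ends(g, E(g), names = FALSE)
  n_in[el[, 1]] * n_out[el[, 2]]
}

# Toy graph, for illustration only
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c", "c", "b"),
             to   = c("c", "c", "d", "e", "e")),
  directed = TRUE)
spc_dp(toy)
# 2 2 2 2 1
```

In this sketch an edge that runs directly from a source to a sink counts as one chain and so gets an SPC of 1.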

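Main path search

The following function finds a main path. Note that if several paths share the same maximum total SPC, there is more than one main path; this function returns only the first one it finds.

main_search <- function(g) {
  linegraph <- make_line_graph(g)
  # Attach each edge's SPC to the matching line-graph vertex
  V(linegraph)$spc <- spc(g)
  source_edges <- V(linegraph)[degree(linegraph, mode = "in") == 0]
  sink_edges <- V(linegraph)[degree(linegraph, mode = "out") == 0]
  # Flat list of all paths from a source edge to a sink edge
  paths <- unlist(
    lapply(
      source_edges,
      all_simple_paths,
      graph = linegraph,
      to = sink_edges,
      mode = "out"),
    recursive = FALSE)
  # Total SPC along each path
  path_lengths <- unlist(lapply(paths, function (x) sum(x$spc)))
  # Mark the edges on the first path with the maximum total SPC
  vertex_attr(linegraph, "main_path") <- 0
  vertex_attr(
    linegraph,
    "main_path",
    paths[[which(path_lengths == max(path_lengths))[[1]]]]) <- 1
  # 0/1 indicator, one value per edge of g
  V(linegraph)$main_path
}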
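If enumerating all paths becomes too expensive, a path with the maximum total weight can also be found with a longest-path pass over the DAG in topological order. The following is only a sketch under stated assumptions: `global_main_path` is a made-up name, `w` is assumed to hold one strictly positive weight per edge (such as SPC values), and ties are broken arbitrarily, so like the function above it reports a single main path.

```r
library(igraph)

# Hypothetical helper (assumption, not from the original answer):
# edge ids of one maximum-weight source-to-sink path in a DAG.
global_main_path <- function(g, w) {
  stopifnot(is_dag(g), length(w) == ecount(g), all(w > 0))
  el <- ends(g, E(g), names = FALSE)
  best <- rep(0, vcount(g))           # best total weight of a path ending at v
  via <- rep(NA_integer_, vcount(g))  # last edge on that best path

  for (v in as.integer(topo_sort(g, mode = "out"))) {
    for (e in as.integer(incident(g, v, mode = "in"))) {
      cand <- best[el[e, 1]] + w[e]
      if (cand > best[v]) {
        best[v] <- cand
        via[v] <- e
      }
    }
  }

  # Walk back from the best end vertex to the start of the path
  v <- which.max(best)
  path <- integer(0)
  while (!is.na(via[v])) {
    path <- c(via[v], path)
    v <- el[via[v], 1]
  }
  path
}

# Toy graph with illustrative weights (not real SPC values)
toy <- graph_from_data_frame(
  data.frame(from = c("s", "s", "a", "b"),
             to   = c("a", "b", "t", "t")),
  directed = TRUE)
global_main_path(toy, w = c(3, 1, 2, 5))
# 2 4
```

Because every weight is positive, the best path always starts at a source and ends at a sink, which is why the sketch does not need to restrict the start and end vertices explicitly.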

Test

Wikipedia main path figure

The Wikipedia article on main path analysis has a figure of a graph with SPC values attached to all of its edges. You can see a copy of that figure above. I transcribed this graph into R, including the expected SPC values and the (global) main path.

library(igraph)
library(tibble)

wikipedia_g <- graph_from_data_frame(
  tibble::tribble(
    ~from, ~to, ~expected_spc, ~expected_main_path,
    "A", "C", 2, 0,
    "B", "C", 2, 0,
    "B", "D", 5, 1,
    "B", "J", 1, 0,
    "C", "E", 2, 0,
    "C", "H", 2, 0,
    "D", "F", 3, 1,
    "D", "I", 2, 0,
    "J", "M", 1, 0,
    "E", "G", 2, 0,
    "F", "H", 1, 0,
    "F", "I", 2, 1,
    "G", "H", 2, 0,
    "I", "L", 2, 0,
    "I", "M", 2, 1,
    "H", "K", 5, 0,
    "M", "N", 3, 1),
  directed = TRUE)

Every value returned by the spc function should equal the corresponding expected_spc value, and that is the case. Likewise, the values of expected_main_path should match the output of main_search, which they do.

all(E(wikipedia_g)$expected_spc == spc(wikipedia_g))
# TRUE
all(E(wikipedia_g)$expected_main_path == main_search(wikipedia_g))
# TRUE
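
Since main_search returns a 0/1 vector in edge order, it can help to map that vector back onto named edges, for example to list or highlight the main path. Here is a minimal sketch, assuming a small made-up graph and a hand-written indicator vector standing in for the output of main_search:

```r
library(igraph)

# Toy graph and indicator vector, for illustration only
g <- graph_from_data_frame(
  data.frame(from = c("s", "s", "a", "b"),
             to   = c("a", "b", "t", "t")),
  directed = TRUE)
on_main_path <- c(0, 1, 0, 1)  # stand-in for main_search(g)

# Edges on the main path, labelled by their end vertices
main_edges <- as_ids(E(g)[on_main_path == 1])
main_edges

# Colour those edges so they stand out in plot(g)
E(g)$color <- ifelse(on_main_path == 1, "red", "grey")
```

Calling plot(g) afterwards should draw the marked edges in red.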

This question, "r - Main path analysis in a citation network using igraph in R", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/67792685/