
python - Convert a tsv file so I can use it for nodes and edges in Python

Reposted. Author: 行者123. Last updated: 2023-12-01 05:18:04

I have this tsv file that I want to read, and then somehow count the number of nodes in a path.

Parts of the tsv file look like this:

  6a3701d319fc3754  1297740409  166  14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade  NULL
3824310e536af032 1344753412 88 14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3

The path is only the part that looks like 14th_century;15th_century, with the nodes separated by ";".

My code so far:

import networkx as nx

G = nx.read_edgelist("test.tsv", create_using=nx.DiGraph())
print(G.nodes())
print(G.edges())

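So my question is: how do I count the number of nodes a path touches?

Best answer

I use the pandas library here for speed. You can install it with pip install pandas, and read more about it here: http://pandas.pydata.org/

First, build our dataframe from your sample data: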

In [39]:

import io
import pandas as pd

temp = """6a3701d319fc3754 1297740409 166 14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade NULL
3824310e536af032 1344753412 88 14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3"""

# construct the dataframe
# in your case replace io.StringIO(temp) with the path to your tsv file
df = pd.read_csv(io.StringIO(temp), sep=r'\s+', header=None, names=['a','b','c','d','e'])

df
Out[39]:

a b c \
0 6a3701d319fc3754 1297740409 166
1 3824310e536af032 1344753412 88

d e
0 14th_century;15th_century;16th_century;Pacific... NaN
1 14th_century;Europe;Africa;Atlantic_slave_trad... 3

[2 rows x 5 columns]
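For a real file rather than an inline string, the same call can be pointed at a path. A minimal sketch, assuming the file is genuinely tab-separated and has no header row. The column names `session`, `timestamp`, `duration`, `path` and `rating` are guesses made up for illustration (they are not given in the question), and the demo writes a small temporary file only so that it runs on its own:

```python
import os
import tempfile

import pandas as pd

# The two sample rows from the question, as lists of fields.
rows = [
    ["6a3701d319fc3754", "1297740409", "166",
     "14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;"
     "Accra;Africa;Atlantic_slave_trade;African_slave_trade", "NULL"],
    ["3824310e536af032", "1344753412", "88",
     "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade", "3"],
]

# Write a throwaway tab-separated file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    for row in rows:
        tmp.write("\t".join(row) + "\n")
    tsv_path = tmp.name

# Hypothetical column names; "NULL" is one of pandas' default NA markers,
# so it is read as NaN without extra configuration.
columns = ["session", "timestamp", "duration", "path", "rating"]
df_file = pd.read_csv(tsv_path, sep="\t", header=None, names=columns)
os.remove(tsv_path)

print(df_file[["session", "path"]])
```

Using `sep="\t"` instead of a whitespace regex keeps fields intact if any of them ever contains a space.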

In [65]:

# use itertools to flatten our list of lists
import itertools

def to_edge_list(x):
    # split on semi-colon
    split_list = x.split(';')
    # get our main node
    primary_node = split_list[0]
    # construct our edge list: one edge from the main node to every later node
    edge_list = [(primary_node, node) for node in split_list[1:]]
    return edge_list

# now use itertools to flatten the list of lists into a single list
combined_edge_list = list(itertools.chain.from_iterable(df['d'].apply(to_edge_list)))

print(combined_edge_list)

[('14th_century', '15th_century'), ('14th_century', '16th_century'), ('14th_century', 'Pacific_Ocean'), ('14th_century', 'Atlantic_Ocean'), ('14th_century', 'Accra'), ('14th_century', 'Africa'), ('14th_century', 'Atlantic_slave_trade'), ('14th_century', 'African_slave_trade'), ('14th_century', 'Europe'), ('14th_century', 'Africa'), ('14th_century', 'Atlantic_slave_trade'), ('14th_century', 'African_slave_trade')]
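Note that `to_edge_list` links every later step to the first node of the path. If the goal is instead to follow the path step by step, one alternative is to link each node to the one visited right after it. A sketch of that variant follows; the helper name `to_path_edges` is made up here, and this is a different technique from the function above, not a correction of it:

```python
def to_path_edges(path):
    """Return (source, target) pairs for consecutive steps of a ';'-separated path."""
    steps = path.split(';')
    # zip pairs each step with its successor; a single-step path yields no edges
    return list(zip(steps, steps[1:]))


print(to_path_edges('14th_century;Europe;Africa'))
# [('14th_century', 'Europe'), ('Europe', 'Africa')]
```

It can be swapped in for `to_edge_list` in the `df['d'].apply(...)` call if that edge layout suits the analysis better.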

# Now construct our networkx graph from the edge list
In [66]:

import networkx as nx

G = nx.MultiDiGraph()
G.add_edges_from(combined_edge_list)
G.edges()


Out[66]:

[('14th_century', '15th_century'),
('14th_century', 'Africa'),
('14th_century', 'Africa'),
('14th_century', 'Atlantic_slave_trade'),
('14th_century', 'Atlantic_slave_trade'),
('14th_century', 'African_slave_trade'),
('14th_century', 'African_slave_trade'),
('14th_century', '16th_century'),
('14th_century', 'Accra'),
('14th_century', 'Europe'),
('14th_century', 'Atlantic_Ocean'),
('14th_century', 'Pacific_Ocean')]
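Once the graph exists, networkx can report how many distinct nodes and how many edges it holds. A small sketch on a shortened, hand-picked subset of the edges above (the subset is chosen for illustration only); because a `MultiDiGraph` keeps parallel edges, the repeated `Africa` edge is counted twice:

```python
import networkx as nx

# A shortened edge list, including one repeated edge.
edges = [
    ('14th_century', '15th_century'),
    ('14th_century', 'Africa'),
    ('14th_century', 'Africa'),
    ('14th_century', 'Europe'),
]

G = nx.MultiDiGraph()
G.add_edges_from(edges)

print(G.number_of_nodes())           # 4 distinct nodes
print(G.number_of_edges())           # 4 edges, parallel edges included
print(G.out_degree('14th_century'))  # 4 outgoing edges from the main node
```

With a plain `nx.DiGraph()` the duplicate edge would collapse into one, so the edge count would drop to 3.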

Plotting the graph (it doesn't look pretty, but what the heck):

[Image: plot of the resulting graph]
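Coming back to the original question of how many nodes a path touches: that can be read straight off the path string, without building a graph. A plain-Python sketch, assuming that "touched" may mean either every step or every distinct node; the function name `count_nodes` is hypothetical:

```python
def count_nodes(path):
    """Return (number of steps, number of distinct nodes) for a ';'-separated path."""
    # Drop empty pieces so a trailing ';' does not count as a node.
    steps = [step for step in path.split(';') if step]
    return len(steps), len(set(steps))


paths = [
    "14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;"
    "Accra;Africa;Atlantic_slave_trade;African_slave_trade",
    "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade",
]

for path in paths:
    print(count_nodes(path))
# (9, 9)
# (5, 5)
```

The two numbers differ only when a path revisits a node, for example `count_nodes('A;B;A')` gives `(3, 2)`.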

Regarding "python - Convert a tsv file so I can use it for nodes and edges in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22884390/
