
python - Avoiding NaN attributes when building a NetworkX graph

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:16:27

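I want to use pandas to read a csv file that contains nodes and their attributes. Not every node has every attribute, and a missing attribute is simply left empty in the csv file. When pandas reads the file, the missing values show up as nan. I want to add the nodes from the DataFrame in bulk, but without adding any attribute whose value is nan.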

For example, here is a sample csv file named mwe.csv:

Name,Cost,Depth,Class,Mean,SD,CST,SL,Time
Manuf_0001,39.00,1,Manuf,,,12,,10.00
Manuf_0002,36.00,1,Manuf,,,8,,10.00
Part_0001,12.00,2,Part,,,,,28.00
Part_0002,5.00,2,Part,,,,,15.00
Part_0003,9.00,2,Part,,,,,10.00
Retail_0001,0.00,0,Retail,253,36.62,0,0.95,0.00
Retail_0002,0.00,0,Retail,45,1,0,0.95,0.00
Retail_0003,0.00,0,Retail,75,2,0,0.95,0.00

This is how I currently handle it:

import pandas as pd
import numpy as np
import networkx as nx

node_df = pd.read_csv('mwe.csv')

graph = nx.DiGraph()
graph.add_nodes_from(node_df['Name'])
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['Cost'])), 'nodeCost')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['Mean'])), 'avgDemand')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['SD'])), 'sdDemand')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['CST'])), 'servTime')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['SL'])), 'servLevel')

# Loop through all nodes and all attributes and remove NaNs.
for i in graph.nodes:
    # Iterate over a copy of the items so entries can be deleted safely.
    for k, v in list(graph.nodes[i].items()):
        if np.isnan(v):
            del graph.nodes[i][k]

It works, but it is clunky. Is there a better way, for example avoiding the nan values while adding the nodes instead of deleting them afterwards?

Best answer

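In this case you can let pandas do the work. I wrote the function below, which turns the key column and the value column of the DataFrame into a Series, drops the entries that are NaN, and finally converts the result to a dictionary: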

def create_node_attribs(key_col, val_col):
    # It is up to you whether to pass the DataFrame as an argument.
    # Since node_df is the only DataFrame here, only the column names are passed.
    # Series must be imported with: from pandas import Series
    global node_df
    return Series(node_df[val_col].values,
                  index=node_df[key_col]).dropna().to_dict()
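
To make the drop step concrete, here is a minimal sketch of the same Series, dropna, to_dict chain on a tiny frame. The names demo_df and mean_by_node are hypothetical and only for illustration; the three rows mirror values from mwe.csv.

```python
import pandas as pd

# Hypothetical stand-in for node_df with one attribute column.
demo_df = pd.DataFrame({
    "Name": ["Manuf_0001", "Part_0001", "Retail_0001"],
    "Mean": [float("nan"), float("nan"), 253.0],
})

# Index the values by node name, drop the NaN entries, convert to a dict.
mean_by_node = pd.Series(demo_df["Mean"].values,
                         index=demo_df["Name"]).dropna().to_dict()
print(mean_by_node)  # {'Retail_0001': 253.0}
```

Only nodes that actually have a value end up in the dictionary, so nx.set_node_attributes never sees the missing ones.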

The full code is as follows:

import pandas as pd
import networkx as nx
from pandas import Series

node_df = pd.read_csv('mwe.csv')

graph = nx.DiGraph()

def create_node_attribs(key_col, val_col):
    # It is up to you whether to pass the DataFrame as an argument.
    # Since node_df is the only DataFrame here, only the column names are passed.
    global node_df
    return Series(node_df[val_col].values,
                  index=node_df[key_col]).dropna().to_dict()

graph.add_nodes_from(node_df['Name'])
nx.set_node_attributes(graph, create_node_attribs('Name', 'Cost'), 'nodeCost')
nx.set_node_attributes(graph, create_node_attribs('Name', 'Mean'), 'avgDemand')
nx.set_node_attributes(graph, create_node_attribs('Name', 'SD'), 'sdDemand')
nx.set_node_attributes(graph, create_node_attribs('Name', 'CST'), 'servTime')
nx.set_node_attributes(graph, create_node_attribs('Name', 'SL'), 'servLevel')
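
As a possible alternative that is not part of the original answer, the nodes and their non-NaN attributes could also be added in a single call, since add_nodes_from accepts (node, attribute_dict) pairs. The sketch below assumes the column-to-attribute mapping used in the question. The helper node_records and the names attr_names and csv_text are hypothetical, and an inline excerpt of mwe.csv is used so the example runs on its own.

```python
import io

import networkx as nx
import pandas as pd

# Inline excerpt of mwe.csv so the sketch is self-contained.
csv_text = """Name,Cost,Depth,Class,Mean,SD,CST,SL,Time
Manuf_0001,39.00,1,Manuf,,,12,,10.00
Part_0001,12.00,2,Part,,,,,28.00
Retail_0001,0.00,0,Retail,253,36.62,0,0.95,0.00
"""
node_df = pd.read_csv(io.StringIO(csv_text))

# CSV column -> node attribute name, following the question's naming.
attr_names = {"Cost": "nodeCost", "Mean": "avgDemand", "SD": "sdDemand",
              "CST": "servTime", "SL": "servLevel"}


def node_records(df, key_col, columns):
    """Yield (node, attribute_dict) pairs, leaving out NaN values."""
    renamed = df.set_index(key_col)[list(columns)].rename(columns=columns)
    for name, row in renamed.iterrows():
        yield name, row.dropna().to_dict()


graph = nx.DiGraph()
graph.add_nodes_from(node_records(node_df, "Name", attr_names))

print(graph.nodes["Part_0001"])  # {'nodeCost': 12.0}
```

This iterates row by row, so on large frames it may be slower than the column-wise approach above; it trades speed for building each node's attributes in one place.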

Link to a Google Colab Notebook with the code.

Also, see this answer for a timing comparison of the approach used here.

Regarding python - Avoiding NaN attributes when building a NetworkX graph, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55314155/
