
python - Converting a GraphFrames ShortestPath Map into DataFrame rows in PySpark

Reposted. Author: 太空狗. Updated: 2023-10-30 02:41:58

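I'm trying to find the most efficient way to take the Map output of the GraphFrames function shortestPaths and flatten each vertex's distances map into individual rows of a new DataFrame. I have managed to do this, very clumsily, by pulling the distances column into a dictionary, converting that to a Pandas DataFrame, and then converting it back to a Spark DataFrame, but I know there must be a better way.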

from graphframes import GraphFrame

# Create a Vertex DataFrame with a unique "id" column
v = sqlContext.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

# Compute shortest-path distances from every vertex to each landmark
results = g.shortestPaths(landmarks=["a", "b", "c"])
results.select("id", "distances").show()

+---+--------------------+
| id|           distances|
+---+--------------------+
|  a|Map(a -> 0, b -> ...|
|  b| Map(b -> 0, c -> 1)|
|  c| Map(c -> 0, b -> 1)|
+---+--------------------+

What I want is to take the output above and flatten the distances while keeping the id, like this:

+---+---+---------+
| id|  v| distance|
+---+---+---------+
|  a|  a|        0|
|  a|  b|        1|
|  a|  c|        2|
|  b|  b|        0|
|  b|  c|        1|
|  c|  c|        0|
|  c|  b|        1|
+---+---+---------+

Thanks.

Best answer

You can use explode, which emits one row per map entry (in columns named key and value by default):

>>> from pyspark.sql.functions import explode
>>> results.select("id", explode("distances"))

The original question, "Converting a GraphFrames ShortestPath Map into DataFrame rows in PySpark", is on Stack Overflow: https://stackoverflow.com/questions/37898313/
