python - Converting a GraphFrames ShortestPath Map into DataFrame rows in PySpark

Reposted by: 太空狗 · Updated: 2023-10-30 02:41:58


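I am trying to find the most efficient way to take the Map output of the GraphFrames function shortestPaths and flatten each vertex's distances map into separate rows of a new DataFrame. I have managed to do it, very clumsily, by pulling the distances column into a dictionary, converting that to a pandas DataFrame, and then converting it back to a Spark DataFrame, but I know there must be a better way.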

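from graphframes import GraphFrame

# Create a Vertex DataFrame with a unique "id" column
# (sqlContext is assumed to be the SQLContext provided by the PySpark shell)
v = sqlContext.createDataFrame([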
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

# Compute the shortest-path distance from every vertex to each landmark
results = g.shortestPaths(landmarks=["a", "b", "c"])
results.select("id", "distances").show()

+---+--------------------+
| id|           distances|
+---+--------------------+
|  a|Map(a -> 0, b -> ...|
|  b| Map(b -> 0, c -> 1)|
|  c| Map(c -> 0, b -> 1)|
+---+--------------------+

What I want is to take the output above and flatten the distances while keeping the id, like this:

+---+---+---------+      
| id| v | distance|
+---+---+---------+
|  a| a | 0       |
|  a| b | 1       |
|  a| c | 2       |
|  b| b | 0       |
|  b| c | 1       |
|  c| c | 0       |
|  c| b | 1       |
+---+---+---------+

Thanks.

Best answer

You can use explode, which turns each entry of the map into its own row with key and value columns:

>>> from pyspark.sql.functions import explode
>>> results.select("id", explode("distances"))

Regarding python - Converting a GraphFrames ShortestPath Map into DataFrame rows in PySpark, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37898313/
