
python - Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation

Reposted. Author: 太空狗. Updated: 2023-10-29 21:26:16

class ProdsTransformer:

    def __init__(self):
        self.products_lookup_hmap = {}
        self.broadcast_products_lookup_map = None

    def create_broadcast_variables(self):
        # sc is the SparkContext available on the driver
        self.broadcast_products_lookup_map = sc.broadcast(self.products_lookup_hmap)

    def create_lookup_maps(self):
        # The code here builds the hashmap that maps Prod_ID to another space.
        pass

pt = ProdsTransformer()
pt.create_broadcast_variables()

pairs = distinct_users_projected.map(lambda x: (x.user_id,
    pt.broadcast_products_lookup_map.value[x.Prod_ID]))

I get the following error:

"Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."

Any help on how to handle broadcast variables would be great!

Best answer

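Because the map lambda references the object that contains the broadcast variable, Spark tries to serialize the entire object and send it to the workers. Since the object contains a reference to the SparkContext, you get the error. Instead of this: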

pairs = distinct_users_projected.map(lambda x: (x.user_id, pt.broadcast_products_lookup_map.value[x.Prod_ID]))

Try this:

bcast = pt.broadcast_products_lookup_map
pairs = distinct_users_projected.map(lambda x: (x.user_id, bcast.value[x.Prod_ID]))

The latter avoids the reference to the object (pt), so Spark only needs to ship the broadcast variable.
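The mechanism can be reproduced without Spark. The following is a minimal sketch that uses plain pickle and assumes, as the answer describes, that the object holds a reference to something that cannot be serialized. DriverOnlyContext, Transformer and can_pickle are hypothetical stand-ins invented for this illustration; they are not part of PySpark, and the plain dict merely stands in for the broadcast variable.

```python
import pickle


class DriverOnlyContext:
    """Hypothetical stand-in for SparkContext: refuses to be pickled."""

    def __reduce__(self):
        raise RuntimeError("driver-only context cannot be serialized")


class Transformer:
    """Hypothetical stand-in for ProdsTransformer."""

    def __init__(self, context):
        self.context = context            # driver-only handle
        self.lookup = {"p1": "space-a"}   # stands in for the broadcast variable


def can_pickle(obj):
    """Return True if obj pickles cleanly, False if pickling raises RuntimeError."""
    try:
        pickle.dumps(obj)
        return True
    except RuntimeError:
        return False


pt = Transformer(DriverOnlyContext())

# Serializing the whole object drags the driver-only context along with it.
whole_object_ok = can_pickle(pt)

# Copying the needed attribute into a local first ships only that value.
lookup = pt.lookup
local_only_ok = can_pickle(lookup)

print(whole_object_ok, local_only_ok)  # prints: False True
```

The same reasoning applies to the lambda in the question: anything the closure refers to is serialized with it, so binding the one attribute you need to a local name keeps the rest of the object on the driver.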

Regarding python - Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31508689/
