
PySpark.RDD.first -> UnpicklingError : NEWOBJ class argument has NULL tp_new

Reposted. Author: 行者123. Last updated: 2023-12-01 13:49:21

I am using Python 2.7 with Spark 1.5.1, and I get this:

df = sqlContext.read.parquet(".....").cache()
df = df.filter(df.foo == 1).select("a","b","c")
def myfun(row):
    return pyspark.sql.Row(....)
rdd = df.map(myfun).cache()
rdd.first()
==> UnpicklingError: NEWOBJ class argument has NULL tp_new

What is going wrong?

Best answer

As usual, the pickling error boils down to myfun closing over an object that cannot be pickled.
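Spark ships a function to the workers by pickling it together with the values it references, so one unpicklable object in the closure is enough to break the job. The sketch below only illustrates that idea with the standard pickle module. FakeDb and can_pickle are hypothetical names, the lock is an assumed example of an unpicklable member, and the failure shown here happens while pickling, so it is not a reproduction of the exact UnpicklingError above.

```python
import pickle
import threading


class FakeDb(object):
    """Hypothetical stand-in for a database handle captured by a closure."""

    def __init__(self):
        # Assumption for illustration: the handle owns a lock,
        # and pickle refuses to serialize lock objects.
        self._lock = threading.Lock()

    def lookup(self, ip):
        with self._lock:
            return {"ip": ip}


def can_pickle(obj):
    """Return True if obj survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


db = FakeDb()
print(can_pickle(db))         # the handle cannot be shipped
print(can_pickle("1.2.3.4"))  # plain data can
```

Running can_pickle on whatever a mapped function references is a quick way to find the offending object before submitting the job.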

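As usual, the solution is to use mapPartitions, so that the unpicklable object is created on the worker rather than shipped from the driver: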

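import pygeoip

def get_geo(rows):
    # Open the database inside the function: it is created once per
    # partition on the worker and never has to be pickled.
    db = pygeoip.GeoIP("/usr/share/GeoIP/GeoIPCity.dat")
    for row in rows:
        d = row.asDict()
        d["new"] = db.record_by_addr(row.client_ip) if row.client_ip else "noIP"
        yield d

rdd.mapPartitions(get_geo)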

instead of map, where db is created on the driver and captured by get_geo, so Spark has to pickle it:

import pygeoip

db = pygeoip.GeoIP("/usr/share/GeoIP/GeoIPCity.dat")

def get_geo(row):
    d = row.asDict()
    d["new"] = db.record_by_addr(row.client_ip) if row.client_ip else "noIP"
    return d

rdd.map(get_geo)
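The difference between the two versions can be tried locally without a cluster. The following is a minimal sketch, not Spark itself: FakeDb is a hypothetical replacement for pygeoip.GeoIP, rows are plain dicts instead of Row objects, and run_partitions is an assumed local stand-in for calling mapPartitions and collecting the result. It shows that the handle is built once per partition rather than once per row.

```python
class FakeDb(object):
    """Hypothetical stand-in for pygeoip.GeoIP that counts how often it is opened."""

    opened = 0

    def __init__(self):
        FakeDb.opened += 1

    def lookup(self, ip):
        return "geo:" + ip


def get_geo(rows):
    # Built inside the function, so once per partition.
    db = FakeDb()
    for row in rows:
        d = dict(row)
        d["new"] = db.lookup(row["client_ip"]) if row["client_ip"] else "noIP"
        yield d


def run_partitions(partitions, func):
    """Local stand-in (an assumption) for rdd.mapPartitions(func).collect()."""
    out = []
    for part in partitions:
        out.extend(func(iter(part)))
    return out


partitions = [
    [{"client_ip": "1.2.3.4"}, {"client_ip": None}],
    [{"client_ip": "5.6.7.8"}],
]
result = run_partitions(partitions, get_geo)
print(result)
print(FakeDb.opened)
```

With two partitions and three rows, FakeDb.opened ends at 2, which is why this pattern also keeps the cost of opening the database low.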

For more on PySpark.RDD.first -> UnpicklingError: NEWOBJ class argument has NULL tp_new, see the similar question on Stack Overflow: https://stackoverflow.com/questions/33112441/
