
python - "expected zero arguments for construction of ClassDict (for numpy.dtype)" when calling a UDF that returns FloatType()

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:57:18

I believe this is related to: Spark Error:expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

I have a DataFrame:

id col_1 col_2
1 [1,2] [1,3]
2 [2,1] [3,4]

I want to create another column that holds the cosine distance between col_1 and col_2.

from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
from scipy.spatial.distance import cosine

def cosine_distance(a, b):
    try:
        return cosine(a, b)
    except Exception as e:
        return 0.0  # in case division by zero

Then I defined a UDF:

cosine_distance_udf = udf(cosine_distance, FloatType())

Finally:

new_df = df.withColumn('cosine_distance', cosine_distance_udf('col_1', 'col_2'))

I get the error: PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)

What am I doing wrong?

Best answer

Check the return type of cosine and the cause of the error becomes clear:

type(cosine([1, 2], [1, 3]))
# numpy.float64

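The UDF is declared to return FloatType(), but it actually hands Spark a NumPy scalar, which Spark cannot deserialize on the JVM side. However, np.float64 is a subclass of float, so it converts cleanly to a plain Python float: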

import numpy as np

issubclass(np.float64, float)
# True
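To make the difference visible outside Spark, here is a minimal sketch that uses the standard pickle module. It is only an illustration: it does not use Spark's own serializer, and the variable names are made up for the example. It shows that a pickled np.float64 carries references to NumPy, while a pickled Python float does not.

```python
import pickle

import numpy as np

np_value = np.float64(0.5)  # same type that scipy's cosine() returns
py_value = float(np_value)  # plain Python float

np_payload = pickle.dumps(np_value)
py_payload = pickle.dumps(py_value)

# The NumPy scalar's pickle refers to NumPy internals (module and dtype names);
# the plain float's pickle is just the raw number.
print(b"numpy" in np_payload)  # True
print(b"numpy" in py_payload)  # False
```

A consumer without NumPy, such as the JVM side of a Spark job, presumably cannot rebuild the first payload, which is consistent with the ClassDict error in the question.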

So, with a small change to your function,

def cosine_distance(a, b):
    try:
        return float(cosine(a, b))  # or cosine(a, b).item()
    except Exception as e:
        return 0.0  # in case division by zero
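The same conversion can be generalized. The sketch below is a hypothetical helper, to_native, which is not part of PySpark or NumPy. It turns NumPy scalars and arrays into plain Python objects before a UDF returns them, assuming the result is a scalar or an array of scalars.

```python
import numpy as np


def to_native(value):
    """Convert NumPy scalars/arrays to plain Python objects; pass others through."""
    if isinstance(value, np.generic):
        return value.item()    # e.g. np.float64 -> float, np.int64 -> int
    if isinstance(value, np.ndarray):
        return value.tolist()  # nested lists of Python scalars
    return value


print(type(to_native(np.float64(0.25))))  # <class 'float'>
print(to_native(np.array([1, 2])))        # [1, 2]
```

Wrapping a UDF's return value in such a helper keeps the declared Spark type (FloatType, ArrayType, and so on) in line with what Python actually sends back.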

With the float() conversion in place, the call works:

df.withColumn('cosine_distance', cosine_distance_udf('col_1', 'col_2')).show()

+------+------+---------------+
| col_1| col_2|cosine_distance|
+------+------+---------------+
|[1, 2]|[3, 4]| 0.01613009|
|[2, 1]|[3, 4]| 0.10557281|
+------+------+---------------+
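As a sanity check on the values above, here is a pure-Python sketch of the cosine distance. The function name cosine_distance_pure is hypothetical; it avoids NumPy and SciPy, so it always returns a plain float, and it mirrors the 0.0 fallback from the answer for zero-length vectors.

```python
import math


def cosine_distance_pure(a, b):
    """Cosine distance 1 - (a . b) / (|a| * |b|) using only the standard library."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # same fallback as the answer's except branch
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance_pure([1, 2], [3, 4]))  # approximately 0.01613009
print(cosine_distance_pure([2, 1], [3, 4]))  # approximately 0.10557281
```

Because it never produces a NumPy type, a function like this could be passed to udf(..., FloatType()) without any extra conversion.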

Source: a similar question on Stack Overflow, "expected zero arguments for construction of ClassDict (for numpy.dtype)" when calling a UDF that returns FloatType(): https://stackoverflow.com/questions/53800062/
