python - Problem with a PySpark UDF to get descriptors with OpenCV

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:45:02

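I am just getting started with Spark, in my case with PySpark.

I have a small school project that does not look difficult, but I have been working on it for many days and still cannot get it to work.

I have to load the images in a folder and extract their descriptors in order to perform dimensionality reduction.

I created a PySpark DataFrame with the image paths, and now I want to add a column containing the descriptors.

Here is how I did it.

List of image paths: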

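    import os
    from pyspark.sql import Row

    # `folder` is the dataset root; it must end with "/" because the paths
    # are built by concatenation. `sc` and `spark` are the existing
    # SparkContext and SparkSession.
    lst_path = []
    sub_folders = os.listdir(folder)
    print(sub_folders)

    # Only the first category folder is used here.
    for f in sub_folders[:1]:
        lst_categ = os.listdir(folder + f)
        for file in lst_categ:
            lst_path.append(folder + f + "/" + file)

    print("Nombre d'images chargées :", len(lst_path))

    rdd = sc.parallelize(lst_path)
    row_rdd = rdd.map(lambda x: Row(x))
    df = spark.createDataFrame(row_rdd, ["path_img"])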
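
The listing above builds paths by string concatenation and takes every file in the folder. As a minimal sketch of an alternative, assuming the dataset keeps one sub-folder per category, the hypothetical helper `list_image_paths` below collects the paths with `pathlib`. The helper name, the `max_categories` parameter, and the extension set are illustrative assumptions and are not part of the original code.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the dataset actually contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_paths(folder, max_categories=None):
    """Return sorted image paths found one level below `folder`.

    `folder` is expected to contain one sub-folder per category.
    """
    root = Path(folder)
    # Sort so that the result does not depend on the directory listing order.
    categories = sorted(p for p in root.iterdir() if p.is_dir())
    if max_categories is not None:
        categories = categories[:max_categories]

    paths = []
    for category in categories:
        for file in sorted(category.iterdir()):
            # Skip anything that is not a regular file with an image extension.
            if file.is_file() and file.suffix.lower() in IMAGE_EXTENSIONS:
                paths.append(str(file))
    return paths
```

The returned list could then be passed to `sc.parallelize` in place of `lst_path`; `max_categories=1` would mirror the `sub_folders[:1]` slice, although the sorted order may pick a different first folder than `os.listdir` does.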

The function that extracts the descriptors:

    def get_desc(img):
        img = cv2.imread(file)
        orb = cv2.ORB_create(nfeatures=50)
        keypoints_orb, desc = orb.detectAndCompute(img, None)

        desc = desc.flatten()

        return desc

The UDF:

    udf_image = udf(lambda img: get_desc(img), ArrayType(FloatType()))

Creating the new column:

    df2 = df.withColumn("img_vectorized", udf_image("path_img"))

Result of printSchema():

    root
     |-- path_img: string (nullable = true)
     |-- img_vectorized: array (nullable = true)
     |    |-- element: float (containsNull = true)

When I run df2.show(), I get the following error message:

    Py4JJavaError: An error occurred while calling o773.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18.0 failed 1 times, most recent failure: Lost task 0.0 in stage 18.0 (TID 93, localhost, executor driver): net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

    AttributeError: 'NoneType' object has no attribute 'flatten'

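I noticed that the descriptors come back empty. I should point out that the extraction works when I run it on a single row.

I do not understand why it fails on my DataFrame. Can you help me?

Thank you.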

Best answer

After many days of research, I found the solution last night...

My corrected code:

    import cv2
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, IntegerType

    def get_desc(img):
        # Read the image from the path received by the UDF.
        image = cv2.imread(img)
        orb = cv2.ORB_create(nfeatures=50)
        keypoints_orb, desc = orb.detectAndCompute(image, None)

        if desc is None:
            # No descriptors found: return None so the row gets a null value.
            return None
        # Convert the NumPy array to a plain Python list that Spark can serialize.
        return desc.flatten().tolist()

    udf_image = udf(get_desc, ArrayType(IntegerType()))

    df_desc = df.withColumn("descriptors", udf_image("path_img"))
    # Drop the rows without descriptors.
    df_desc = df_desc.filter(df_desc.descriptors.isNotNull())

    df_desc.show()
    +--------------------+--------------------+
    |            path_img|         descriptors|
    +--------------------+--------------------+
    |Training/Apple-Br...|[69, 113, 253, 10...|
    |Training/Apple-Br...|[212, 236, 159, 2...|
    |Training/Apple-Br...|[60, 53, 123, 239...|
    |Training/Apple-Br...|[255, 189, 252, 1...|
    |Training/Apple-Br...|[204, 244, 149, 1...|
    +--------------------+--------------------+
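
Both error messages in the question can be traced to the value returned by the UDF. The `PickleException` mentioning `numpy.core.multiarray._reconstruct` appears when a NumPy object is returned instead of plain Python values, and the `AttributeError` appears when `detectAndCompute` yields `None` for the descriptors. A minimal pure-Python sketch of the conversion step follows. The helper name `descriptors_to_list` is hypothetical, and it assumes the descriptors arrive as a 2-D sequence of integer-like values (for example the array produced by ORB).

```python
def descriptors_to_list(desc):
    """Flatten a 2-D descriptor matrix into a list of Python ints.

    Returns None when no descriptors were found, so that Spark can store
    a null for that row instead of failing.
    """
    if desc is None:
        return None
    # int() turns NumPy scalars (or any integer-like value) into built-in
    # ints, which Spark can serialize for an ArrayType(IntegerType()) column.
    return [int(value) for row in desc for value in row]


# Small demonstration with nested lists standing in for descriptor rows.
print(descriptors_to_list([[69, 113], [253, 10]]))
print(descriptors_to_list(None))
```

Inside `get_desc`, this helper could stand in for the `if`/`else` branch, with the UDF still registered with `ArrayType(IntegerType())`.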

Regarding "python - Problem with a PySpark UDF to get descriptors with OpenCV", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60192589/
