apache-spark - PySpark Array&lt;double&gt; is not Array&lt;double&gt;

Reposted. Author: 行者123. Updated: 2023-12-03 17:14:03

I am running a very simple Spark (2.4.0 on Databricks) ML script:

from pyspark.ml.clustering import LDA

lda = LDA(k=10, maxIter=100).setFeaturesCol('features')
model = lda.fit(dataset)

but I get the following error:
IllegalArgumentException: 'requirement failed: Column features must be of type equal to one of the following types: [struct<type:tinyint,size:int,indices:array<int>,values:array<double>>, array<double>, array<float>] but was actually of type array<double>.'

Why is my array&lt;double&gt; not an array&lt;double&gt;?

Here is the schema:
root
|-- BagOfWords: struct (nullable = true)
| |-- indices: array (nullable = true)
| | |-- element: long (containsNull = true)
| |-- size: long (nullable = true)
| |-- type: long (nullable = true)
| |-- values: array (nullable = true)
| | |-- element: double (containsNull = true)
|-- tokens: array (nullable = true)
| |-- element: string (containsNull = true)
|-- features: array (nullable = true)
| |-- element: double (containsNull = true)

Best Answer

You most likely need to convert the column into Spark ML's Vector form, e.g. starting from `from pyspark.ml.feature import VectorAssembler`. The error message is confusing because the type check requires `array<double>` with `containsNull = false`, while your `features` column was inferred with `containsNull = true`; both render as `array<double>` in the message, so the types look identical even though they are not. Converting the column to an ML Vector sidesteps the nullability mismatch entirely.

Regarding apache-spark - PySpark Array&lt;double&gt; is not Array&lt;double&gt;, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55639123/
