gpt4 book ai didi

apache-spark - Pyspark 2.0 - IndextoString 错误

转载 作者:行者123 更新时间:2023-12-04 12:46:42 25 4
gpt4 key购买 nike

我正在使用 Pyspark 处理 Spark 2.0 来解决分类问题。我正在尝试获取分类算法预测的原始名称。但我没有这样做。

代码:

predictions = dtModel.transform(self._pred)
converter = IndexToString(inputCol="prediction", outputCol="role")
converted = converter.transform(predictions)

错误:

  File "/hba03/yarn/nm/usercache/sbeathanabhotla/appcache/application_1498495374459_2397452/container_1498495374459_2397452_01_000001/build.zip/src/com/ci/buyerroletagging/service/ModelBuilder.py", line 45, in decision_tree_classifier
converted = converter.transform(predictions.select('prediction'))
File "/vol1/cloudera/parcels/SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931/lib/spark2/python/lib/pyspark.zip/pyspark/ml/base.py", line 105, in transform
return self._transform(dataset)
File "/vol1/cloudera/parcels/SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931/lib/spark2/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 252, in _transform
return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sql_ctx)
File "/vol1/cloudera/parcels/SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931/lib/spark2/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/vol1/cloudera/parcels/SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931/lib/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/vol1/cloudera/parcels/SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931/lib/spark2/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o545.transform.
: java.lang.ClassCastException: org.apache.spark.ml.attribute.UnresolvedAttribute$ cannot be cast to org.apache.spark.ml.attribute.NominalAttribute
at org.apache.spark.ml.feature.IndexToString.transform(StringIndexer.scala:313)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

预测:

+--------------------+--------------------+--------------------+--------------------+----------+
| user_guid| features| rawPrediction| probability|prediction|
+--------------------+--------------------+--------------------+--------------------+----------+
|9c0393cd-67e1-425...|(239,[1,89,125,21...|[0.0,44.0,0.0,0.0...|[0.0,1.0,0.0,0.0,...| 1.0|
|fdbaccb8-5946-472...|(239,[0,78,124,18...|[96.0,0.0,0.0,0.0...|[1.0,0.0,0.0,0.0,...| 0.0|
|fdbaccb8-5946-472...|(239,[0,78,130,18...|[96.0,0.0,0.0,0.0...|[1.0,0.0,0.0,0.0,...| 0.0|
|883bca4e-9a74-4dd...|(239,[1,28,123,13...|[0.0,44.0,0.0,0.0...|[0.0,1.0,0.0,0.0,...| 1.0|
|883bca4e-9a74-4dd...|(239,[1,28,123,13...|[0.0,0.0,42.0,0.0...|[0.0,0.0,1.0,0.0,...| 2.0|
|883bca4e-9a74-4dd...|(239,[1,28,124,13...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|883bca4e-9a74-4dd...|(239,[1,28,123,13...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|883bca4e-9a74-4dd...|(239,[1,28,128,13...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|883bca4e-9a74-4dd...|(239,[1,28,123,13...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|883bca4e-9a74-4dd...|(239,[1,28,128,13...|[0.0,0.0,42.0,0.0...|[0.0,0.0,1.0,0.0,...| 2.0|
|883bca4e-9a74-4dd...|(239,[1,28,124,13...|[0.0,44.0,0.0,0.0...|[0.0,1.0,0.0,0.0,...| 1.0|
|883bca4e-9a74-4dd...|(239,[1,28,124,13...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|883bca4e-9a74-4dd...|(239,[1,28,128,13...|[0.0,44.0,0.0,0.0...|[0.0,1.0,0.0,0.0,...| 1.0|
|883bca4e-9a74-4dd...|(239,[1,28,124,13...|[0.0,0.0,42.0,0.0...|[0.0,0.0,1.0,0.0,...| 2.0|
|883bca4e-9a74-4dd...|(239,[1,28,128,13...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|58b6246a-7f2a-40b...|(239,[0,64,124,19...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|58b6246a-7f2a-40b...|(239,[0,64,124,19...|[96.0,0.0,0.0,0.0...|[1.0,0.0,0.0,0.0,...| 0.0|
|d05b08ab-eef0-496...|(239,[10,80,124,1...|[96.0,0.0,0.0,0.0...|[1.0,0.0,0.0,0.0,...| 0.0|
|d05b08ab-eef0-496...|(239,[10,80,124,1...|[0.0,0.0,0.0,21.0...|[0.0,0.0,0.0,0.28...| 3.0|
|b35a734a-98ba-4e3...|(239,[0,30,129,22...|[96.0,0.0,0.0,0.0...|[1.0,0.0,0.0,0.0,...| 0.0|
+--------------------+--------------------+--------------------+--------------------+----------+
only showing top 20 rows

我在这里遗漏了什么吗?

最佳答案

嗯,这是完全神秘的原因,但您需要提供 labels 参数(尽管文档中的示例似乎没有它也能正常工作)。这是一个带有我自己的 predictions 和 2 个类的玩具示例:

predictions.show()
# +-----+-----------+-------------+-----------+----------+
# |label| features|rawPrediction|probability|prediction|
# +-----+-----------+-------------+-----------+----------+
# | 0|[140.0,0.0]| [2.0,0.0]| [1.0,0.0]| 0.0|
# | 0|[150.0,0.0]| [2.0,0.0]| [1.0,0.0]| 0.0|
# | 1|[160.0,1.0]| [0.0,2.0]| [0.0,1.0]| 1.0|
# | 1|[170.0,1.0]| [0.0,2.0]| [0.0,1.0]| 1.0|
# +-----+-----------+-------------+-----------+----------+

converter = IndexToString(inputCol="prediction", outputCol="role", labels=['a', 'b'])
converted = converter.transform(predictions)
converted.show()
# +-----+-----------+-------------+-----------+----------+----+
# |label| features|rawPrediction|probability|prediction|role|
# +-----+-----------+-------------+-----------+----------+----+
# | 0|[140.0,0.0]| [2.0,0.0]| [1.0,0.0]| 0.0| a|
# | 0|[150.0,0.0]| [2.0,0.0]| [1.0,0.0]| 0.0| a|
# | 1|[160.0,1.0]| [0.0,2.0]| [0.0,1.0]| 1.0| b|
# | 1|[170.0,1.0]| [0.0,2.0]| [0.0,1.0]| 1.0| b|
# +-----+-----------+-------------+-----------+----------+----+

如果我省略 labels 参数,我会得到与您相同的错误。因此,如果您自己的标签从示例中的 0.0 变为 3.0,您将需要像 labels=['a', 'b', 'c', 'd'] 这样的东西 - 通常, labels 必须是与标签数量相同长度的列表。

关于apache-spark - Pyspark 2.0 - IndextoString 错误,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45463618/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com