
apache-spark - How do I flatten a column of array-of-struct type (as returned by the Spark ML API)?

Reposted. Author: 行者123. Updated: 2023-12-04 04:08:48

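Maybe it's just because I'm not very familiar with the API, but I find that Spark ML methods often return DataFrames that are unnecessarily hard to work with.

This time it's the ALS model that's tripping me up, specifically the recommendForAllUsers method. Let's reconstruct the type of the DataFrame it would return: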

scala> import org.apache.spark.sql.types._

scala> val arrayType = ArrayType(new StructType().add("itemId", IntegerType).add("rating", FloatType))

scala> val recs = Seq((1, Array((1, .7), (2, .5))), (2, Array((0, .9), (4, .1)))).
     |   toDF("userId", "recommendations").
     |   select($"userId", $"recommendations".cast(arrayType))

scala> recs.show()
+------+------------------+
|userId|   recommendations|
+------+------------------+
|     1|[[1,0.7], [2,0.5]]|
|     2|[[0,0.9], [4,0.1]]|
+------+------------------+

scala> recs.printSchema
root
 |-- userId: integer (nullable = false)
 |-- recommendations: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- itemId: integer (nullable = true)
 |    |    |-- rating: float (nullable = true)

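Now, I only care about the itemId values in the recommendations column. After all, the method is recommendForAllUsers, not recommendAndScoreForAllUsers (ok ok, I'll stop being sassy...)

How do I do this?

I thought I had it when I created a UDF: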

scala> val itemIds = udf((arr: Array[(Int, Float)]) => arr.map(_._1))

But this produces an error:
scala> recs.withColumn("items", itemIds($"recommendations"))
org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(recommendations)' due to data type mismatch: argument 1 requires array<struct<_1:int,_2:float>> type, however, '`recommendations`' is of array<struct<itemId:int,rating:float>> type.;;
'Project [userId#87, recommendations#92, UDF(recommendations#92) AS items#238]
+- Project [userId#87, cast(recommendations#88 as array<struct<itemId:int,rating:float>>) AS recommendations#92]
   +- Project [_1#84 AS userId#87, _2#85 AS recommendations#88]
      +- LocalRelation [_1#84, _2#85]

Any ideas? Thanks!

Best answer

Wow, my coworker came up with an extremely elegant solution:

scala> recs.select($"userId", $"recommendations.itemId").show
+------+------+
|userId|itemId|
+------+------+
|     1|[1, 2]|
|     2|[0, 4]|
+------+------+

So maybe the Spark ML API isn't that difficult after all :)
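
For completeness, here is a rough sketch of two other routes, not part of the original answer. The UDF in the question fails because Spark hands struct values to a Scala UDF as Row objects rather than tuples, so declaring the argument as Seq[Row] and reading the field by name should get past the type mismatch. Separately, if "flatten" means one row per (userId, itemId) pair rather than an array per user, explode can do that. The session setup, the app name "flatten-recs", and the names withItems and flat are only illustrative, and the sketch assumes no null arrays in the recommendations column.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{explode, udf}
import org.apache.spark.sql.types.{ArrayType, FloatType, IntegerType, StructType}

// In spark-shell a session already exists and getOrCreate() returns it;
// the master and app name here are illustrative assumptions.
val spark = SparkSession.builder().master("local[*]").appName("flatten-recs").getOrCreate()
import spark.implicits._

// Same toy data and schema as in the question.
val arrayType = ArrayType(new StructType().add("itemId", IntegerType).add("rating", FloatType))
val recs = Seq((1, Array((1, .7), (2, .5))), (2, Array((0, .9), (4, .1))))
  .toDF("userId", "recommendations")
  .select($"userId", $"recommendations".cast(arrayType))

// Route 1: a UDF that accepts the array of structs as Seq[Row]
// and pulls out itemId by field name (assumes the array is never null).
val itemIds = udf((arr: Seq[Row]) => arr.map(_.getAs[Int]("itemId")))
val withItems = recs.withColumn("items", itemIds($"recommendations"))

// Route 2: explode the array into one row per recommendation,
// then promote the struct fields to top-level columns.
val flat = recs
  .select($"userId", explode($"recommendations").as("rec"))
  .select($"userId", $"rec.itemId", $"rec.rating")

withItems.show()
flat.show()
```

The field-access form in the answer is still the simplest when an array of itemIds per user is all that is needed; the UDF route is mostly useful when the per-element logic is more than a field lookup.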

This question, "apache-spark - How do I flatten a column of array-of-struct type (as returned by the Spark ML API)?", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/46736063/
