scala - Update a DataFrame with nested fields - Spark

Reposted. Author: 可可西里. Updated: 2023-11-01 14:44:18

I have the following two DataFrames.

Df1

+----------------------+---------+
|products              |visitorId|
+----------------------+---------+
|[[i1,0.68], [i2,0.42]]|v1       |
|[[i1,0.78], [i3,0.11]]|v2       |
+----------------------+---------+

Df2

+---+----------+
| id|      name|
+---+----------+
| i1|Nike Shoes|
| i2|  Umbrella|
| i3|     Jeans|
+---+----------+

This is the schema of DataFrame Df1:

root
 |-- products: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- interest: double (nullable = true)
 |-- visitorId: string (nullable = true)

I want to join the two DataFrames so that the output is:

+------------------------------------------+---------+
|products                                  |visitorId|
+------------------------------------------+---------+
|[[i1,0.68,Nike Shoes], [i2,0.42,Umbrella]]|v1       |
|[[i1,0.78,Nike Shoes], [i3,0.11,Jeans]]   |v2       |
+------------------------------------------+---------+

This is the schema I expect for the output:

root
 |-- products: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- interest: double (nullable = true)
 |    |    |-- name: string (nullable = true)
 |-- visitorId: string (nullable = true)

How can I do this in Scala? I am using Spark 2.2.0.

Update

I exploded Df1, joined the result with Df2, and got the output below.

+---------+---+--------+----------+
|visitorId| id|interest|      name|
+---------+---+--------+----------+
|       v1| i1|    0.68|Nike Shoes|
|       v1| i2|    0.42|  Umbrella|
|       v2| i1|    0.78|Nike Shoes|
|       v2| i3|    0.11|     Jeans|
+---------+---+--------+----------+
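
For reference, the explode-and-join step described above could look roughly like the sketch below. It is not taken from the original post: the case classes `Product` and `Visit`, the object name `ExplodeAndJoin`, and the local `SparkSession` settings are all assumptions made so the example is self-contained. It uses only `explode`, `select`, and `join` from the Spark SQL API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.explode

// Hypothetical case classes that mirror the Df1 schema (array of id/interest structs).
case class Product(id: String, interest: Double)
case class Visit(products: Seq[Product], visitorId: String)

object ExplodeAndJoin {
  // Turn every array element into its own row, then look up the name in df2.
  def explodeAndJoin(df1: DataFrame, df2: DataFrame): DataFrame = {
    val exploded = df1
      .select(df1("visitorId"), explode(df1("products")).as("product"))
      // Selecting "product.id" yields a column named "id" (same for "interest").
      .select("visitorId", "product.id", "product.interest")

    // Inner join on the shared "id" column, then fix the column order.
    exploded
      .join(df2, Seq("id"), "inner")
      .select("visitorId", "id", "interest", "name")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-and-join")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(
      Visit(Seq(Product("i1", 0.68), Product("i2", 0.42)), "v1"),
      Visit(Seq(Product("i1", 0.78), Product("i3", 0.11)), "v2")
    ).toDF()

    val df2 = Seq(
      ("i1", "Nike Shoes"),
      ("i2", "Umbrella"),
      ("i3", "Jeans")
    ).toDF("id", "name")

    explodeAndJoin(df1, df2).orderBy("visitorId", "id").show()
    spark.stop()
  }
}
```

An inner join drops any product whose `id` has no match in Df2. If such products should be kept with a null `name`, passing `"left_outer"` as the join type is one option.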

Now I just need the DataFrame above in the following JSON format.

{
  "visitorId": "v1",
  "products": [{
    "id": "i1",
    "name": "Nike Shoes",
    "interest": 0.68
  }, {
    "id": "i2",
    "name": "Umbrella",
    "interest": 0.42
  }]
},
{
  "visitorId": "v2",
  "products": [{
    "id": "i1",
    "name": "Nike Shoes",
    "interest": 0.78
  }, {
    "id": "i3",
    "name": "Jeans",
    "interest": 0.11
  }]
}
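
One possible way to get from the flat table to this nested shape is to group by `visitorId` and collect the remaining columns into an array of structs. The sketch below is only an illustration under assumptions: the object name `RegroupToJson`, the helper `regroup`, and the local `SparkSession` settings are not from the original post, and the flat rows are retyped by hand from the table above. It relies on `groupBy`, `collect_list`, `struct`, and `Dataset.toJSON`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object RegroupToJson {
  // Collapse the flat rows back into one row per visitor,
  // with the products gathered into an array of (id, name, interest) structs.
  def regroup(flat: DataFrame): DataFrame =
    flat
      .groupBy("visitorId")
      .agg(collect_list(struct(col("id"), col("name"), col("interest"))).as("products"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("regroup-to-json")
      .getOrCreate()
    import spark.implicits._

    // The exploded-and-joined rows from the table above.
    val flat = Seq(
      ("v1", "i1", 0.68, "Nike Shoes"),
      ("v1", "i2", 0.42, "Umbrella"),
      ("v2", "i1", 0.78, "Nike Shoes"),
      ("v2", "i3", 0.11, "Jeans")
    ).toDF("visitorId", "id", "interest", "name")

    val nested = regroup(flat)
    nested.printSchema()

    // toJSON yields one JSON string per row (one object per visitor).
    nested.toJSON.collect().foreach(println)

    spark.stop()
  }
}
```

Two caveats: `collect_list` does not guarantee the order of elements within each array, and `toJSON` emits one JSON object per line rather than the comma-separated objects shown above. To write files instead of printing, `nested.write.json(path)` should produce the same line-per-object layout.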
