
pyspark - Get the first element of an array in PySpark

Reposted. Author: 行者123. Updated: 2023-12-04 17:35:55

I want to add two new columns holding the first and second values of the Services array,
but I get this error:

Field name should be String Literal, but it's 0;


production_target_datasource_df.withColumn("newcol",production_target_datasource_df["Services"].getItem(0))
+------------------+--------------------+
|               cid|            Services|
+------------------+--------------------+
|845124826013182686|     [112931, serv1]|
|845124826013182686|     [146936, serv1]|
|845124826013182686|      [32718, serv2]|
|845124826013182686|      [28839, serv2]|
|845124826013182686|       [8710, serv2]|
|845124826013182686|    [2093140, serv3]|

Best answer

You don't have to use .getItem(0); production_target_datasource_df["Services"][0] is enough.

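# Constructing your table:
from pyspark.sql import Row, SparkSession

# Reuse the active session, or create one when running outside the shell.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    Row(cid=1, Services=["2", "serv1"]),
    Row(cid=1, Services=["3", "serv1"]),
    Row(cid=1, Services=["4", "serv2"]),
])
df.show()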
+---+----------+
|cid|  Services|
+---+----------+
|  1|[2, serv1]|
|  1|[3, serv1]|
|  1|[4, serv2]|
+---+----------+

# Adding the two columns:
new_df = df.withColumn("first_element", df.Services[0])
new_df = new_df.withColumn("second_element", df.Services[1])
new_df.show()

+---+----------+-------------+--------------+
|cid|  Services|first_element|second_element|
+---+----------+-------------+--------------+
|  1|[2, serv1]|            2|         serv1|
|  1|[3, serv1]|            3|         serv1|
|  1|[4, serv2]|            4|         serv2|
+---+----------+-------------+--------------+

Regarding "pyspark - Get the first element of an array in PySpark", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56582056/
