
apache-spark - PySpark equivalent of adding a constant array as a column to a DataFrame

Reposted. Author: 行者123. Updated: 2023-12-05 09:11:54

The code below works in Scala Spark:

scala> val ar = Array("oracle","java")
ar: Array[String] = Array(oracle, java)

scala> df.withColumn("tags",lit(ar)).show(false)
+------+---+----------+----------+--------------+
|name |age|role |experience|tags |
+------+---+----------+----------+--------------+
|John |25 |Developer |2.56 |[oracle, java]|
|Scott |30 |Tester |5.2 |[oracle, java]|
|Jim |28 |DBA |3.0 |[oracle, java]|
|Mike |35 |Consultant|10.0 |[oracle, java]|
|Daniel|26 |Developer |3.2 |[oracle, java]|
|Paul |29 |Tester |3.6 |[oracle, java]|
|Peter |30 |Developer |6.5 |[oracle, java]|
+------+---+----------+----------+--------------+


scala>

How can I get the same behavior in PySpark? I tried the following, but it does not work and throws a Java error:

>>> from pyspark.sql.types import *

>>> tag=["oracle","java"]
>>> df2.withColumn("tags",lit(tag)).show()

Error:

: java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList [oracle, java]

Best answer

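You can import array from the functions module. lit rejected the Python list in the question, but array can combine one lit column per element: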

>>> from pyspark.sql.functions import array, lit

>>> tag = array(lit("oracle"), lit("java"))
>>> df2.withColumn("tags",tag).show()

Tested below:

>>> from pyspark.sql.functions import array, lit

>>> tag=array(lit("oracle"),lit("java"))
>>>
>>> ranked.withColumn("tag",tag).show()
+------+--------------+----------+-----+----+----+--------------+
|gender| ethinicity|first_name|count|rank|year| tag|
+------+--------------+----------+-----+----+----+--------------+
| MALE| HISPANIC| JAYDEN| 364| 1|2012|[oracle, java]|
| MALE|WHITE NON HISP| JOSEPH| 300| 2|2012|[oracle, java]|
| MALE|WHITE NON HISP| JOSEPH| 300| 2|2012|[oracle, java]|
| MALE| HISPANIC| JACOB| 293| 4|2012|[oracle, java]|
| MALE| HISPANIC| JACOB| 293| 4|2012|[oracle, java]|
| MALE|WHITE NON HISP| DAVID| 289| 6|2012|[oracle, java]|
| MALE|WHITE NON HISP| DAVID| 289| 6|2012|[oracle, java]|
| MALE| HISPANIC| MATTHEW| 279| 8|2012|[oracle, java]|
| MALE| HISPANIC| MATTHEW| 279| 8|2012|[oracle, java]|
| MALE| HISPANIC| ETHAN| 254| 10|2012|[oracle, java]|
| MALE| HISPANIC| ETHAN| 254| 10|2012|[oracle, java]|
| MALE|WHITE NON HISP| MICHAEL| 245| 12|2012|[oracle, java]|
| MALE|WHITE NON HISP| MICHAEL| 245| 12|2012|[oracle, java]|
| MALE|WHITE NON HISP| JACOB| 242| 14|2012|[oracle, java]|
| MALE|WHITE NON HISP| JACOB| 242| 14|2012|[oracle, java]|
| MALE|WHITE NON HISP| MOSHE| 238| 16|2012|[oracle, java]|
| MALE|WHITE NON HISP| MOSHE| 238| 16|2012|[oracle, java]|
| MALE| HISPANIC| ANGEL| 236| 18|2012|[oracle, java]|
| MALE| HISPANIC| AIDEN| 235| 19|2012|[oracle, java]|
| MALE|WHITE NON HISP| DANIEL| 232| 20|2012|[oracle, java]|
+------+--------------+----------+-----+----+----+--------------+
only showing top 20 rows

Regarding "apache-spark - PySpark equivalent of adding a constant array as a column to a DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59532087/
