
python - How do I add multiple rows and multiple columns from a single row in PySpark?

Reposted. Author: 行者123. Updated: 2023-12-01 02:36:00

I'm new to Spark, and I have a requirement to generate multiple rows and multiple columns from a single row.

Input:

col1 col2 col3 col4

Output:

col1 col2 col3 col4 col5 col6 col7

col1 col2 col3 col4 col8 col9 col10

Logic for the new columns:

**col5:**

if col1==0 and col3!=0:
    col5 = col4/col3
else:
    col5 = 0

**col6:**

if col1==0 and col4!=0:
    col6 = (col3*col4)/col1
else:
    col6 = 0

For the first row, col7 holds the same value as col2.

**col8:**

if col1!=0 and col3!=0:
    col8 = col4/col3
else:
    col8 = 0

**col9:**

if col1!=0 and col4!=0:
    col9 = (col3*col4)/col1
else:
    col9 = 0

For the second row, col10 = col2 + "_NEW"
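The rules above can be sketched in plain Python as a reference for checking results on small inputs. This is a minimal sketch, not Spark code: the names `expand_row` and `safe_div` are hypothetical, and returning `None` on division by zero is an assumption chosen to resemble Spark with ANSI mode disabled, where dividing by zero yields null. Note that the col6 rule, as written, divides by col1 in the very branch where col1 == 0.

```python
from typing import List, Optional


def safe_div(num: float, den: float) -> Optional[float]:
    """Divide, returning None instead of raising when den is zero (assumption)."""
    return None if den == 0 else num / den


def expand_row(col1, col2, col3, col4) -> List[tuple]:
    """Return the two output rows described in the question for one input row."""
    # First output row: col5, col6, col7
    col5 = safe_div(col4, col3) if col1 == 0 and col3 != 0 else 0
    col6 = safe_div(col3 * col4, col1) if col1 == 0 and col4 != 0 else 0
    col7 = col2

    # Second output row: col8, col9, col10
    col8 = safe_div(col4, col3) if col1 != 0 and col3 != 0 else 0
    col9 = safe_div(col3 * col4, col1) if col1 != 0 and col4 != 0 else 0
    col10 = f"{col2}_NEW"

    return [
        (col1, col2, col3, col4, col5, col6, col7),
        (col1, col2, col3, col4, col8, col9, col10),
    ]


rows = expand_row(1, "a", 2, 4)
print(rows[0])  # (1, 'a', 2, 4, 0, 0, 'a')
print(rows[1])  # (1, 'a', 2, 4, 2.0, 8.0, 'a_NEW')
```

With col1 == 0 the first row's col6 comes back as `None` under this sketch, which may be worth confirming against the intended requirement before running the aggregation.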

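Finally, I need to apply the sum function together with a group by. Hopefully that will be easy once the structure above has been transformed.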

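Most articles I found through Google explain how to add a single column to an existing DataFrame with withColumn, not how to add multiple columns. None of them covers this scenario, so I'd like to ask for your help.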

Best answer

Hope this helps!

from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # alias avoids shadowing Python's built-in round and sum

# create (or reuse) a SparkSession
spark = SparkSession.builder.getOrCreate()

# sample data
df = spark.createDataFrame([(1, 2, 3, 4), (5, 6, 7, 8)], ["col1", "col2", "col3", "col4"])

# first output row: populate col5, col6, col7
col5 = F.when((F.col("col1") == 0) & (F.col("col3") != 0), F.round(F.col("col4") / F.col("col3"), 2)).otherwise(0)
col6 = F.when((F.col("col1") == 0) & (F.col("col4") != 0), F.round((F.col("col3") * F.col("col4")) / F.col("col1"), 2)).otherwise(0)
col7 = F.col("col2")
df1 = (df.withColumn("col5", col5)
         .withColumn("col6", col6)
         .withColumn("col7", col7))

# second output row: populate col8, col9, col10
# they are stored under the names col5, col6, col7 so both DataFrames share one schema
col8 = F.when((F.col("col1") != 0) & (F.col("col3") != 0), F.round(F.col("col4") / F.col("col3"), 2)).otherwise(0)
col9 = F.when((F.col("col1") != 0) & (F.col("col4") != 0), F.round((F.col("col3") * F.col("col4")) / F.col("col1"), 2)).otherwise(0)
col10 = F.concat(F.col("col2"), F.lit("_NEW"))
df2 = (df.withColumn("col5", col8)
         .withColumn("col6", col9)
         .withColumn("col7", col10))

# final dataframe (union matches columns by position, not by name)
final_df = df1.union(df2)
final_df.show()

# groupBy calculation
# final_df.groupBy("col1", "col2", "col3", "col4").agg(F.sum("col5")).show()

The output is:

+----+----+----+----+----+----+-----+
|col1|col2|col3|col4|col5|col6| col7|
+----+----+----+----+----+----+-----+
|   1|   2|   3|   4| 0.0| 0.0|    2|
|   5|   6|   7|   8| 0.0| 0.0|    6|
|   1|   2|   3|   4|1.33|12.0|2_NEW|
|   5|   6|   7|   8|1.14|11.2|6_NEW|
+----+----+----+----+----+----+-----+
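To see what the commented-out groupBy step would compute, the sums can be reproduced in plain Python from the rows shown in the table. This is only a sketch for checking the arithmetic, not a Spark job; the variable names are illustrative.

```python
from collections import defaultdict

# Rows copied from the output table: (col1, col2, col3, col4, col5)
rows = [
    (1, 2, 3, 4, 0.0),
    (5, 6, 7, 8, 0.0),
    (1, 2, 3, 4, 1.33),
    (5, 6, 7, 8, 1.14),
]

# Plain-Python counterpart of grouping by col1..col4 and summing col5
totals = defaultdict(float)
for col1, col2, col3, col4, col5 in rows:
    totals[(col1, col2, col3, col4)] += col5

for key, total in sorted(totals.items()):
    print(key, total)
# (1, 2, 3, 4) 1.33
# (5, 6, 7, 8) 1.14
```

Each input row contributes two rows to the union, so every group sums one value from the first output row and one from the second.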


If this solves your problem, don't forget to let us know :)

Regarding "python - How do I add multiple rows and multiple columns from a single row in PySpark?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46222077/
