
scala - How do I convert the following table into the required format?

Reposted. Author: 行者123. Updated: 2023-12-04 08:27:45

I have loaded the following table as a DataFrame:

Id  Name    customCount  Custom1    Custom1value  custom2    custom2Value  custom3  custom3Value
1   qwerty  2            Height     171           Age        76            Null     Null
2   asdfg   2            Weight     78            Height     166           Null     Null
3   zxcvb   3            Age        28            SkinColor  white         Height   67
4   tyuio   1            Height     177           Null       Null          Null     Null
5   asdfgh  2            SkinColor  brown         Age        34            Null     Null

I need to change this table into the following format:

Id  Name    customCount  Height  Weight  Age   SkinColor
1   qwerty  2            171     Null    76    Null
2   asdfg   2            166     78      Null  Null
3   zxcvb   3            67      Null    28    white
4   tyuio   1            177     Null    Null  Null
5   asdfgh  2            Null    Null    34    brown
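
I tried this with two custom-field column pairs:

import org.apache.spark.sql.functions.{col, when}
import spark.implicits._  // encoder needed by the .map below

val rawDf = spark.read.option("Header", false).options(Map("sep" -> "|")).csv("/sample/data.csv")
rawDf.createOrReplaceTempView("Table")
// Distinct custom-field names found in the two name columns (_c3 and _c5).
val dataframe = spark.sql("select distinct * from (select `_c3` from Table union select `_c5` from Table)")
val dfWithDistinctColumns = dataframe.toDF("colNames")
val list = dfWithDistinctColumns.select("colNames").map(x => x.getString(0)).collect().toList
val rawDfWithSchema = rawDf.toDF("Id", "Name", "customCount", "h1", "v1", "h2", "v2")
// Add one column per field name, filled from whichever pair carries that name.
val expectedDf = list.foldLeft(rawDfWithSchema)((df1, c) =>
  df1.withColumn(c, when(col("h1") === c, col("v1")).when(col("h2") === c, col("v2")).otherwise(null))
).drop("h1", "h2", "v1", "v2")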

But when I try this with three custom fields, I cannot get the union across multiple columns to work.
Could you suggest any ideas or solutions?
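
One way to sidestep the per-column union is to put every name column into a single array and explode it before taking the distinct names. The following is only a sketch under assumptions: `FoldLeftSketch`, `widen`, and the `h3`/`v3` column names are hypothetical, all values are assumed to be strings, and empty slots are assumed to be real nulls rather than the text `Null`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, coalesce, col, explode, when}

object FoldLeftSketch {
  // Hypothetical helper: `pairs` lists (nameColumn, valueColumn) for every custom slot.
  def widen(df: DataFrame, pairs: Seq[(String, String)]): DataFrame = {
    // Gather all name columns into one array, explode it, keep distinct non-null names.
    val names = df
      .select(explode(array(pairs.map { case (h, _) => col(h) }: _*)).alias("name"))
      .filter(col("name").isNotNull)
      .distinct()
      .collect()
      .map(_.getString(0))
      .sorted

    // For each name, take the value from whichever slot carries it (null if none does).
    val widened = names.foldLeft(df) { (acc, name) =>
      val candidates = pairs.map { case (h, v) => when(col(h) === name, col(v)) }
      acc.withColumn(name, coalesce(candidates: _*))
    }
    widened.drop(pairs.flatMap { case (h, v) => Seq(h, v) }: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("foldleft-sketch").getOrCreate()
    import spark.implicits._
    val n: String = null // stands in for the table's empty cells
    val df = Seq(
      ("1", "qwerty", "2", "Height", "171", "Age", "76", n, n),
      ("3", "zxcvb", "3", "Age", "28", "SkinColor", "white", "Height", "67")
    ).toDF("Id", "Name", "customCount", "h1", "v1", "h2", "v2", "h3", "v3")
    widen(df, Seq("h1" -> "v1", "h2" -> "v2", "h3" -> "v3")).show()
    spark.stop()
  }
}
```

Because `pairs` is an ordinary sequence, a fourth or fifth custom slot only needs another entry in that list. A field name that collides with an existing column such as `Name` would overwrite it, so this sketch assumes the names are distinct from the fixed columns.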

Best answer

You can do a pivot, but you first need to tidy up the format of the DataFrame:

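import org.apache.spark.sql.functions.{array, explode, first}
import spark.implicits._  // enables the $"col" syntax

val df2 = df.select(
  $"Id", $"Name", $"customCount",
  // One [name, value] array per custom slot; explode yields one row per slot.
  explode(array(
    array($"Custom1", $"Custom1value"),
    array($"custom2", $"custom2Value"),
    array($"custom3", $"custom3Value")
  )).alias("custom")
).select(
  $"Id", $"Name", $"customCount",
  $"custom"(0).alias("key"),
  $"custom"(1).alias("value")
).groupBy(
  "Id", "Name", "customCount"
).pivot("key").agg(first("value"))
  .drop("null")  // empty slots pivot into a column named "null"
  .orderBy("Id")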

df2.show
+---+------+-----------+----+------+---------+------+
| Id|  Name|customCount| Age|Height|SkinColor|Weight|
+---+------+-----------+----+------+---------+------+
|  1|qwerty|          2|  76|   171|     null|  null|
|  2| asdfg|          2|null|   166|     null|    78|
|  3| zxcvb|          3|  28|    67|    white|  null|
|  4| tyuio|          1|null|   177|     null|  null|
|  5|asdfgh|          2|  34|  null|    brown|  null|
+---+------+-----------+----+------+---------+------+
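
The same reshape can be tried end to end without a CSV file. This is a minimal sketch rather than the answerer's code: `PivotSketch`, `reshape`, and `sample` are hypothetical names, the sample rows are typed in from the table in the question with real nulls in place of `Null`, and every value is assumed to be a string. It filters out null keys before pivoting instead of dropping a `null` column afterwards, which means a row whose custom slots are all empty would disappear from the result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, first}

object PivotSketch {
  // Hypothetical helper: `pairs` lists (nameColumn, valueColumn) for every custom slot.
  def reshape(df: DataFrame, pairs: Seq[(String, String)]): DataFrame = {
    // One two-element array [name, value] per slot.
    val slots = pairs.map { case (k, v) => array(col(k), col(v)) }
    df.select(col("Id"), col("Name"), col("customCount"),
        explode(array(slots: _*)).alias("custom"))
      .select(col("Id"), col("Name"), col("customCount"),
        col("custom")(0).alias("key"), col("custom")(1).alias("value"))
      .filter(col("key").isNotNull) // skip unused slots before pivoting
      .groupBy("Id", "Name", "customCount")
      .pivot("key")
      .agg(first("value"))
      .orderBy("Id")
  }

  // Sample rows copied from the question; n marks the empty cells.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val n: String = null
    Seq(
      ("1", "qwerty", "2", "Height", "171", "Age", "76", n, n),
      ("2", "asdfg", "2", "Weight", "78", "Height", "166", n, n),
      ("3", "zxcvb", "3", "Age", "28", "SkinColor", "white", "Height", "67"),
      ("4", "tyuio", "1", "Height", "177", n, n, n, n),
      ("5", "asdfgh", "2", "SkinColor", "brown", "Age", "34", n, n)
    ).toDF("Id", "Name", "customCount", "Custom1", "Custom1value",
      "custom2", "custom2Value", "custom3", "custom3Value")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
    val pairs = Seq("Custom1" -> "Custom1value", "custom2" -> "custom2Value", "custom3" -> "custom3Value")
    reshape(sample(spark), pairs).show()
    spark.stop()
  }
}
```

If the CSV stores the literal text `Null` instead of leaving cells empty, reading it with `.option("nullValue", "Null")` should turn those cells into real nulls so that the filter applies.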

Regarding "scala - How do I convert the following table into the required format?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65183277/
