
scala - How to get the names of the columns whose values are all null?


I can't figure out how to get the names of the columns whose values are all null.

For example,

// In spark-shell, spark.implicits._ is already in scope; otherwise it is needed for .toDF
import spark.implicits._

case class A(name: String, id: String, email: String, company: String)

val e1 = A("n1", null, "n1@c1.com", null)
val e2 = A("n2", null, "n2@c1.com", null)
val e3 = A("n3", null, "n3@c1.com", null)
val e4 = A("n4", null, "n4@c2.com", null)
val e5 = A("n5", null, "n5@c2.com", null)
val e6 = A("n6", null, "n6@c2.com", null)
val e7 = A("n7", null, "n7@c3.com", null)
val e8 = A("n8", null, "n8@c3.com", null)
val As = Seq(e1, e2, e3, e4, e5, e6, e7, e8)
val df = sc.parallelize(As).toDF

This code produces a DataFrame that looks like this:
+----+----+---------+-------+
|name| id| email|company|
+----+----+---------+-------+
| n1|null|n1@c1.com| null|
| n2|null|n2@c1.com| null|
| n3|null|n3@c1.com| null|
| n4|null|n4@c2.com| null|
| n5|null|n5@c2.com| null|
| n6|null|n6@c2.com| null|
| n7|null|n7@c3.com| null|
| n8|null|n8@c3.com| null|
+----+----+---------+-------+

I want to get the names of the columns that are null in every row: id, company

I don't care about the output type: an Array, a String, an RDD, whatever.

Best Answer

You can run a simple count over every column and then subset df.columns using the indices of the columns whose count is 0:

import org.apache.spark.sql.functions.{count, col}

// count() ignores nulls, so an all-null column has a count of 0.
// Collect the single row of per-column counts and keep the indices of the zero counts.
val col_inds = df.select(df.columns.map(c => count(col(c)).alias(c)): _*)
  .collect()(0)
  .toSeq.zipWithIndex
  .filter(_._1 == 0).map(_._2)

// Subset the column names using those indices
col_inds.map(i => df.columns.apply(i))
// Seq[String] = ArrayBuffer(id, company)
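
As a follow-up, the same count-based check can be written without the index round-trip by zipping the counts with df.columns directly. This is only a sketch of the same idea; the val name allNullCols is illustrative, and it assumes the same df as above:

import org.apache.spark.sql.functions.{count, col}

val allNullCols = df.select(df.columns.map(c => count(col(c)).alias(c)): _*)
  .collect()(0)                        // a single Row of non-null counts
  .toSeq
  .zip(df.columns)                     // pair each count with its column name
  .collect { case (cnt, name) if cnt == 0 => name }
// => Seq containing "id" and "company"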

Regarding "scala - How to get the names of the columns whose values are all null?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48110448/
