scala - Error in StructField(a,StringType,false): it is false but should be true

Reposted. Author: 行者123. Updated: 2023-12-04 22:57:29

I get this error in a Scala test:

StructType(StructField(a,StringType,true), StructField(b,StringType,true), StructField(c,StringType,true), StructField(d,StringType,true), StructField(e,StringType,true), StructField(f,StringType,true), StructField(NewColumn,StringType,false)) did not equal StructType(StructField(a,StringType,true), StructField(b,StringType,true), StructField(c,StringType,true), StructField(d,StringType,true), StructField(e,StringType,true), StructField(f,StringType,true), StructField(NewColumn,StringType,true))

ScalaTestFailureLocation: com.holdenkarau.spark.testing.TestSuite$class at (TestSuite.scala:13)

Expected :StructType(StructField(a,StringType,true), StructField(b,StringType,true), StructField(c,StringType,true), StructField(d,StringType,true), StructField(e,StringType,true), StructField(f,StringType,true), StructField(NewColumn,StringType,true))

Actual :StructType(StructField(a,StringType,true), StructField(b,StringType,true), StructField(c,StringType,true), StructField(d,StringType,true), StructField(e,StringType,true), StructField(f,StringType,true), StructField(NewColumn,StringType,false))

The last StructField is false when it should be true, and I don't know why. A value of true means that the schema accepts null values.

Here is my test:
val schema1 = Array("a", "b", "c", "d", "e", "f")
val df = List(("a1", "b1", "c1", "d1", "e1", "f1"),
              ("a2", "b2", "c2", "d2", "e2", "f2"))
  .toDF(schema1: _*)

val schema2 = Array("a", "b", "c", "d", "e", "f", "NewColumn")

val dfExpected = List(("a1", "b1", "c1", "d1", "e1", "f1", "a1_b1_c1_d1_e1_f1"),
                      ("a2", "b2", "c2", "d2", "e2", "f2", "a2_b2_c2_d2_e2_f2"))
  .toDF(schema2: _*)

val transformer = KeyContract("NewColumn", schema1)
val newDf = transformer(df)
newDf.columns should contain ("NewColumn")
assertDataFrameEquals(newDf, dfExpected)

Here is KeyContract:
case class KeyContract(tempColumn: String, columns: Seq[String],
                       unsigned: Boolean = true) extends Transformer {

  override def apply(input: DataFrame): DataFrame = {
    import org.apache.spark.sql.functions._

    // Replace nulls with empty strings in every key column
    val inputModif = columns.foldLeft(input) { (tmpDf, columnName) =>
      tmpDf.withColumn(columnName, when(col(columnName).isNull,
        lit("")).otherwise(col(columnName)))
    }

    // Join the key columns with "_" into the new column
    inputModif.withColumn(tempColumn, concat_ws("_", columns.map(col): _*))
  }
}

Thanks in advance!!

Best answer

This happens because concat_ws never returns null, so the resulting field is marked as non-nullable.
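A minimal sketch of that behavior, assuming a local SparkSession; the column names and sample rows here are illustrative and not taken from the question. It reads the nullable flag of the inferred schema before and after adding a concat_ws column with a literal separator.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws}

// Assumption: a throwaway local session, only for this illustration
val spark = SparkSession.builder
  .master("local[1]")
  .appName("concat-ws-nullability-sketch")
  .getOrCreate()
import spark.implicits._

// The second row holds a null so that the input really is nullable
val df = Seq[(String, String)](("a1", "b1"), ("a2", null)).toDF("a", "b")

// concat_ws skips null inputs instead of propagating them
val withKey = df.withColumn("NewColumn", concat_ws("_", col("a"), col("b")))

// Columns inferred from Scala Strings are nullable
val aNullable = withKey.schema("a").nullable
// The concat_ws result is reported as non-nullable
val keyNullable = withKey.schema("NewColumn").nullable
```

Calling withKey.printSchema should show the same flags in the (nullable = ...) annotations.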

If you want to use the second DataFrame as the reference, you have to build it from an explicit schema and Rows:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark: SparkSession = SparkSession.builder.getOrCreate()

val dfExpected = spark.createDataFrame(
  spark.sparkContext.parallelize(List(
    Row("a1", "b1", "c1", "d1", "e1", "f1", "a1_b1_c1_d1_e1_f1"),
    Row("a2", "b2", "c2", "d2", "e2", "f2", "a2_b2_c2_d2_e2_f2")
  )),
  // Every column is nullable except NewColumn
  StructType(schema2.map { c => StructField(c, StringType, c != "NewColumn") }))

This way the last column is non-nullable:
dfExpected.printSchema
root
|-- a: string (nullable = true)
|-- b: string (nullable = true)
|-- c: string (nullable = true)
|-- d: string (nullable = true)
|-- e: string (nullable = true)
|-- f: string (nullable = true)
|-- NewColumn: string (nullable = false)
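If the test does not care about nullability at all, another option is to keep the toDF-based expected DataFrame and re-create it with the schema of the actual one before comparing. The sketch below assumes a local SparkSession and illustrative two-column data; withSchemaOf is a hypothetical helper, not part of Spark or spark-testing-base. Note that this approach hides nullability differences, so it is only appropriate when they are irrelevant to the test.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws}

// Assumption: a throwaway local session, only for this illustration
val spark = SparkSession.builder
  .master("local[1]")
  .appName("align-schema-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical helper: rebuild df with the schema (including nullable flags)
// of reference. Column names and types must already match.
def withSchemaOf(df: DataFrame, reference: DataFrame): DataFrame =
  spark.createDataFrame(df.rdd, reference.schema)

val actual = List(("a1", "b1")).toDF("a", "b")
  .withColumn("NewColumn", concat_ws("_", col("a"), col("b")))
val expected = List(("a1", "b1", "a1_b1")).toDF("a", "b", "NewColumn")

val aligned = withSchemaOf(expected, actual)

// The schemas differ only in the nullable flag of NewColumn
val schemasMatchBefore = expected.schema == actual.schema
val schemasMatchAfter = aligned.schema == actual.schema
```

In the original test, the aligned frame would then be passed to assertDataFrameEquals in place of dfExpected.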

Regarding "scala - Error in StructField(a,StringType,false): it is false but should be true", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49758380/
