
scala - Unable to resolve a merged schema with int and double when reading parquet files

Reposted. Author: 行者123. Updated: 2023-12-03 07:14:15

I have two parquet files: one contains an integer field myField, the other contains a double field myField. When I try to read both files at once

val basePath = "/path/to/file/"
val fileWithInt = basePath + "intFile.snappy.parquet"
val fileWithDouble = basePath + "doubleFile.snappy.parquet"
val result = spark.sqlContext.read.option("mergeSchema", true).option("basePath", basePath).parquet(Seq(fileWithInt, fileWithDouble): _*).select("myField")

I get the following error

Caused by: org.apache.spark.SparkException: Failed to merge fields 'myField' and 'myField'. Failed to merge incompatible data types IntegerType and DoubleType

When I pass an explicit schema

val schema = StructType(Seq(new StructField("myField", IntegerType)))
val result = spark.sqlContext.read.schema(schema).option("mergeSchema", true).option("basePath", basePath).parquet(Seq(fileWithInt, fileWithDouble): _*).select("myField")

it fails with the following

java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainDoubleDictionary
at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:48)

When I switch the schema to double

val schema = StructType(Seq(new StructField("myField", DoubleType)))

I get

java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
at org.apache.parquet.column.Dictionary.decodeToDouble(Dictionary.java:60)

Does anyone know a way to work around this, other than reprocessing the source data?

Best answer

Depending on how many files you need to read, you can use one of the following two approaches:

This works best for a small number of Parquet files

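import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.DoubleType

// On Scala 2.13, .par needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
def merge(spark: SparkSession, paths: Seq[String]): DataFrame = {
  import spark.implicits._

  // Read each file separately so no schema merge is attempted, cast myField to double, then union.
  paths.par.map { path =>
    spark.read.parquet(path).withColumn("myField", $"myField".cast(DoubleType))
  }.reduce(_.union(_))
}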
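
To make the first approach concrete, here is a minimal end-to-end sketch that can be run locally. It is not part of the original answer: the object name MergeIntAndDoubleDemo, the helper readAsDouble, and the temporary directory are all hypothetical, and the two small files written here merely stand in for intFile.snappy.parquet and doubleFile.snappy.parquet. It reads sequentially rather than with .par, and it uses unionByName (available since Spark 2.3) instead of union, on the assumption that real files may carry other columns in a different order.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object MergeIntAndDoubleDemo {
  // Read each path on its own (so Spark never tries to merge the two schemas),
  // cast the given field to double, then union the per-file DataFrames by column name.
  // Assumes `paths` is non-empty; reduce throws on an empty collection.
  def readAsDouble(spark: SparkSession, paths: Seq[String], field: String): DataFrame =
    paths
      .map(path => spark.read.parquet(path).withColumn(field, col(field).cast(DoubleType)))
      .reduce(_ unionByName _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("merge-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical scratch directory standing in for /path/to/file/.
    val dir = Files.createTempDirectory("merge-demo").toString
    val intPath = s"$dir/int"
    val doublePath = s"$dir/double"

    // One dataset with an integer myField, one with a double myField.
    Seq(1, 2).toDF("myField").write.parquet(intPath)
    Seq(3.5).toDF("myField").write.parquet(doublePath)

    val result = readAsDouble(spark, Seq(intPath, doublePath), "myField")
    result.printSchema()
    result.orderBy("myField").show()

    spark.stop()
  }
}
```

The cast happens per file, before any union, which is why neither the mergeSchema failure nor the dictionary decoding errors from the question should be triggered here.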

This approach copes better with a large number of files, because it keeps the lineage short

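import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.DoubleType

def merge2(spark: SparkSession, paths: Seq[String]): DataFrame = {
  import spark.implicits._

  // Select only myField: .as[Double] needs a single-column Dataset, so other columns are dropped.
  spark.sparkContext.union(paths.par.map { path =>
    spark.read.parquet(path).select($"myField".cast(DoubleType)).as[Double].rdd
  }.toList).toDF("myField") // one flat union over all RDDs; restore the column name
}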
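
Before picking either approach, it can help to confirm which files actually store myField as an integer and which as a double. The following is a small sketch for that, not something from the original answer: the object FieldTypes and the function fieldTypes are hypothetical names. It relies only on spark.read.parquet(path).schema, and it assumes every path contains the field; looking up a missing field name in a StructType throws an IllegalArgumentException.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DataType

object FieldTypes {
  // Map each path to the Spark type that `field` has in that path's own schema.
  // Each path is read separately, so no cross-file schema merge is attempted.
  def fieldTypes(spark: SparkSession, paths: Seq[String], field: String): Map[String, DataType] =
    paths.map(path => path -> spark.read.parquet(path).schema(field).dataType).toMap
}
```

For the two files in the question, a call such as FieldTypes.fieldTypes(spark, Seq(fileWithInt, fileWithDouble), "myField") would be expected to report IntegerType for the first path and DoubleType for the second.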

Source: a similar question on Stack Overflow, "scala - Unable to resolve a merged schema with int and double when reading parquet files": https://stackoverflow.com/questions/53829952/
