gpt4 book ai didi

scala - Spark 任务无法将行写入 ORC 表

转载 作者:行者123 更新时间:2023-12-03 20:47:58 27 4
gpt4 key购买 nike

我为几何字段上的空间连接运行以下代码:

 val coverage = DimCoverageReader.apply(spark, params)
coverage.createOrReplaceTempView("dim_coverage")

val uniqueGeometries = spark.table(params.UniqueGeometriesTable)
uniqueGeometries.createOrReplaceTempView("unique_geometries")


spark
.sql(
"""select a.*, b.lac, b.cell_id
|from unique_geometries as a, dim_coverage as b
|where ST_Intersects(ST_GeomFromWKT(a.geo_wkt), ST_GeomFromWKT(b.geo_wkt))
|""".stripMargin)
生成的数据帧稍后保存到 ORC 表中:
Stage(spark,params).write
.format("orc")
.mode(SaveMode.Overwrite)
.saveAsTable(params.IntersectGeometriesTable)
我在执行过程中收到此错误:
org.apache.spark.SparkException:写入行时任务失败
    0/10/30 17:37:19 ERROR Executor: Exception in task 205.0 in stage 4.0 (TID 1219)
org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:270)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Column has wrong number of index entries found: 320 expected: 800
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:803)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1742)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2133)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:352)
at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2413)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:76)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:55)
at org.apache.spark.sql.hive.orc.OrcOutputWriter.write(OrcFileFormat.scala:248)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:325)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
... 8 more
这个问题的根本原因是什么?

最佳答案

如果这适用于 format('parquet')我的猜测是您有某种结构类型或格式问题。您可以添加 printSchema为你的 DF?

关于scala - Spark 任务无法将行写入 ORC 表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64611879/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com