
scala - Why doesn't writing timestamps before 1900 throw a SparkUpgradeException on Spark 3?

Reposted. Author: 行者123. Updated: 2023-12-05 06:49:39

On the page https://www.waitingforcode.com/apache-spark-sql/whats-new-apache-spark-3-proleptic-calendar-date-time-management/read
we can read:

reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar
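
To make the calendar difference in that quote concrete, here is a minimal sketch that uses only the JDK, not Spark. The object name CalendarGap and its two helpers are hypothetical names introduced for illustration. It compares the day number that java.time (Proleptic Gregorian) and java.util.GregorianCalendar (hybrid Julian/Gregorian) assign to the same date label, in UTC. It only illustrates the date gap before 1582-10-15 and does not try to explain the 1900 cutoff the message gives for timestamps.

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical helper object, for illustration only (not part of Spark).
object CalendarGap {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Days since 1970-01-01 for a date label in the Proleptic Gregorian calendar.
  def prolepticEpochDay(year: Int, month: Int, day: Int): Long =
    LocalDate.of(year, month, day).toEpochDay

  // Days since 1970-01-01 for the same label in the legacy hybrid calendar,
  // which switches from Julian to Gregorian on 1582-10-15 by default.
  def hybridEpochDay(year: Int, month: Int, day: Int): Long = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()                   // reset all fields, including time of day
    cal.set(year, month - 1, day) // Calendar months are 0-based
    Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }

  // How many days apart the two calendars place the same label.
  def gapInDays(year: Int, month: Int, day: Int): Long =
    hybridEpochDay(year, month, day) - prolepticEpochDay(year, month, day)

  def main(args: Array[String]): Unit = {
    println(gapInDays(1581, 1, 1)) // 10: the label falls in the Julian part of the hybrid calendar
    println(gapInDays(1899, 1, 1)) // 0: both calendars agree on this date label in UTC
  }
}
```

Under these assumptions the same label "1581-01-01" denotes two physical days ten days apart, which is the ambiguity the message warns about.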

Consider the following scenario, in which no exception is thrown:

scala> spark.conf.get("spark.sql.legacy.parquet.datetimeRebaseModeInWrite")
res27: String = EXCEPTION
scala> Seq(java.sql.Timestamp.valueOf("1899-01-01 00:00:00")).toDF("col").write.parquet("/tmp/someDate")
scala> // why didn't this throw an exception?

Whereas for dates before 1582, an exception is thrown:

scala> Seq(java.sql.Date.valueOf("1581-01-01")).toDF("col").write.parquet("/tmp/someOtherDate")
21/03/10 19:07:19 ERROR Utils: Aborting task
org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.datetimeRebaseModeInWrite to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set spark.sql.legacy.parquet.datetimeRebaseModeInWrite to 'CORRECTED' to write the datetime values as it is, if you are 100% sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.

Can anyone explain the difference?

Best answer

I am on Spark 3.1.2. I tested both cases, and an exception is thrown in both. See the following:

scala> Seq(java.sql.Timestamp.valueOf("1899-01-01 00:00:00")).toDF("col").write.parquet("/tmp/someDate")
22/01/04 18:03:53 ERROR Utils: Aborting task
org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet INT96 files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.int96RebaseModeInWrite to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set spark.sql.legacy.parquet.int96RebaseModeInWrite to 'CORRECTED' to write the datetime values as it is, if you are 100% sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.

And the second case:

scala> Seq(java.sql.Date.valueOf("1581-01-01")).toDF("col").write.parquet("/tmp/someOtherDate1")
22/01/04 18:05:08 ERROR Utils: Aborting task
org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.datetimeRebaseModeInWrite to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set spark.sql.legacy.parquet.datetimeRebaseModeInWrite to 'CORRECTED' to write the datetime values as it is, if you are 100% sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.
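
Both messages name a configuration key that lets the write proceed. As a rough sketch for spark-shell, assuming Spark 3.1+ (the version this answer was tested on) and that the files will only be read by Spark 3.0+ or other Proleptic Gregorian readers, the two keys quoted above can be set before writing. The paths are the ones from the transcripts; mode("overwrite") is an addition here so the snippet can be re-run.

```scala
// Paste into spark-shell; `spark` is the session the shell provides.
import spark.implicits._

// Shows which physical type timestamps are written as.
// The timestamp error above mentions INT96, which is the usual default here.
println(spark.conf.get("spark.sql.parquet.outputTimestampType"))

// Key from the INT96 timestamp message (seen above on Spark 3.1.2).
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
// Key from the date message.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")

// With CORRECTED, values are written as they are, without rebasing.
Seq(java.sql.Timestamp.valueOf("1899-01-01 00:00:00")).toDF("col")
  .write.mode("overwrite").parquet("/tmp/someDate")
Seq(java.sql.Date.valueOf("1581-01-01")).toDF("col")
  .write.mode("overwrite").parquet("/tmp/someOtherDate1")
```

If Spark 2.x or legacy Hive may read the files later, the messages recommend 'LEGACY' instead of 'CORRECTED' for both keys.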

Regarding "scala - Why doesn't writing timestamps before 1900 throw a SparkUpgradeException on Spark 3?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66571309/
