postgresql - Spark converting String to TimestampType


I have a dataframe that I want to insert into PostgreSQL from Spark. In Spark the DateTimestamp column is in string format; in PostgreSQL the target column is timestamp without time zone.

The insert into the database fails on the datetime column. I did try changing the data type, but the insert still errors out, and I can't figure out why the cast isn't working. If I paste the same INSERT statement into PgAdmin and run it, it runs fine.

import java.text.SimpleDateFormat
import java.util.Calendar

object EtlHelper {
  // Return the current timestamp formatted as yyyy-MM-dd HH:mm:ss
  def getCurrentTime(): String = {
    val now = Calendar.getInstance().getTime()
    val hourFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    hourFormat.format(now)
  }
}
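
As an aside, the string round-trip could be avoided entirely if the helper returned a java.sql.Timestamp, which Spark maps to TimestampType when building a Dataset. A minimal sketch of that alternative (the name getCurrentTimestamp is hypothetical, not part of the original code):

import java.sql.Timestamp

object EtlHelperAlt {
  // Return the current time as java.sql.Timestamp so that Spark
  // infers TimestampType directly, with no later cast required
  def getCurrentTimestamp(): Timestamp =
    new Timestamp(System.currentTimeMillis())
}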

In another file:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DataTypes

object CreateDimensions {

  def createDimCompany(spark: SparkSession, location: String, propsLocation: String): Unit = {
    import spark.implicits._

    val dimCompanyStartTime = EtlHelper.getCurrentTime()
    val dimCompanyEndTime   = EtlHelper.getCurrentTime()
    val prevDimCompanyId    = 2
    val numRdd              = 27
    val AuditDF = spark.createDataset(Array(("dim_company", prevDimCompanyId, numRdd, dimCompanyStartTime, dimCompanyEndTime)))
      .toDF("audit_tbl_name", "audit_tbl_id", "audit_no_rows", "audit_tbl_start_date", "audit_tbl_end_date")

    AuditDF.withColumn("audit_tbl_start_date", AuditDF.col("audit_tbl_start_date").cast(DataTypes.TimestampType))
    AuditDF.withColumn("audit_tbl_end_date", AuditDF.col("audit_tbl_end_date").cast(DataTypes.TimestampType))

    AuditDF.printSchema()
  }
}

root
|-- audit_tbl_name: string (nullable = true)
|-- audit_tbl_id: long (nullable = false)
|-- audit_no_rows: long (nullable = false)
|-- audit_tbl_start_date: string (nullable = true)
|-- audit_tbl_end_date: string (nullable = true)

This is the error I get:

INSERT INTO etl.audit_master ("audit_tbl_name","audit_tbl_id","audit_no_rows","audit_tbl_start_date","audit_tbl_end_date") VALUES ('dim_company',27,2,'2018-05-02 12:15:54','2018-05-02 12:15:59') was aborted: ERROR: column "audit_tbl_start_date" is of type timestamp without time zone but expression is of type character varying
Hint: You will need to rewrite or cast the expression.

Any help is appreciated.

Thanks.

Best Answer

AuditDF.printSchema() prints the schema of the original AuditDF dataframe, because the .withColumn transformations were never saved by assigning them to anything. Dataframes are immutable objects: they can be transformed into new dataframes, but they cannot mutate themselves. So you always need an assignment to keep the transformations you apply.

So the correct way is to assign the result in order to keep the changes:

val transformedDF = AuditDF
  .withColumn("audit_tbl_start_date", AuditDF.col("audit_tbl_start_date").cast(DataTypes.TimestampType))
  .withColumn("audit_tbl_end_date", AuditDF.col("audit_tbl_end_date").cast("timestamp"))

transformedDF.printSchema()

and you will see the change:

root
|-- audit_tbl_name: string (nullable = true)
|-- audit_tbl_id: integer (nullable = false)
|-- audit_no_rows: integer (nullable = false)
|-- audit_tbl_start_date: timestamp (nullable = true)
|-- audit_tbl_end_date: timestamp (nullable = true)
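
With the timestamp columns cast, the JDBC insert should go through. For completeness, a minimal sketch of the write, assuming a placeholder connection URL and credentials (none of these values come from the question):

import java.util.Properties

val props = new Properties()
props.setProperty("user", "etl_user")                 // hypothetical user
props.setProperty("password", "secret")               // hypothetical password
props.setProperty("driver", "org.postgresql.Driver")

// Append the rows to the existing etl.audit_master table;
// replace the URL with your own database endpoint
transformedDF.write
  .mode("append")
  .jdbc("jdbc:postgresql://localhost:5432/etldb", "etl.audit_master", props)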

Note that .cast(DataTypes.TimestampType) and .cast("timestamp") are one and the same.
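
If the strings were in some non-default format, a plain cast would yield null rather than a parsed value; in that case Spark's to_timestamp function accepts an explicit pattern. A short sketch of that variant:

import org.apache.spark.sql.functions.to_timestamp

// Same result as the cast above, but with the pattern spelled out;
// useful when the strings do not follow yyyy-MM-dd HH:mm:ss
val withTs = AuditDF.withColumn(
  "audit_tbl_start_date",
  to_timestamp(AuditDF.col("audit_tbl_start_date"), "yyyy-MM-dd HH:mm:ss"))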

About postgresql - Spark converting String to TimestampType: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50142142/
