I have a timestamp column
from pyspark.sql.functions import col, date_format, to_timestamp
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

data = [(1, '2023-01-22 09:00'), (2, '2023-09-11 00:09')]
schema = StructType([StructField("id", IntegerType(), False),
                     StructField("ts", StringType(), True)])
main_df = spark.createDataFrame(data, schema)
main_df.printSchema()
root
|-- id: integer (nullable = false)
|-- ts: string (nullable = true)
main_df2 = main_df.withColumn('ts', date_format(to_timestamp(col('ts'), "yyyy-MM-dd HH:mm"), "yyyy-MM-dd HH:mm").cast("timestamp"))
main_df2.printSchema()
root
|-- id: integer (nullable = false)
|-- ts: timestamp (nullable = true)
main_df2.show()
+---+-------------------+
| id| ts|
+---+-------------------+
| 1|2023-01-22 09:00:00|
| 2|2023-09-11 00:09:00|
+---+-------------------+
Is it possible to have a timestamp datatype column, in Pyspark, without the seconds, like yyyy-MM-dd HH:mm?
Desired Output
+---+----------------+
| id| ts|
+---+----------------+
| 1|2023-01-22 09:00|
| 2|2023-09-11 00:09|
+---+----------------+
root
|-- id: integer (nullable = false)
|-- ts: timestamp (nullable = true)
Thanks in advance.
In Spark, only yyyy-MM-dd HH:mm:ss is the acceptable timestamp format; all others are considered strings.
You don't need .cast("timestamp") after you did a date_format - just remove it and you'll get what you need:
main_df.withColumn('ts', date_format(to_timestamp(col('ts'), "yyyy-MM-dd HH:mm"),
                                     "yyyy-MM-dd HH:mm")).show()
+---+----------------+
| id| ts|
+---+----------------+
| 1|2023-01-22 09:00|
| 2|2023-09-11 00:09|
+---+----------------+
Thanks for your help, but in your solution the ts column will be string type, and I want timestamp type.
The timestamp type is, by definition, stored with seconds and milliseconds; show just visualizes that data.