I have a DataFrame in PySpark with the following schema:
root
|-- id: string (nullable = true)
|-- date: timestamp (nullable = true)
|-- time: string (nullable = true)
|-- start: timestamp (nullable = true)
|-- end: timestamp (nullable = true)
I want to add one more column, date_time, of type timestamp:
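from datetime import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

to_datetime_func = udf(lambda d, t: datetime.strptime(d + " " + t, "%Y-%m-%d %H:%M:%S"), TimestampType())
df = df.withColumn("date_time", to_datetime_func("date", "time"))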
This code runs without error. However, when I run a simple filter operation that uses the date_time column, I get an error:
root
|-- id: string (nullable = true)
|-- date_time: timestamp (nullable = true)
|-- start: timestamp (nullable = true)
|-- end: timestamp (nullable = true)
from pyspark.sql import functions as func
df \
.filter(func.col("date_time") >= func.col("start")) \
.select("id","date_time","start") \
.show()
Error:
Py4JJavaError: An error occurred while calling o2966.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 30.0 failed 4 times, most recent failure: Lost task 2.3 in stage 30.0 (TID 765, 10.139.64.4, executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/databricks/spark/python/pyspark/worker.py", line 403, in main
process()
File "/databricks/spark/python/pyspark/worker.py", line 398, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/databricks/spark/python/pyspark/serializers.py", line 365, in dump_stream
self.serializer.dump_stream(self._batched(iterator), stream)
File "/databricks/spark/python/pyspark/serializers.py", line 147, in dump_stream
for obj in iterator:
File "/databricks/spark/python/pyspark/serializers.py", line 354, in _batched
for item in iterator:
File "<string>", line 1, in <lambda>
File "/databricks/spark/python/pyspark/worker.py", line 83, in <lambda>
return lambda *a: toInternal(f(*a))
File "/databricks/spark/python/pyspark/util.py", line 99, in wrapper
return f(*args, **kwargs)
File "<command-4293391875175815>", line 1, in <lambda>
TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'str'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:490)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:444)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:638)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1124)
at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1130)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:299)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.writeIteratorToStream(PythonUDFRunner.scala:50)
at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:383)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2076)
at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:223)
Update:
my_concat_func = udf (lambda d, t: datetime.strptime(d+" "+t, "%Y-%m-%d %H:%M:%S"), StringType())
df = df.withColumn("date", df["date"].cast(StringType()))
df = df.withColumn("date_time", my_concat_func("date","time"))
df.select("date","time","date_time").printSchema()
root
|-- date: string (nullable = true)
|-- time: string (nullable = true)
|-- date_time: string (nullable = true)
df.select("date","time","date_time").show()
ValueError: unconverted data remains: 03:34:26
Best answer
Can you try this and let me know the output:
timeFmt = "yyyy-MM-dd'T'HH:mm:ss.SSS"
df \
.filter(func.unix_timestamp('date_time', format=timeFmt) >= func.unix_timestamp('start', format=timeFmt)) \
.select("id","date_time","start") \
.show()
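A note on the likely cause of the original TypeError: the schema lists date as a timestamp column, so the UDF presumably receives a datetime.datetime object for d, and d + " " + t then tries to add a string to it. The plain-Python sketch below reproduces that message and shows one possible way to combine the two values instead. The helper name combine_date_time and the sample values are assumptions made for illustration; they do not appear in the question.

```python
from datetime import datetime


def combine_date_time(d, t):
    """Combine a datetime value and an "HH:MM:SS" string into one datetime.

    Hypothetical helper: `d` stands for a value from the timestamp column
    `date`, `t` for a value from the string column `time`.
    """
    if d is None or t is None:
        return None  # pass nulls through instead of raising
    parsed_time = datetime.strptime(t, "%H:%M:%S").time()
    return datetime.combine(d.date(), parsed_time)


# Reproduce the failure: a datetime cannot be concatenated with a str.
try:
    datetime(2019, 7, 22) + " " + "03:34:26"
except TypeError as exc:
    error_message = str(exc)

# The combined value built without string concatenation.
result = combine_date_time(datetime(2019, 7, 22), "03:34:26")
```

A function like this could be wrapped with udf(combine_date_time, TimestampType()) in place of the lambda from the question, assuming the time column always follows the HH:MM:SS pattern.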
Edit
For the question of how to get only the date and not the time:
df = df.withColumn("new_data", func.to_date(df.date, 'yyyy-MM-dd'))
df.printSchema()
df = df.withColumn("new_data", df['new_data'].cast(StringType()))
df.show(10, False)
df.printSchema()
#### Output ####
+------------------------+
|date |
+------------------------+
|2015-07-02T11:22:21.050Z|
|2016-03-20T21:00:00.000Z|
+------------------------+
root
|-- date: string (nullable = true)
|-- new_data: date (nullable = true)
+------------------------+----------+
|date |new_data |
+------------------------+----------+
|2015-07-02T11:22:21.050Z|2015-07-02|
|2016-03-20T21:00:00.000Z|2016-03-20|
+------------------------+----------+
root
|-- date: string (nullable = true)
|-- new_data: string (nullable = true)
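Keeping only the date part also seems to be what the updated UDF in the question needs. After the date column is cast to a string, each value probably still carries a time portion (the sample value below assumes the form "2019-07-22 00:00:00"), so appending the time column leaves extra text that strptime cannot consume. The sketch below is plain Python with assumed sample values and a hypothetical helper name, parse_date_and_time; it reproduces the ValueError and then parses only the first ten characters of the date string.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse_date_and_time(date_str, time_str):
    """Parse "YYYY-MM-DD..." plus "HH:MM:SS" into a datetime.

    Hypothetical helper: only the first 10 characters of `date_str`
    (the "YYYY-MM-DD" part) are used, so a trailing time is ignored.
    """
    return datetime.strptime(date_str[:10] + " " + time_str, FMT)


# Assumed shape of a timestamp value after casting it to a string.
date_as_string = "2019-07-22 00:00:00"

# Reproduce the failure: the format is satisfied before the input ends.
try:
    datetime.strptime(date_as_string + " " + "03:34:26", FMT)
except ValueError as exc:
    error_message = str(exc)

# Parsing succeeds once the date string is trimmed to its date part.
parsed = parse_date_and_time(date_as_string, "03:34:26")
```

Note that the updated UDF in the question declares StringType() while its lambda returns a datetime; declaring TimestampType() would match the returned value more closely.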
Regarding python - PySpark: TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'str', a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57144339/