python - PySpark: TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'str'

Reposted by: 行者123 Updated: 2023-12-01 00:43:53


I have a DataFrame in PySpark with the following schema:

root
 |-- id: string (nullable = true)
 |-- date: timestamp (nullable = true)
 |-- time: string (nullable = true)
 |-- start: timestamp (nullable = true)
 |-- end: timestamp (nullable = true)

I want to add one more column, date_time, of type timestamp:

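from datetime import datetime

from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

to_datetime_func = udf(lambda d, t: datetime.strptime(d + " " + t, "%Y-%m-%d %H:%M:%S"), TimestampType())
df = df.withColumn("date_time", to_datetime_func("date", "time"))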

This code runs without complaint. However, when I run a simple filter operation that uses the date_time column, I get an error:

root
 |-- id: string (nullable = true)
 |-- date_time: timestamp (nullable = true)
 |-- start: timestamp (nullable = true)
 |-- end: timestamp (nullable = true)


from pyspark.sql import functions as func

df \
    .filter(func.col("date_time") >= func.col("start")) \
    .select("id", "date_time", "start") \
    .show()

Error:

Py4JJavaError: An error occurred while calling o2966.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 30.0 failed 4 times, most recent failure: Lost task 2.3 in stage 30.0 (TID 765, 10.139.64.4, executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 403, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 398, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/databricks/spark/python/pyspark/serializers.py", line 365, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/databricks/spark/python/pyspark/serializers.py", line 147, in dump_stream
    for obj in iterator:
  File "/databricks/spark/python/pyspark/serializers.py", line 354, in _batched
    for item in iterator:
  File "<string>", line 1, in <lambda>
  File "/databricks/spark/python/pyspark/worker.py", line 83, in <lambda>
    return lambda *a: toInternal(f(*a))
  File "/databricks/spark/python/pyspark/util.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "<command-4293391875175815>", line 1, in <lambda>
TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'str'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:490)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:444)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:638)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1124)
    at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1130)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:299)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.writeIteratorToStream(PythonUDFRunner.scala:50)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:383)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2076)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:223)

Update:

my_concat_func =  udf (lambda d, t: datetime.strptime(d+" "+t, "%Y-%m-%d %H:%M:%S"), StringType())
df = df.withColumn("date", df["date"].cast(StringType()))
df = df.withColumn("date_time", my_concat_func("date","time"))


df.select("date","time","date_time").printSchema()

root
 |-- date: string (nullable = true)
 |-- time: string (nullable = true)
 |-- date_time: string (nullable = true)


df.select("date","time","date_time").show()

ValueError: unconverted data remains: 03:34:26
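The ValueError above is consistent with how the cast changes the input: casting a timestamp column to a string typically yields text such as 2019-07-21 00:00:00, so appending the time produces two time-of-day parts and strptime stops after the first one. A minimal plain-Python sketch of that behaviour follows. The date value is made up for illustration (only 03:34:26 comes from the error message), and no Spark is needed to run it.

```python
from datetime import datetime

# Assumed sample value: the text a midnight timestamp becomes once cast to string.
date_as_string = "2019-07-21 00:00:00"
time_string = "03:34:26"

# This mirrors d + " " + t inside the UDF after the cast.
combined = date_as_string + " " + time_string

try:
    datetime.strptime(combined, "%Y-%m-%d %H:%M:%S")
except ValueError as exc:
    # The format is satisfied by the first 19 characters; the rest is left over.
    error_message = str(exc)

# Keeping only the date part before concatenating lets the same format parse.
date_part = date_as_string.split(" ")[0]
parsed = datetime.strptime(date_part + " " + time_string, "%Y-%m-%d %H:%M:%S")

print(error_message)
print(parsed)
```

If the real column holds values with a non-midnight time of day, dropping that part may not be what is wanted, so this is only one way to read the intent of the question.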

Best answer

Can you try this and let me know the output:

timeFmt = "yyyy-MM-dd'T'HH:mm:ss.SSS"
df \
    .filter(func.unix_timestamp('date_time', format=timeFmt) >= func.unix_timestamp('start', format=timeFmt)) \
    .select("id", "date_time", "start") \
    .show()
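The TypeError in the traceback can be reproduced without Spark: because date is a timestamp column, the UDF receives a datetime.datetime object for d, and Python cannot add a str to it. The sketch below shows the failure and one possible fix. Here combine_date_and_time is a hypothetical helper name and the sample values are made up.

```python
from datetime import datetime


def combine_date_and_time(d, t):
    """Join the calendar date of `d` with an 'HH:MM:SS' string `t`."""
    # Nullable columns can hand None to a UDF, so pass it through.
    if d is None or t is None:
        return None
    parsed_time = datetime.strptime(t, "%H:%M:%S").time()
    return datetime.combine(d.date(), parsed_time)


# Assumed sample values standing in for one row of the DataFrame.
d = datetime(2019, 7, 21)
t = "03:34:26"

try:
    d + " " + t  # what the lambda in the question does
except TypeError as exc:
    error_message = str(exc)

result = combine_date_and_time(d, t)

print(error_message)
print(result)
```

Assuming the same imports as in the question, a function like this could be wrapped as udf(combine_date_and_time, TimestampType()) and applied to the original, uncast date and time columns.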

Edit

For the question of how to get only the date and not the time:

df = df.withColumn("new_data", func.to_date(df.date, 'yyyy-MM-dd'))
df.printSchema()

df = df.withColumn("new_data", df['new_data'].cast(StringType()))
df.show(10, False)
df.printSchema()

#### Output ####
+------------------------+
|date                    |
+------------------------+
|2015-07-02T11:22:21.050Z|
|2016-03-20T21:00:00.000Z|
+------------------------+
root
 |-- date: string (nullable = true)
 |-- new_data: date (nullable = true)
+------------------------+----------+
|date                    |new_data  |
+------------------------+----------+
|2015-07-02T11:22:21.050Z|2015-07-02|
|2016-03-20T21:00:00.000Z|2016-03-20|
+------------------------+----------+
root
 |-- date: string (nullable = true)
 |-- new_data: string (nullable = true)

Regarding python - PySpark: TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'str', we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57144339/
