python - Error (java.lang.NoSuchMethodError) when sending a Spark DataFrame from a Databricks notebook to Azure Event Hubs

Reposted by: 行者123 · Updated: 2023-12-03 05:18:03

I need to send a PySpark DataFrame to an Event Hub from my Databricks notebook. The problem occurs in this part of the code:

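from pyspark.sql import DataFrame
from pyspark.sql.functions import struct

# Connector options; EVENT_HUB_CONNECTION_STRING is set earlier in the notebook
ehWriteConf = {
  'eventhubs.connectionString' : EVENT_HUB_CONNECTION_STRING
}

def send_to_eventhub(df:DataFrame):
    # Pack every column into a single struct column named "body" and write it out
    ds = df.select(struct(*[c for c in df.columns]).alias("body"))\
      .select("body")\
      .write.format("eventhubs")\
      .options(**ehWriteConf)\
      .save()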

I call this method after doing some processing on the DataFrame:

# write feature_df into our EventHub
send_to_eventhub(feature_df)

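Some similar questions suggest that this is a library version problem, so I have already tried several of the answers I found, such as installing a compatible version of the following library: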

com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22
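When checking compatibility, it can help to split the coordinate into its parts. By the usual Scala packaging convention, the `_2.12` suffix on the artifact name is the Scala binary version the connector was built for, and the last field is the connector's own version. The helper below is a minimal sketch; `parse_spark_package` is a hypothetical name, not part of any library.

```python
def parse_spark_package(coordinate: str) -> dict:
    """Split 'group:artifact_scalaVersion:version' into its parts."""
    group, artifact, version = coordinate.split(":")
    # The Scala binary version follows the last underscore in the artifact name
    name, _, scala_version = artifact.rpartition("_")
    return {
        "group": group,
        "artifact": name,
        "scala": scala_version,
        "version": version,
    }


info = parse_spark_package("com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22")
print(info["scala"], info["version"])
```

The Scala version can then be compared with the one listed for the cluster's runtime, and the connector version with the Spark version reported by `spark.version` in the notebook.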

But this is the error message I get:

java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException.<init>(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;)V

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-37526120346879> in <module>
      5 # write feature_df into our EventHub
      6 
----> 7 send_to_eventhub(feature_df)
      8 
      9 # implement reading data from EventHub through a loop in print statement

<command-2498519353602292> in send_to_eventhub(df)
     34     # .format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider")\
     35     # .format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider")
---> 36     ds = df.select(struct(*[c for c in df.columns]).alias("body"))\
     37       .select("body")\
     38       .write.format("eventhubs")\

/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
    736             self.format(format)
    737         if path is None:
--> 738             self._jwrite.save()
    739         else:
    740             self._jwrite.save(path)

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o1187.save.
: java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException.<init>(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;)V
    at org.apache.spark.sql.eventhubs.EventHubsWriter$.validateQuery(EventHubsWriter.scala:58)
    at org.apache.spark.sql.eventhubs.EventHubsWriter$.write(EventHubsWriter.scala:70)
    at org.apache.spark.sql.eventhubs.EventHubsSourceProvider.createRelation(EventHubsSourceProvider.scala:124)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:239)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:386)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:186)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:141)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:336)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:160)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:156)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:575)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:167)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:575)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:268)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:264)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:551)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:156)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:324)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:156)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:141)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:132)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:186)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:959)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:427)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:396)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:258)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

Part of the problem is that it is not clear which method could not be found.
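The text after `NoSuchMethodError:` is a JVM method descriptor, and it can be decoded mechanically. `<init>` is the JVM name for a constructor, `L...;` marks an object type, and the trailing `V` means a `void` return. The sketch below (the function names are hypothetical) turns such a message into a Java-like signature. For the error above it yields a five-argument `AnalysisException` constructor, which suggests the connector JAR was compiled against a Spark build that had this constructor while the Spark on the cluster does not.

```python
import re

# JVM primitive type codes used in method descriptors
_PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_types(desc: str) -> list:
    """Split a run of JVM type descriptors into readable type names."""
    types, i = [], 0
    while i < len(desc):
        dims = 0
        while desc[i] == "[":          # each '[' adds one array dimension
            dims += 1
            i += 1
        if desc[i] == "L":             # object type: L<class name>;
            end = desc.index(";", i)
            name = desc[i + 1:end].replace("/", ".")
            i = end + 1
        else:                          # single-letter primitive code
            name = _PRIMITIVES[desc[i]]
            i += 1
        types.append(name + "[]" * dims)
    return types


def describe_missing_method(message: str) -> str:
    """Turn 'pkg.Class.method(descriptor)ret' into a Java-like signature."""
    m = re.fullmatch(r"(.+)\.([^.(]+)\((.*)\)(.+)", message.strip())
    if m is None:
        raise ValueError("not a JVM method reference: " + message)
    owner, method, params, ret = m.groups()
    args = ", ".join(_parse_types(params))
    if method == "<init>":             # constructors are named <init> in bytecode
        return f"new {owner}({args})"
    return f"{_parse_types(ret)[0]} {owner}.{method}({args})"


print(describe_missing_method(
    "org.apache.spark.sql.AnalysisException.<init>"
    "(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;)V"
))
```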

The details of the cluster I run the notebook on are:

Best answer

The DataFrame to be written needs to have the following schema:

Column                    |  Type
----------------------------------------------
body (required)           |  string or binary 
partitionId (*optional)   |  string 
partitionKey (*optional)  |  string

This worked for me:

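from pyspark.sql import functions as F

# Serialize all columns into one JSON string column named 'body'
# (ehconf is the connector options dict, e.g. ehWriteConf in the question)
df.withColumn('body', F.to_json(
       F.struct(*df.columns),
       options={"ignoreNullFields": False}))\
   .select('body')\
   .write\
   .format("eventhubs")\
   .options(**ehconf)\
   .save()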
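The question's code puts a struct in `body`, whereas the schema table asks for a string or binary, and the stack trace fails inside `EventHubsWriter.validateQuery`. This suggests, though the source does not confirm it, that the connector was trying to report a schema problem when the `NoSuchMethodError` occurred. A check before writing can surface the mismatch directly. The function below is a minimal sketch based only on the table in this answer; `eventhubs_schema_problems` is a hypothetical helper, and it takes `(column, type)` pairs in the form returned by `df.dtypes`.

```python
def eventhubs_schema_problems(dtypes):
    """Return a list of schema problems for (column, type) pairs such as df.dtypes."""
    # Allowed types per column, taken from the schema table in the answer
    allowed = {
        "body": {"string", "binary"},
        "partitionId": {"string"},
        "partitionKey": {"string"},
    }
    types = dict(dtypes)
    problems = []
    if "body" not in types:
        problems.append("missing required column 'body'")
    for column, ok_types in allowed.items():
        if column in types and types[column] not in ok_types:
            expected = " or ".join(sorted(ok_types))
            problems.append(
                f"column '{column}' has type {types[column]}, expected {expected}"
            )
    return problems


# A struct-typed body, as produced by the question's code, is flagged
print(eventhubs_schema_problems([("body", "struct<a:int,b:string>")]))
# A JSON string body, as produced by to_json, passes
print(eventhubs_schema_problems([("body", "string")]))
```

In a notebook this could be called as `eventhubs_schema_problems(df.dtypes)` on the DataFrame just before `.write`.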

Regarding python - Error (java.lang.NoSuchMethodError) when sending a Spark DataFrame from a Databricks notebook to Azure Event Hubs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73962665/
