apache-spark - Spark Streaming: Custom Receiver: Data source: Websphere Message Queue


Reposted by: 行者123 | Updated: 2023-12-04 04:58:52

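I am trying to implement a custom receiver in Spark Streaming for a WSMQ data source. I followed the example provided here.

Later I modeled my code on the example in this Github repository.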

I am running into three problems:

1: An error (it appears after the program has been running for a while):

java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:238)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
    at java.lang.Thread.run(Thread.java:745)

2: The program does not remove messages from WSMQ, even though I create the session with this code:

MQQueueSession qSession = (MQQueueSession) qCon.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);

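3: I need to implement the reliable receiver described in the Spark Custom Receiver guide. It says:
To implement a reliable receiver, you have to use store(multiple-records) to store data. This flavor of store is a blocking call which returns only after all the given records have been stored inside Spark. If the receiver's configured storage level uses replication (enabled by default), then this call returns after replication has completed. Thus it ensures that the data is reliably stored, and the receiver can now acknowledge the source appropriately. This ensures that no data is lost when the receiver fails in the middle of replicating data: the buffered data will not be acknowledged and hence will be later resent by the source.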
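A minimal model of the ordering the guide describes, assuming nothing about WSMQ itself: fetch a batch, hand the whole batch to one blocking store call, and acknowledge the source only after that call returns. `BatchSink`, `AckableSource` and `receiveOnce` are hypothetical stand-ins for `Receiver.store(Iterator)` and the queue, not Spark or JMS types. Note that the question's code calls `receivedMessage.acknowledge()` before `store(...)`, which is the opposite order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Model of "store the batch, then acknowledge". Not the Spark or JMS API. */
public class ReliableReceiveSketch {

    /** Hypothetical stand-in for Receiver.store(Iterator): blocks until stored, throws on failure. */
    interface BatchSink<T> {
        void store(Iterator<T> records);
    }

    /** Hypothetical source that keeps (and would redeliver) anything not yet acknowledged. */
    static class AckableSource {
        final List<String> pending = new ArrayList<>();
        int acknowledged = 0;

        AckableSource(List<String> initial) { pending.addAll(initial); }

        /** Hands out up to max unacknowledged messages without removing them. */
        List<String> nextBatch(int max) {
            return new ArrayList<>(pending.subList(0, Math.min(max, pending.size())));
        }

        /** Removes the first n messages for good, like acknowledging them. */
        void acknowledge(int n) {
            pending.subList(0, n).clear();
            acknowledged += n;
        }
    }

    /** One cycle: fetch, store the batch as a unit, then acknowledge. Returns the batch size. */
    static int receiveOnce(AckableSource source, BatchSink<String> sink, int maxBatch) {
        List<String> batch = source.nextBatch(maxBatch);
        if (batch.isEmpty()) {
            return 0;
        }
        sink.store(batch.iterator());     // if this throws, nothing is acknowledged
        source.acknowledge(batch.size()); // only reached after the store succeeded
        return batch.size();
    }

    public static void main(String[] args) {
        AckableSource source = new AckableSource(List.of("m1", "m2", "m3"));
        List<String> stored = new ArrayList<>();

        // A sink that fails, simulating a receiver crash in the middle of replication.
        BatchSink<String> flaky = records -> { throw new IllegalStateException("replication failed"); };
        try {
            receiveOnce(source, flaky, 2);
        } catch (IllegalStateException e) {
            System.out.println("store failed, pending=" + source.pending.size());
        }

        BatchSink<String> ok = records -> records.forEachRemaining(stored::add);
        while (receiveOnce(source, ok, 2) > 0) { }
        System.out.println("stored=" + stored + ", acknowledged=" + source.acknowledged);
    }
}
```

With JMS, one way to map this onto the real source (an assumption, not something the question or answer confirms) is a session created with `Session.CLIENT_ACKNOWLEDGE`, calling `acknowledge()` on the last message of a batch after `store(...)` returns; in that mode, acknowledging one message acknowledges every message the session has consumed so far.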

I don't know how to use store(multiple-records).

I don't know why these errors happen, or how to implement a reliable receiver.

Here is the code:
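`store(multiple-records)` refers to the `store` overloads that take many records in one call, such as `store(Iterator)`. The question's code builds a one-element `ArrayList` per message, so every call still stores a single record. A sketch of collecting a real batch first, with a `java.util.concurrent.BlockingQueue` standing in for the MQ consumer (an assumption for illustration; `collectBatch`, `maxBatch` and `timeoutMs` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BatchCollectSketch {

    /**
     * Waits up to timeoutMs for the first message, then takes whatever else is
     * already queued, up to maxBatch messages in total. The BlockingQueue is a
     * stand-in for the MQ consumer, not part of the Spark or JMS API.
     */
    static List<String> collectBatch(BlockingQueue<String> source, int maxBatch, long timeoutMs)
            throws InterruptedException {
        List<String> batch = new ArrayList<>(maxBatch);
        String first = source.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (first == null) {
            return batch; // nothing arrived within the timeout
        }
        batch.add(first);
        source.drainTo(batch, maxBatch - 1); // non-blocking: only what is already queued
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(List.of("a", "b", "c", "d", "e"));
        System.out.println(collectBatch(queue, 3, 100)); // [a, b, c]
        System.out.println(collectBatch(queue, 3, 100)); // [d, e]
        System.out.println(collectBatch(queue, 3, 100)); // []
    }
}
```

In a receiver, the resulting list would go to `store(batch.iterator())` in a single call; the batch size and timeout trade latency against the number of store calls.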

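import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Iterator;

import javax.jms.JMSException;
import javax.jms.Session;

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

import com.google.gson.Gson;
import com.ibm.jms.JMSTextMessage;
import com.ibm.mq.jms.JMSC;
import com.ibm.mq.jms.MQQueue;
import com.ibm.mq.jms.MQQueueBrowser;
import com.ibm.mq.jms.MQQueueConnection;
import com.ibm.mq.jms.MQQueueConnectionFactory;
import com.ibm.mq.jms.MQQueueReceiver;
import com.ibm.mq.jms.MQQueueSession;

public class JavaConnector extends Receiver<String> {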

    String host = null;
    int port = -1;
    String qm=null;
    String qn=null;
    String channel=null;
    transient Gson gson=new Gson();
    transient MQQueueConnection qCon= null;
    String topic=null;

    Enumeration enumeration =null;
    private static MQQueueReceiver receiver = null;


    public JavaConnector(String host , int port, String qm, String channel, String qn) {
        super(StorageLevel.MEMORY_ONLY_2());
        this.host = host;
        this.port = port;
        this.qm=qm;
        this.qn=qn;
        this.channel=channel;


    }

    public void onStart()  {
        // Start the thread that receives data over a connection
        new Thread()  {
            @Override public void run() {
                try {
                    initConnection();
                    receive();
                }
                catch (JMSException ex)
                {
                    ex.printStackTrace();
                }
                catch (Exception ex)
                {
                    ex.printStackTrace();
                }
            }
        }.start();
    }

    public void onStop() {

        // There is nothing much to do, as the thread calling receive()
        // is designed to stop by itself once isStopped() returns true

    }

    /** Create a MQ connection and receive data until receiver is stopped */
    private void receive() throws InterruptedException {
        System.out.print("Started receiving messages from MQ");


        try {

            JMSTextMessage receivedMessage= null;
            int cnt =0;

            //JMSTextMessage receivedMessage = (JMSTextMessage) receiver.receive(10000);

            boolean flag=false;
            while (!isStopped() && enumeration.hasMoreElements()&&cnt<50 )
            {

                receivedMessage= (JMSTextMessage) enumeration.nextElement();
                receivedMessage.acknowledge();
                String userInput = receivedMessage.getText();

                ArrayList<String> list = new ArrayList<String>();
                list.add(userInput);
                Iterator<String> itr = list.iterator();
                store(itr);
                cnt++;

            }
            /*while (!isStopped() && receivedMessage !=null)
            {

               // receivedMessage= (JMSTextMessage) enumeration.nextElement();
                String userInput = receivedMessage.getText();

                store(userInput);
        receivedMessage.acknowledge();

            }*/

            // Restart in an attempt to connect again when server is active again
            //restart("Trying to connect again");

            stop("No More Messages To read !");
            qCon.close();
            System.out.println("Queue Connection is Closed");

        }
        catch(Exception e)
        {
            Thread.sleep(100);
            System.out.println("WRONG"+e.toString());
            e.printStackTrace();
            restart("Trying to connect again");
        }
        catch(Throwable t) {
            Thread.sleep(100);
            System.out.println("WRONG-1"+t.toString());
            // restart if there is any other error
            restart("Error receiving data", t);
        }



    }

    public void initConnection() throws JMSException,InterruptedException {
        try {
            MQQueueConnectionFactory conFactory = new MQQueueConnectionFactory();
            conFactory.setHostName(host);
            conFactory.setPort(port);
            conFactory.setTransportType(JMSC.MQJMS_TP_CLIENT_MQ_TCPIP);
            conFactory.setQueueManager(qm);
            conFactory.setChannel(channel);
            conFactory.setMsgBatchSize(100);


            qCon = (MQQueueConnection) conFactory.createQueueConnection();
            MQQueueSession qSession = (MQQueueSession) qCon.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            MQQueue queue = (MQQueue) qSession.createQueue(qn);
            MQQueueBrowser browser = (MQQueueBrowser) qSession.createBrowser(queue);
            qCon.start();
            //receiver = (MQQueueReceiver) qSession.createReceiver(queue);
            enumeration= browser.getEnumeration();


        } catch (Exception e) {
            Thread.sleep(1000);
        }
    }

    @Override
    public StorageLevel storageLevel() {
        return StorageLevel.MEMORY_ONLY_2();
    }
}

Best Answer

I finally solved it. Solution 1: The streaming context was trying to write to Kafka, and because Kafka was down it gave me the IO error. Silly me. :)
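The stack trace supports this: the failing frames are in `org.apache.kafka.clients.producer.internals.Sender`, the Kafka producer's network thread, not in the MQ code. A small standalone check of whether an endpoint accepts TCP connections can separate "broker down" from other problems. `EndpointCheck.isReachable` is a hypothetical helper, and the broker address is not given in the question:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EndpointCheck {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // ConnectException ("Connection refused"), timeouts, unknown hosts, etc.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Open a local listener on an ephemeral port to demonstrate both outcomes.
        ServerSocket server = new ServerSocket(0);
        int port = server.getLocalPort();
        System.out.println("listening: " + isReachable("127.0.0.1", port, 500));
        server.close();
        // With the listener closed, the same port is (almost certainly) refused.
        System.out.println("closed: " + isReachable("127.0.0.1", port, 500));
    }
}
```

For example, `isReachable("localhost", 9092, 2000)` would probe a broker on Kafka's default port 9092; substitute whatever address the application's producer is actually configured with.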

Solution 2: I should use a MessageListener. A QueueBrowser is for reading (browsing) messages; it does not actually consume them.
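The difference can be shown with an in-memory model (not the JMS API): browsing returns a snapshot and leaves the queue depth unchanged, while delivering to a listener removes each message. `ModelQueue`, `browse` and `deliverTo` are hypothetical names. The model removes a message before invoking the callback, which is a simplification; in real JMS, when a consumed message is removed depends on the session's acknowledgment mode.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** In-memory model of browsing versus consuming a queue. Not the JMS API. */
public class BrowseVsConsume {

    static class ModelQueue {
        private final Deque<String> messages = new ArrayDeque<>();

        ModelQueue(List<String> initial) { messages.addAll(initial); }

        /** Like a QueueBrowser: returns a snapshot and leaves the queue unchanged. */
        List<String> browse() { return new ArrayList<>(messages); }

        /** Like a consumer with a listener: removes each message, then passes it to the callback. */
        void deliverTo(Consumer<String> listener) {
            String next;
            while ((next = messages.pollFirst()) != null) {
                listener.accept(next);
            }
        }

        int depth() { return messages.size(); }
    }

    public static void main(String[] args) {
        ModelQueue queue = new ModelQueue(List.of("m1", "m2"));
        System.out.println("browsed " + queue.browse() + ", depth=" + queue.depth());
        List<String> received = new ArrayList<>();
        queue.deliverTo(received::add);
        System.out.println("received " + received + ", depth=" + queue.depth());
    }
}
```

In JMS terms, the switch would be from `createBrowser(queue)` to a consumer such as `qSession.createReceiver(queue)` (the commented-out line in the question's `initConnection`), with a listener registered through `setMessageListener(...)`.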

Regarding apache-spark - Spark Streaming: Custom Receiver: Data source: Websphere Message Queue, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35095286/
