kubernetes - ActiveMQ Artemis 消费者在崩溃后不会重新连接到同一个集群节点-6ren

kubernetes - ActiveMQ Artemis 消费者在崩溃后不会重新连接到同一个集群节点

转载作者：行者123 更新时间：2023-12-05 03:41:56

我在 Kubernetes 中设置了一个 Artemis 集群，有 3 组主/从:

activemq-artemis-master-0                               1/1     Running
activemq-artemis-master-1                               1/1     Running
activemq-artemis-master-2                               1/1     Running
activemq-artemis-slave-0                                0/1     Running
activemq-artemis-slave-1                                0/1     Running
activemq-artemis-slave-2                                0/1     Running

Artemis 版本为 2.17.0。这是我在 master-0 broker.xml 中的集群配置。除了更改 connector-ref 以匹配代理之外，其他代理的配置相同:

<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<configuration xmlns="urn:activemq" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
  <core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
    <name>activemq-artemis-master-0</name>
    <persistence-enabled>true</persistence-enabled>
    <!-- this could be ASYNCIO, MAPPED, NIO
           ASYNCIO: Linux Libaio
           MAPPED: mmap files
           NIO: Plain Java Files
       -->
    <journal-type>ASYNCIO</journal-type>
    <paging-directory>data/paging</paging-directory>
    <bindings-directory>data/bindings</bindings-directory>
    <journal-directory>data/journal</journal-directory>
    <large-messages-directory>data/large-messages</large-messages-directory>
    <journal-datasync>true</journal-datasync>
    <journal-min-files>2</journal-min-files>
    <journal-pool-files>10</journal-pool-files>
    <journal-device-block-size>4096</journal-device-block-size>
    <journal-file-size>10M</journal-file-size>
    <!--
       This value was determined through a calculation.
       Your system could perform 1.1 writes per millisecond
       on the current journal configuration.
       That translates as a sync write every 911999 nanoseconds.

       Note: If you specify 0 the system will perform writes directly to the disk.
             We recommend this to be 0 if you are using journalType=MAPPED and journal-datasync=false.
      -->
    <journal-buffer-timeout>100000</journal-buffer-timeout>
    <!--
        When using ASYNCIO, this will determine the writing queue depth for libaio.
       -->
    <journal-max-io>4096</journal-max-io>
    <!--
        You can verify the network health of a particular NIC by specifying the <network-check-NIC> element.
         <network-check-NIC>theNicName</network-check-NIC>
        -->
    <!--
        Use this to use an HTTP server to validate the network
         <network-check-URL-list>http://www.apache.org</network-check-URL-list> -->
    <!-- <network-check-period>10000</network-check-period> -->
    <!-- <network-check-timeout>1000</network-check-timeout> -->
    <!-- this is a comma separated list, no spaces, just DNS or IPs
           it should accept IPV6

           Warning: Make sure you understand your network topology as this is meant to validate if your network is valid.
                    Using IPs that could eventually disappear or be partially visible may defeat the purpose.
                    You can use a list of multiple IPs, and if any successful ping will make the server OK to continue running -->
    <!-- <network-check-list>10.0.0.1</network-check-list> -->
    <!-- use this to customize the ping used for ipv4 addresses -->
    <!-- <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command> -->
    <!-- use this to customize the ping used for ipv6 addresses -->
    <!-- <network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command> -->
    <!-- how often we are looking for how many bytes are being used on the disk in ms -->
    <disk-scan-period>5000</disk-scan-period>
    <!-- once the disk hits this limit the system will block, or close the connection in certain protocols
           that won't support flow control. -->
    <max-disk-usage>90</max-disk-usage>
    <!-- should the broker detect dead locks and other issues -->
    <critical-analyzer>true</critical-analyzer>
    <critical-analyzer-timeout>120000</critical-analyzer-timeout>
    <critical-analyzer-check-period>60000</critical-analyzer-check-period>
    <critical-analyzer-policy>HALT</critical-analyzer-policy>
    <page-sync-timeout>2244000</page-sync-timeout>
    <!-- the system will enter into page mode once you hit this limit.
           This is an estimate in bytes of how much the messages are using in memory

            The system will use half of the available memory (-Xmx) by default for the global-max-size.
            You may specify a different value here if you need to customize it to your needs.

            <global-max-size>100Mb</global-max-size>

      -->
    <acceptors>
      <!-- useEpoll means: it will use Netty epoll if you are on a system (Linux) that supports it -->
      <!-- amqpCredits: The number of credits sent to AMQP producers -->
      <!-- amqpLowCredits: The server will send the # credits specified at amqpCredits at this low mark -->
      <!-- amqpDuplicateDetection: If you are not using duplicate detection, set this to false
                                      as duplicate detection requires applicationProperties to be parsed on the server. -->
      <!-- amqpMinLargeMessageSize: Determines how many bytes are considered large, so we start using files to hold their data.
                                       default: 102400, -1 would mean to disable large mesasge control -->
      <!-- Note: If an acceptor needs to be compatible with HornetQ and/or Artemis 1.x clients add
                    "anycastPrefix=jms.queue.;multicastPrefix=jms.topic." to the acceptor url.
                    See https://issues.apache.org/jira/browse/ARTEMIS-1644 for more information. -->
      <!-- Acceptor for every supported protocol -->
      <acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
      <!-- AMQP Acceptor.  Listens on default AMQP port for AMQP traffic.-->
      <acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
      <!-- STOMP Acceptor. -->
      <acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
      <!-- HornetQ Compatibility Acceptor.  Enables HornetQ Core and STOMP for legacy HornetQ clients. -->
      <acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
      <!-- MQTT Acceptor -->
      <acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
    </acceptors>
    <security-settings>
      <security-setting match="#">
        <permission type="createNonDurableQueue" roles="amq"/>
        <permission type="deleteNonDurableQueue" roles="amq"/>
        <permission type="createDurableQueue" roles="amq"/>
        <permission type="deleteDurableQueue" roles="amq"/>
        <permission type="createAddress" roles="amq"/>
        <permission type="deleteAddress" roles="amq"/>
        <permission type="consume" roles="amq"/>
        <permission type="browse" roles="amq"/>
        <permission type="send" roles="amq"/>
        <!-- we need this otherwise ./artemis data imp wouldn't work -->
        <permission type="manage" roles="amq"/>
      </security-setting>
    </security-settings>
    <address-settings>
      <!-- if you define auto-create on certain queues, management has to be auto-create -->
      <address-setting match="activemq.management#">
        <dead-letter-address>DLQ</dead-letter-address>
        <expiry-address>ExpiryQueue</expiry-address>
        <redelivery-delay>0</redelivery-delay>
        <!-- with -1 only the global-max-size is in use for limiting -->
        <max-size-bytes>-1</max-size-bytes>
        <message-counter-history-day-limit>10</message-counter-history-day-limit>
        <address-full-policy>PAGE</address-full-policy>
        <auto-create-queues>true</auto-create-queues>
        <auto-create-addresses>true</auto-create-addresses>
        <auto-create-jms-queues>true</auto-create-jms-queues>
        <auto-create-jms-topics>true</auto-create-jms-topics>
      </address-setting>
      <!--default for catch all-->
      <address-setting match="#">
        <dead-letter-address>DLQ</dead-letter-address>
        <expiry-address>ExpiryQueue</expiry-address>
        <redelivery-delay>0</redelivery-delay>
        <!-- with -1 only the global-max-size is in use for limiting -->
        <max-size-bytes>-1</max-size-bytes>
        <message-counter-history-day-limit>10</message-counter-history-day-limit>
        <address-full-policy>PAGE</address-full-policy>
        <auto-create-queues>true</auto-create-queues>
        <auto-create-addresses>true</auto-create-addresses>
        <auto-create-jms-queues>true</auto-create-jms-queues>
        <auto-create-jms-topics>true</auto-create-jms-topics>
      </address-setting>
    </address-settings>
    <addresses>
      <address name="DLQ">
        <anycast>
          <queue name="DLQ"/>
        </anycast>
      </address>
      <address name="ExpiryQueue">
        <anycast>
          <queue name="ExpiryQueue"/>
        </anycast>
      </address>
    </addresses>
    <!-- Uncomment the following if you want to use the Standard LoggingActiveMQServerPlugin pluging to log in events
      <broker-plugins>
         <broker-plugin class-name="org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin">
            <property key="LOG_ALL_EVENTS" value="true"/>
            <property key="LOG_CONNECTION_EVENTS" value="true"/>
            <property key="LOG_SESSION_EVENTS" value="true"/>
            <property key="LOG_CONSUMER_EVENTS" value="true"/>
            <property key="LOG_DELIVERING_EVENTS" value="true"/>
            <property key="LOG_SENDING_EVENTS" value="true"/>
            <property key="LOG_INTERNAL_EVENTS" value="true"/>
         </broker-plugin>
      </broker-plugins>
      -->
    <cluster-user>clusterUser</cluster-user>
    <cluster-password>aShortclusterPassword</cluster-password>
    <connectors>
      <connector name="activemq-artemis-master-0">tcp://activemq-artemis-master-0.activemq-artemis-master.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-slave-0">tcp://activemq-artemis-slave-0.activemq-artemis-slave.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-master-1">tcp://activemq-artemis-master-1.activemq-artemis-master.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-slave-1">tcp://activemq-artemis-slave-1.activemq-artemis-slave.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-master-2">tcp://activemq-artemis-master-2.activemq-artemis-master.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-slave-2">tcp://activemq-artemis-slave-2.activemq-artemis-slave.svc.cluster.local:61616</connector>
    </connectors>
    <cluster-connections>
      <cluster-connection name="activemq-artemis">
        <connector-ref>activemq-artemis-master-0</connector-ref>
        <retry-interval>500</retry-interval>
        <retry-interval-multiplier>1.1</retry-interval-multiplier>
        <max-retry-interval>5000</max-retry-interval>
        <initial-connect-attempts>-1</initial-connect-attempts>
        <reconnect-attempts>-1</reconnect-attempts>
        <message-load-balancing>ON_DEMAND</message-load-balancing>
        <max-hops>1</max-hops>
        <!-- scale-down>true</scale-down -->
        <static-connectors>
          <connector-ref>activemq-artemis-master-0</connector-ref>
          <connector-ref>activemq-artemis-slave-0</connector-ref>
          <connector-ref>activemq-artemis-master-1</connector-ref>
          <connector-ref>activemq-artemis-slave-1</connector-ref>
          <connector-ref>activemq-artemis-master-2</connector-ref>
          <connector-ref>activemq-artemis-slave-2</connector-ref>
        </static-connectors>
      </cluster-connection>
    </cluster-connections>
    <ha-policy>
      <replication>
        <master>
          <group-name>activemq-artemis-0</group-name>
          <quorum-vote-wait>12</quorum-vote-wait>
          <vote-on-replication-failure>true</vote-on-replication-failure>
          <!--we need this for auto failback-->
          <check-for-live-server>true</check-for-live-server>
        </master>
      </replication>
    </ha-policy>
  </core>
  <core xmlns="urn:activemq:core">
    <jmx-management-enabled>true</jmx-management-enabled>
  </core>
</configuration>

我的消费者在 Spring Boot 应用程序中定义为 JmsListener。在消费队列中的消息期间，Spring Boot 应用程序崩溃，导致 kubernetes 删除 pod 并重新创建一个新的 pod。但是，我注意到新的 pod 没有连接到同一个 Artemis 节点，因此从未使用过之前连接的剩余消息。

我认为使用集群的全部意义在于让所有 Artemis 节点作为一个单元向消费者传递消息，而不管它连接到哪个节点。我错了吗？如果集群无法将消费者连接重新路由到正确的节点(该节点保留来自先前消费者的剩余消息)，那么处理这种情况的推荐方法是什么？

最佳答案

首先，请务必注意，在客户端崩溃/重启后，没有任何功能可以让客户端重新连接到它断开连接的代理。一般来说，客户端不应该真正关心它连接到哪个代理；这是横向可扩展性的主要目标之一。

还值得注意的是，如果代理上的消息数量和连接的客户端数量足够少，这种情况经常出现，这几乎肯定意味着您的集群中有太多代理。

也就是说，我相信您的客户端没有收到预期消息的原因是因为您使用的是默认的 redistribution-delay(即 -1 ) 这意味着消息将不会重新分发到集群中的其他节点。如果你想启用重新分配(这看起来像你做的)那么你应该将它设置为 >= 0，例如:

      <address-setting match="#">
        ...
        <redistribution-delay>0</redistribution-delay>
        ...
      </address-setting>

您可以在 the documentation 中阅读有关重新分配的更多信息.

除此之外，您可能需要总体上重新考虑您的拓扑结构。通常，如果您处于类似云的环境(例如使用 Kubernetes 的环境)中，基础设施本身将重启失败的 pod，那么您将不会使用主/从配置。您只需将日志挂载到单独的 pod 上(例如使用 NFSv4)，这样当节点出现故障时它会重新启动，然后重新连接回其持久存储。这有效地提供了代理的高可用性(主/从是为云环境外部设计的)。

此外，根据用例，ActiveMQ Artemis 的单个实例每秒可以处理数百万条消息，因此您实际上可能不需要 3 个事件节点来满足预期负载。

请注意，这些是关于您的整体架构的一般建议，与您的问题没有直接关系。

关于kubernetes - ActiveMQ Artemis 消费者在崩溃后不会重新连接到同一个集群节点，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/67490806/

文章推荐： swift - VStack 子级填充(而不是扩展)其父级

文章推荐： c# - 直接设置字典元组值

kafka的Java客户端-消费者
kafka的Java客户端-消费者一、kafka消费方式 pull（拉）模式：consumer采用从broker中主动拉取数据。Kafka 采用这种方式 push（推）模式：Kafka没有采用这种方
具有多线程的Python生产者/消费者
我编写这个小应用程序是为了解决 Python 中的经典生产者/消费者问题。我知道我可以使用线程安全的队列机制来解决这个问题，但我有兴趣自己解决这个问题来学习。 from threading impor
消费者/生产者程序卡住了
下面是一个示例消费者/生产者模型的代码: int buffer[MAX]; int fill_ptr = 0; int use_ptr = 0; int count = 3; void put(int
消费者-生产者问题
我的消费者、生产者程序有问题，它似乎可以加载，但返回段错误。我已经尝试了一切来修复它，但仍然失败!将不胜感激任何帮助。笔记;代码真的很多，semaphore.h的代码都在里面，有谁想测试一下。其余代码
填充所有缓冲区的算法生产者-消费者
我正在阅读著名的操作系统概念书(Avi Silberschatz、Peter Baer Galvin、Greg Gagne)第 9 版:http://codex.cs.yale.edu/avi/os-
c# - 具有节流持续时间和批量消费的异步生产者/消费者
我正在尝试构建一个服务，为许多异步客户端提供队列以发出请求并等待响应。我需要能够通过每 Y 个持续时间的 X 个请求来限制队列处理。例如:每秒 50 个 Web 请求。它用于第 3 方 REST 服务
c# - 拥有资源的生产者-消费者
我正在尝试使用一组资源来实现生产者/消费者模式，因此每个线程都有一个与之关联的资源。例如，我可能有一个任务队列，其中每个任务都需要一个 StreamWriter写出它的结果。每个任务还必须有参数传
Azure Eventhub 消费者
为什么我们需要 Azure 存储帐户上的 blob 容器用于 Eventhub 消费者客户端(我使用的是 python)。为什么我们不能像在 Kafka 中那样直接使用来自 Eventhub(Kafk
java - 区间分区的生产者-消费者
我有一个有趣的生产者-消费者衍生产品需要实现，但我无法理解它的算法。因此，每个生产者都会“产生”给定范围(最小值，最大值)之间的数字，这对除以给定“商”给出了相同的提醒。对于消费者来说也是如此。额外
java - 如何使用自动线程管理在Java中实现生产者/消费者
我需要实现一种生产者/消费者方案，出于性能原因，消费者尝试在一批中处理许多工作项(每个工作项都会耗尽工作队列)。目前，我只是创建固定数量的相同工作人员，它们在循环中的同一队列上工作。由于其中一些可能
Azure Eventhub 消费者
为什么我们需要 Azure 存储帐户上的 blob 容器用于 Eventhub 消费者客户端(我使用的是 python)。为什么我们不能像在 Kafka 中那样直接使用来自 Eventhub(Kafk
java - Java中的复合生产者-消费者
我的关系必须按如下方式运作；线程 A 向线程 B 发布一些更改，线程 B 接受该更改并将其发布到线程 C。问题是生产者-消费者，我使用 BlockingQueue 仅用两个实体来实现它没有问题。我怎
java - 使用java同步理解生产者-消费者
我一直在研究 PC 问题，以了解 Java 同步和线程间通信。使用底部的代码，输出为 Producer produced-0 Producer produced-1 Producer produced
java - 使用同步的生产者-消费者
我编写了代码来实现生产者-消费者问题，它似乎工作正常，不需要同步。这可能吗？如何测试代码并检查它是否确实正常工作？我如何知道是否会发生死锁？现在，我没有跳出循环(即生产者不断插入，消费者不断在无限循
java - java线程生产者-消费者
我必须完成一项练习，我必须使用至少一个生产者线程和 x 个消费者线程的生产者/消费者模式在我的文件夹路径中查找“.java”文件。生产者消费者级:首先，当生产者完成查找文件时，我尝试通过设置从 tr
c - 消费者/生产者任务的解决方案
我被分配了一项类(class)作业来实现消费者/生产者问题的解决方案，该解决方案使用单个生产者、单个消费者和循环缓冲区。这应该用 C 语言编写。不幸的是，我们没有获得任何学习 Material ，并
c - 具有有限缓冲区的生产者/消费者
有人可以检查我的代码并告诉我是否走在正确的轨道上。我似乎有点迷失了。如果您看到我的错误，请告诉我它们。我想做的是使用我自己的信号量以及 GCD 来解决有界缓冲区问题。提前致谢.. sema.c v
消费者-生产者，断言失败
我要处理有界缓冲区、生产者消费者问题，只能修改 prod 和 cons 函数。此代码仅在一个消费者和生产者线程上运行，不会出现任何问题。但对于每个都有多个，迟早总会给我带来同样的问题: p5p1:
c# - 异步生产者/消费者
我有一个从多个线程访问的类的实例。此类接受此调用并将元组添加到数据库中。我需要以串行方式完成此操作，因为由于某些数据库约束，并行线程可能会导致数据库不一致。由于我不熟悉 C# 中的并行性和并发性，所
java - 具有批量和刷新功能的生产者/消费者
我正在尝试编写一个批量邮件服务，它有两种方法: add(Mail mail):可以发送邮件，由Producers调用 flushMailService():刷新服务。消费者应该获取一个列表，并调用另一

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

kubernetes - ActiveMQ Artemis 消费者在崩溃后不会重新连接到同一个集群节点