cluster-computing - How does Quartz detect node failure? - 6ren


Reposted by: 行者123 Updated: 2023-12-04 15:24:52


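My production environment runs Java scheduler jobs on Quartz 2.1.4. On a WebLogic cluster with 4 machines, a single scheduled job had been executing normally on one cluster node (node 1) for several months. Last night, node 2 suddenly decided that node 1 had failed and took over execution of the job. In fact, node 1 had no errors (according to the server, network, database, and application logs), and because two processes then ran concurrently, duplicate messages were created.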

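What mechanism does Quartz use to detect node failure? A ping scan, a UDP broadcast heartbeat, or something else such as database response time? Is there any configuration for it?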

I have read the Quartz configuration guide at http://quartz-scheduler.org/documentation/quartz-2.1.x/configuration/ConfigJDBCJobStoreClustering but found no answer there.

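I am using the JDBC JobStore. After a detailed check, we found one database (Oracle) statement whose execution took abnormally long (rising from 5 seconds to 30 seconds). The incident happened during that period. Do you think the two are related?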

My configuration is:

```
org.quartz.threadPool.threadCount=10
org.quartz.threadPool.threadPriority=5
org.quartz.jobStore.misfireThreshold=10000
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
```

Does anyone have information on this? Thanks.

Best answer

I know this answer comes very late, but people like the two of us may still need it.

Short version: it is all handled through the DB. The important property is org.quartz.jobStore.clusterCheckinInterval.
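To make the property concrete, here is a minimal sketch of the clustering-related settings, built as a plain `java.util.Properties` object so it runs without Quartz on the classpath. The instance name `ClusteredScheduler` and the 20000 ms interval are illustrative assumptions, not values from the question. A real setup would also need the data source and driver delegate settings, which are omitted here.

```java
import java.util.Properties;
import java.util.TreeSet;

class ClusterCheckinConfig {

    // Builds only the clustering-related keys; data source settings are left out.
    static Properties clusteredProps(long checkinIntervalMillis) {
        Properties p = new Properties();
        // Hypothetical name; every node in one cluster must use the same instanceName.
        p.setProperty("org.quartz.scheduler.instanceName", "ClusteredScheduler");
        // AUTO asks Quartz to generate a distinct instance id per node.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        // How often (in ms) each node writes its check-in to the database.
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval",
                Long.toString(checkinIntervalMillis));
        return p;
    }

    public static void main(String[] args) {
        Properties p = clusteredProps(20000L); // assumed value for illustration
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + "=" + p.getProperty(key));
        }
        // With Quartz available, these properties could be handed to
        // new org.quartz.impl.StdSchedulerFactory(p).getScheduler().
    }
}
```

A larger interval makes the cluster more tolerant of a slow database at the cost of slower detection of nodes that really are down.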

Long version (all credit goes to http://flylib.com/books/en/2.65.1.91/1/):

Detecting Failed Scheduler Nodes

When a Scheduler instance performs the check-in routine, it looks to see if there are other Scheduler instances that didn't check in when they were supposed to. It does this by inspecting the SCHEDULER_STATE table and looking for schedulers that have a value in the LAST_CHECK_TIME column that is older than the property org.quartz.jobStore.clusterCheckinInterval (discussed in the next section). If one or more nodes haven't checked in, the running Scheduler assumes that the other instance(s) have failed.
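The check described in that paragraph can be modeled roughly as follows. This is a simplified sketch, not Quartz's actual code: the class and method names are made up, the map stands in for the rows of the scheduler state table, and the extra grace period is an assumption added for illustration. It also suggests one plausible way a slow database statement could matter: if node 1's check-in update is delayed long enough, its timestamp can look stale to node 2 even though node 1 is healthy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FailedNodeDetector {

    // Returns the ids of instances whose last check-in is older than
    // checkinInterval + grace, as seen at time nowMillis.
    static List<String> findFailed(Map<String, Long> lastCheckinMillis,
                                   long nowMillis,
                                   long checkinIntervalMillis,
                                   long graceMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastCheckinMillis.entrySet()) {
            long deadline = e.getValue() + checkinIntervalMillis + graceMillis;
            if (deadline < nowMillis) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        Map<String, Long> state = new LinkedHashMap<>();
        // node1 last checked in 30 s ago, e.g. because its update was stuck behind a slow statement
        state.put("node1", now - 30_000L);
        // node2 checked in 5 s ago
        state.put("node2", now - 5_000L);

        // Assumed values: 15 s check-in interval, 7.5 s grace period
        System.out.println(findFailed(state, now, 15_000L, 7_500L)); // prints [node1]
    }
}
```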

In addition, the next paragraph may also be important:

Running Nodes on Separate Machines with Unsynchronized Clocks

As you can ascertain by now, if you run nodes on different machines and the clocks are not synchronized, you can get unexpected results. This is because a timestamp is being used to inform other instances of the last time one node checked in. If that node's clock was set for the future, a running Scheduler might never realize that a node has gone down. On the other hand, if a clock on one node is set in the past, a node might assume that the node has gone down and attempt to take over and rerun its jobs. In either case, it's not the behavior that you want. When you're using different machines in a cluster (which is the normal case), be sure to synchronize the clocks. See the section "Quartz Clustering Cookbook," later in this chapter for details on how to do this.
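The effect of clock skew can be illustrated with a small model. The names and numbers below are assumptions chosen for the example: the peer writes its check-in timestamp using its own clock, and the observer compares that timestamp against its own clock.

```java
class ClockSkewDemo {

    // peerTrueCheckinMillis: when the peer really checked in (on the observer's time line)
    // peerClockOffsetMillis: how far the peer's clock is ahead (+) or behind (-)
    static boolean looksFailed(long peerTrueCheckinMillis,
                               long peerClockOffsetMillis,
                               long observerNowMillis,
                               long checkinIntervalMillis) {
        long recordedTimestamp = peerTrueCheckinMillis + peerClockOffsetMillis;
        return observerNowMillis - recordedTimestamp > checkinIntervalMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long interval = 15_000L; // assumed check-in interval

        // Healthy peer, synchronized clocks: checked in 5 s ago.
        System.out.println(looksFailed(now - 5_000L, 0L, now, interval));          // false

        // Healthy peer whose clock is 60 s behind: looks stale, so a takeover may be attempted.
        System.out.println(looksFailed(now - 5_000L, -60_000L, now, interval));    // true

        // Dead peer (last check-in 5 min ago) whose clock was 10 min ahead: still looks alive.
        System.out.println(looksFailed(now - 300_000L, 600_000L, now, interval));  // false
    }
}
```

The second case matches the symptom in the question (a healthy node being taken over), so checking clock synchronization across the four machines seems worthwhile alongside the slow-statement theory.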

For "cluster-computing - How does Quartz detect node failure?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12966625/
