
cluster-computing - How does Quartz detect node failure?

Reposted. Author: 行者123. Updated: 2023-12-04 15:24:52

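My production environment uses Quartz 2.1.4 to run Java scheduler jobs on a WebLogic cluster of 4 machines. A single scheduled job had been executing normally on one cluster node (node 1) for several months. Last night, node 2 suddenly decided that node 1 had failed and took over execution of the job. In fact, node 1 had no errors (according to the server, network, database, and application logs), so the job was executed by 2 processes concurrently and duplicate messages were created.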

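What mechanism does Quartz use to detect node failure? A ping scan, a UDP broadcast heartbeat, or something else such as database response time? Is there any configuration for it?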

I have read the Quartz configuration guide at http://quartz-scheduler.org/documentation/quartz-2.1.x/configuration/ConfigJDBCJobStoreClustering but did not find an answer there.

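I am using JDBCJobStore. After a detailed check, we found that one database (Oracle) statement took abnormally long to execute (30 seconds instead of 5). The incident happened during that period. Do you think the two are related?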

My configuration is:

```
org.quartz.threadPool.threadCount=10
org.quartz.threadPool.threadPriority=5
org.quartz.jobStore.misfireThreshold=10000
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
```

Does anyone have information on this? Thanks.

Best answer

I know this answer comes very late, but maybe people like the two of us still need it.

Short version: it is all handled through the DB. The important property is org.quartz.jobStore.clusterCheckinInterval.
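As a rough illustration of where that property lives, the sketch below collects the clustering-related settings in a `java.util.Properties` object. The property names are standard Quartz settings. The values (a 20-second check-in interval, `AUTO` instance ids) are assumptions chosen for illustration, not recommendations, and the class name `ClusterConfigSketch` is hypothetical. Data source settings are left out.

```java
import java.util.Properties;

// Hypothetical helper: gathers the clustering-related Quartz settings.
class ClusterConfigSketch {

    static Properties clusteredJobStoreProps(long checkinIntervalMillis) {
        Properties p = new Properties();
        // Each node needs a unique instance id; AUTO lets Quartz generate one.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        p.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        // Clustering has to be switched on explicitly.
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        // How often (in milliseconds) this node checks in with the database.
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval",
                Long.toString(checkinIntervalMillis));
        return p;
    }

    public static void main(String[] args) {
        Properties p = clusteredJobStoreProps(20_000L);
        p.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + "=" + p.getProperty(k)));
        // With Quartz on the classpath, and after adding data source settings,
        // these could be passed to new org.quartz.impl.StdSchedulerFactory(p).
    }
}
```

The same keys can equally be written into quartz.properties; the Java form is used here only so the sketch runs without Quartz on the classpath.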

Long version (all credit goes to http://flylib.com/books/en/2.65.1.91/1/):

Detecting Failed Scheduler Nodes

When a Scheduler instance performs the check-in routine, it looks to see if there are other Scheduler instances that didn't check in when they were supposed to. It does this by inspecting the SCHEDULER_STATE table and looking for schedulers that have a value in the LAST_CHECK_TIME column that is older than the property org.quartz.jobStore.clusterCheckinInterval (discussed in the next section). If one or more nodes haven't checked in, the running Scheduler assumes that the other instance(s) have failed.
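The check described in that passage can be modelled in a few lines of Java. This is a simplified sketch, not Quartz's actual code: `CheckinRecord`, `findFailed`, and the `toleranceMillis` parameter are hypothetical names introduced for illustration, and the rows are held in memory instead of being read from the scheduler state table.

```java
import java.util.ArrayList;
import java.util.List;

class CheckinFailureSketch {

    // Hypothetical in-memory stand-in for one row of the scheduler state table.
    static final class CheckinRecord {
        final String instanceId;
        final long lastCheckinMillis;
        final long checkinIntervalMillis;

        CheckinRecord(String instanceId, long lastCheckinMillis, long checkinIntervalMillis) {
            this.instanceId = instanceId;
            this.lastCheckinMillis = lastCheckinMillis;
            this.checkinIntervalMillis = checkinIntervalMillis;
        }
    }

    // Returns the ids of all other instances whose last check-in is overdue.
    static List<String> findFailed(List<CheckinRecord> rows, String selfId,
                                   long nowMillis, long toleranceMillis) {
        List<String> failed = new ArrayList<>();
        for (CheckinRecord r : rows) {
            if (r.instanceId.equals(selfId)) {
                continue; // a node does not judge itself
            }
            long deadline = r.lastCheckinMillis + r.checkinIntervalMillis + toleranceMillis;
            if (deadline < nowMillis) {
                failed.add(r.instanceId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<CheckinRecord> rows = new ArrayList<>();
        rows.add(new CheckinRecord("node1", now - 30_000L, 15_000L)); // check-in stalled for 30 s
        rows.add(new CheckinRecord("node2", now - 5_000L, 15_000L));
        rows.add(new CheckinRecord("node3", now - 10_000L, 15_000L));
        System.out.println(findFailed(rows, "node2", now, 5_000L)); // prints [node1]
    }
}
```

In this model, a node whose check-in is held up for 30 seconds, for instance by a slow database statement like the one mentioned in the question, would be flagged as failed even though its process is still healthy. Whether that is what happened in the question's cluster cannot be confirmed from the information given.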



In addition, the next paragraph may also be important:

Running Nodes on Separate Machines with Unsynchronized Clocks

As you can ascertain by now, if you run nodes on different machines and the clocks are not synchronized, you can get unexpected results. This is because a timestamp is being used to inform other instances of the last time one node checked in. If that node's clock was set for the future, a running Scheduler might never realize that a node has gone down. On the other hand, if a clock on one node is set in the past, a node might assume that the node has gone down and attempt to take over and rerun its jobs. In either case, it's not the behavior that you want. When you're using different machines in a cluster (which is the normal case), be sure to synchronize the clocks. See the section "Quartz Clustering Cookbook," later in this chapter for details on how to do this.
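A small model makes the clock-skew effect concrete. The sketch assumes, as the passage describes, that each node stamps its check-in with its own local clock and that the observing node compares that stamp against its own clock. The class `ClockSkewSketch`, the method `looksFailed`, and all numbers are hypothetical.

```java
class ClockSkewSketch {

    // Returns true if the observing node would consider the other node overdue.
    // writerSkewMillis:   writer's clock minus true time (negative = clock set in the past)
    // observerSkewMillis: observer's clock minus true time
    static boolean looksFailed(long trueNowMillis, long lastCheckinTrueMillis,
                               long writerSkewMillis, long observerSkewMillis,
                               long checkinIntervalMillis) {
        long storedTimestamp = lastCheckinTrueMillis + writerSkewMillis; // value written to the DB
        long observerNow = trueNowMillis + observerSkewMillis;           // observer's idea of "now"
        return observerNow - storedTimestamp > checkinIntervalMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long interval = 15_000L;

        // Healthy node, checked in 5 s ago, clocks in sync: not flagged.
        System.out.println(looksFailed(now, now - 5_000L, 0L, 0L, interval));

        // Same healthy node, but its clock runs 60 s behind: flagged as failed.
        System.out.println(looksFailed(now, now - 5_000L, -60_000L, 0L, interval));

        // Dead node (silent for 120 s) whose clock ran 10 min ahead: not flagged.
        System.out.println(looksFailed(now, now - 120_000L, 600_000L, 0L, interval));
    }
}
```

The second case corresponds to the false takeover described in the passage, and the third to a failure that goes unnoticed.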

Regarding cluster-computing - how Quartz detects node failure, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12966625/
