
java - Spring Batch: Tasklet with multi threaded executor has very bad performances related to Throttling algorithm


Using Spring Batch 2.2.1, I have configured a Spring Batch job with the following approach.

The configuration is as follows:

  • The tasklet uses a ThreadPoolTaskExecutor limited to 15 threads

  • throttle-limit is equal to the number of threads

  • The chunk uses:

    • A synchronized adapter around the single JdbcCursorItemReader, allowing it to be used by multiple threads as recommended by the Spring Batch documentation (a minimal sketch of such an adapter is shown after this list):

      You can synchronize the call to read() and as long as the processing and writing is the most expensive part of the chunk your step may still complete much faster than in a single threaded configuration.

    • saveState set to false on the JdbcCursorItemReader

    • A custom, JPA-based ItemWriter. Note that the processing time of a single item can vary from a few milliseconds to several seconds (> 60 s).

    • commit-interval set to 1 (I know it could be higher, but that is not the issue here)

  • All the JDBC pools are fine, sized according to the Spring Batch documentation recommendations
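For reference, the synchronized adapter around the reader can be as small as the sketch below. This is a minimal illustration, not the original poster's class: the class name is made up, and all it does is serialize read() calls on the shared JdbcCursorItemReader.

import org.springframework.batch.item.ItemReader;

/**
 * Minimal synchronizing delegate (illustrative): serializes calls to read()
 * so that a single JdbcCursorItemReader can be shared by the worker threads.
 */
public class SynchronizingItemReader<T> implements ItemReader<T> {

    private final ItemReader<T> delegate;

    public SynchronizingItemReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    public synchronized T read() throws Exception {
        // Only cursor access is serialized; processing and writing still run
        // concurrently on the step's worker threads.
        return delegate.read();
    }
}

Note that such a wrapper does not implement ItemStream, so the underlying JdbcCursorItemReader still has to be registered as a stream on the tasklet so that it is opened and closed properly.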

Running the batch leads to very strange and bad results, for the following reason:

  • At some point, if the items take a while to be processed by a writer, nearly all the threads in the thread pool end up doing nothing instead of processing; only the slow writer is working.

Looking at the Spring Batch code, the root cause seems to be in this package:

  • org/springframework/batch/repeat/support/

Is this way of working a feature, or is it a limitation/bug?

If it is a feature, what is the way, by configuration, to keep all the threads from being starved by the long-running processing work, without having to rewrite everything?

Note that if all items take the same amount of time, everything works fine and multi-threading is OK; but if one of the item processings takes much longer, then multi-threading is nearly useless for the duration of the slow processing.

Note that I have opened this issue:

Best Answer

As Alex said, it seems this behaviour is a contract, as per the javadocs:

Subclasses just need to provide a method that gets the next result and one that waits for all the results to be returned from concurrent processes or threads

See:

TaskExecutorRepeatTemplate#waitForResults
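To make the starvation pattern concrete, here is a small standalone illustration using plain java.util.concurrent (this is not Spring Batch's internal code): new work is only handed out once every outstanding result of the current batch has come back, so one slow task keeps the rest of the pool idle.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Standalone illustration of the starvation pattern described above:
// work is submitted up to a limit, then the producer blocks until every
// outstanding result is back before submitting anything else.
public class ThrottleStarvationDemo {

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        int throttleLimit = 4;

        for (int round = 0; round < 3; round++) {
            List<Future<?>> inFlight = new ArrayList<Future<?>>();
            for (int i = 0; i < throttleLimit; i++) {
                final long workMillis = (i == 0) ? 5000 : 50; // one "slow writer"
                inFlight.add(pool.submit(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(workMillis);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }));
            }
            // While the slow task runs, the three fast threads sit idle:
            // nothing new is submitted until all futures have completed.
            for (Future<?> f : inFlight) {
                f.get();
            }
        }
        pool.shutdown();
    }
}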

Your other option is to use partitioning:

  • A TaskExecutorPartitionHandler that will execute items from a partitioned ItemReader, see below
  • A Partitioner implementation that provides the ranges the ItemReader has to process, see ColumnRangePartitioner below
  • A custom reader that will read the data using what the Partitioner has filled in, see the myItemReader configuration below

Michael Minella explains this in Chapter 11 of his book Pro Spring Batch:

<batch:job id="batchWithPartition">
    <batch:step id="step1.master">
        <batch:partition partitioner="myPartitioner" handler="partitionHandler"/>
    </batch:step>
</batch:job>

<!-- This one will create partitions of (number of lines / grid size) -->
<bean id="myPartitioner" class="....ColumnRangePartitioner"/>

<!-- This one will handle every partition in a thread -->
<bean id="partitionHandler" class="org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler">
    <property name="taskExecutor" ref="multiThreadedTaskExecutor"/>
    <property name="step" ref="step1" />
    <property name="gridSize" value="10" />
</bean>

<batch:step id="step1">
    <batch:tasklet transaction-manager="transactionManager">
        <batch:chunk reader="myItemReader"
                     writer="manipulatableWriterForTests" commit-interval="1"
                     skip-limit="30000">
            <batch:skippable-exception-classes>
                <batch:include class="java.lang.Exception" />
            </batch:skippable-exception-classes>
        </batch:chunk>
    </batch:tasklet>
</batch:step>

<!-- Step scope is critical here -->
<bean id="myItemReader"
      class="org.springframework.batch.item.database.JdbcCursorItemReader" scope="step">
    <property name="dataSource" ref="dataSource"/>
    <property name="sql">
        <value>
            <![CDATA[
                select * from customers where id >= ? and id <= ?
            ]]>
        </value>
    </property>
    <property name="preparedStatementSetter">
        <bean class="org.springframework.batch.core.resource.ListPreparedStatementSetter">
            <property name="parameters">
                <list>
                    <!-- minValue and maxValue are filled in by the Partitioner for each partition, via the step ExecutionContext -->
                    <value>#{stepExecutionContext[minValue]}</value>
                    <value>#{stepExecutionContext[maxValue]}</value>
                </list>
            </property>
        </bean>
    </property>
    <property name="rowMapper" ref="customerRowMapper"/>
</bean>

Partitioner.java:

package ...;

import java.util.HashMap;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.jdbc.core.JdbcTemplate;

public class ColumnRangePartitioner implements Partitioner {

    // A JdbcTemplate is needed to run the MIN/MAX queries; wire it via a DataSource.
    private JdbcTemplate jdbcTemplate;
    private String column;
    private String table;

    public Map<String, ExecutionContext> partition(int gridSize) {
        int min = jdbcTemplate.queryForInt("SELECT MIN(" + column + ") from " + table);
        int max = jdbcTemplate.queryForInt("SELECT MAX(" + column + ") from " + table);
        int targetSize = (max - min) / gridSize;
        System.out.println("Our partition size will be " + targetSize);
        System.out.println("We will have " + gridSize + " partitions");

        Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();
        int number = 0;
        int start = min;
        int end = start + targetSize - 1;
        while (start <= max) {
            // One ExecutionContext per partition, carrying the id range for the reader
            ExecutionContext value = new ExecutionContext();
            result.put("partition" + number, value);
            if (end >= max) {
                end = max;
            }
            value.putInt("minValue", start);
            value.putInt("maxValue", end);
            System.out.println("minValue = " + start);
            System.out.println("maxValue = " + end);
            start += targetSize;
            end += targetSize;
            number++;
        }
        System.out.println("We are returning " + result.size() + " partitions");
        return result;
    }

    public void setColumn(String column) {
        this.column = column;
    }

    public void setTable(String table) {
        this.table = table;
    }

    public void setDataSource(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }
}
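As a quick sanity check, the partitioner can also be run on its own, outside of a job. The snippet below is illustrative only: the DataSource is assumed to point at the same database, the table/column values match the SQL used by myItemReader above, and setDataSource is the setter from the listing above.

import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.item.ExecutionContext;

// Illustrative check of the partitioner output; not part of the original post.
public class ColumnRangePartitionerCheck {

    public static void printPartitions(DataSource dataSource) {
        ColumnRangePartitioner partitioner = new ColumnRangePartitioner();
        partitioner.setDataSource(dataSource);
        partitioner.setTable("customers");
        partitioner.setColumn("id");

        // One [minValue, maxValue] range per partition key; each range ends up
        // in the step ExecutionContext of one partitioned step execution.
        Map<String, ExecutionContext> partitions = partitioner.partition(10);
        for (Map.Entry<String, ExecutionContext> entry : partitions.entrySet()) {
            System.out.println(entry.getKey() + " -> ["
                    + entry.getValue().getInt("minValue") + ", "
                    + entry.getValue().getInt("maxValue") + "]");
        }
    }
}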

This question, java - Spring Batch: Tasklet with multi threaded executor has very bad performances related to Throttling algorithm, was originally asked on Stack Overflow: https://stackoverflow.com/questions/18262857/
