
java - What is the difference between a multi-threaded step and local partitioning in Spring Batch?

Reposted. Author: 搜寻专家. Updated: 2023-11-01 03:30:48

I am reading the following doc.

It says:

1.1. Multi-threaded Step

The simplest way to start parallel processing is to add a TaskExecutor to your Step configuration.

When using java configuration, a TaskExecutor can be added to the step as shown in the following example:

@Bean
public TaskExecutor taskExecutor() {
    return new SimpleAsyncTaskExecutor("spring_batch");
}

@Bean
public Step sampleStep(TaskExecutor taskExecutor) {
    return this.stepBuilderFactory.get("sampleStep")
            .<String, String>chunk(10)
            .reader(itemReader())
            .writer(itemWriter())
            .taskExecutor(taskExecutor)
            .build();
}

The result of the above configuration is that the Step executes by reading, processing, and writing each chunk of items (each commit interval) in a separate thread of execution. Note that this means there is no fixed order for the items to be processed, and a chunk might contain items that are non-consecutive compared to the single-threaded case. In addition to any limits placed by the task executor (such as whether it is backed by a thread pool), there is a throttle limit in the tasklet configuration which defaults to 4. You may need to increase this to ensure that a thread pool is fully utilized.
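
To make the behaviour described in the quoted passage concrete, here is a minimal plain-Java sketch. It is not Spring Batch code: it assumes a shared, synchronized source standing in for the ItemReader and a fixed thread pool standing in for the TaskExecutor, and the names MultiThreadedStepSketch, readOne and run are made up for illustration. Each worker thread assembles its own chunk by pulling items one at a time from the shared source, so a chunk may hold non-consecutive items and chunks complete in no fixed order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

public class MultiThreadedStepSketch {

    /** Reads one item from the shared source, or null when it is exhausted. */
    static Integer readOne(Iterator<Integer> source) {
        synchronized (source) { // the source itself is not thread-safe
            return source.hasNext() ? source.next() : null;
        }
    }

    /** Processes itemCount items in chunks on several threads; returns the "written" chunks. */
    public static List<List<Integer>> run(int itemCount, int chunkSize, int threads) {
        Iterator<Integer> source = IntStream.range(0, itemCount).boxed().iterator();
        List<List<Integer>> written = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (true) {
                    // Each thread builds its own chunk, item by item.
                    List<Integer> chunk = new ArrayList<>();
                    Integer item;
                    while (chunk.size() < chunkSize && (item = readOne(source)) != null) {
                        chunk.add(item);
                    }
                    if (chunk.isEmpty()) {
                        return; // nothing left to read
                    }
                    written.add(chunk); // stands in for writing and committing the chunk
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        // Output differs between runs: chunk contents and order are not deterministic.
        run(25, 5, 3).forEach(System.out::println);
    }
}
```

Every item is still read exactly once; only the grouping into chunks and the order of the chunks vary from run to run.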

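Until now I thought this had to be done with local partitioning, where I supply a Partitioner that says how to split the data into pieces. A multi-threaded step seems to do this automatically.

Question

Could you explain how it works? How can I control it, other than through the number of threads? Does it work with flat files?

P.S.

I created an example: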

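import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class MultithreadedStepConfig {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    private ToLowerCasePersonProcessor toLowerCasePersonProcessor;

    @Autowired
    private DbPersonWriter dbPersonWriter;

    @Value("${app.single-file}")
    Resource resources;

    // The unused Step parameter was dropped; the job only uses csvToDataBaseSlaveStep().
    @Bean
    public Job job() {
        return jobBuilderFactory.get("myMultiThreadedJob")
                .incrementer(new RunIdIncrementer())
                .flow(csvToDataBaseSlaveStep())
                .end()
                .build();
    }

    // Each chunk of 50 items is read, processed and written on a thread from the executor.
    private Step csvToDataBaseSlaveStep() {
        return stepBuilderFactory.get("csvToDatabaseStep")
                .<Person, Person>chunk(50)
                .reader(csvPersonReaderMulti())
                .processor(toLowerCasePersonProcessor)
                .writer(dbPersonWriter)
                .taskExecutor(jobTaskExecutorMultiThreaded())
                .build();
    }

    @Bean
    @StepScope
    public FlatFileItemReader<Person> csvPersonReaderMulti() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("csvPersonReaderSplitted")
                .resource(resources)
                .delimited()
                .names(new String[]{"firstName", "lastName"})
                .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                    setTargetType(Person.class);
                }})
                // Restart state cannot be tracked reliably when threads share the reader.
                .saveState(false)
                .build();
    }

    @Bean
    public TaskExecutor jobTaskExecutorMultiThreaded() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // Core pool of 25 threads. With the default unbounded queue, the pool
        // never grows beyond the core size, so the max of 30 is not reached.
        taskExecutor.setMaxPoolSize(30);
        taskExecutor.setCorePoolSize(25);
        taskExecutor.setThreadGroupName("multi-");
        taskExecutor.setThreadNamePrefix("multi-");
        // Spring calls afterPropertiesSet() on the returned bean, so no manual call is needed.
        return taskExecutor;
    }
}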

According to the logs it does work, but I would like to understand the details. Is it better than a hand-written partitioner?

Accepted answer

There are fundamental differences between a multi-threaded step and partitioning.

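A multi-threaded step runs in a single process, and its threads share the same reader, processor and writer instances. It is therefore not a good idea if your processor or writer keeps state that has to be persisted. If you are only generating a report without saving anything, however, it is a good choice.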
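
On the flat-file question: FlatFileItemReader is not thread-safe, so a multi-threaded step that shares one instance across threads usually needs synchronized access to it. The following is a sketch under assumptions, not the asker's or the answerer's configuration. It targets Spring Batch 4.x (StepBuilderFactory, throttleLimit), wraps the reader in SynchronizedItemStreamReader, and raises the throttle limit mentioned in the quoted documentation. The Person class here is a hypothetical stand-in, the writer and task executor are assumed to be defined elsewhere, and the value 8 is arbitrary.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.task.TaskExecutor;

@Configuration
public class SynchronizedReaderSketch {

    /** Hypothetical stand-in for the Person class used in the question. */
    public static class Person {
        private String firstName;
        private String lastName;

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
    }

    @Bean
    public SynchronizedItemStreamReader<Person> synchronizedPersonReader(
            @Value("${app.single-file}") Resource resource) {
        FlatFileItemReader<Person> delegate = new FlatFileItemReaderBuilder<Person>()
                .name("csvPersonReader")
                .resource(resource)
                .delimited()
                .names(new String[]{"firstName", "lastName"})
                .targetType(Person.class)
                .saveState(false) // restart position is meaningless with concurrent reads
                .build();

        // read() calls are serialized, so only one thread touches the file at a time.
        SynchronizedItemStreamReader<Person> reader = new SynchronizedItemStreamReader<>();
        reader.setDelegate(delegate);
        return reader;
    }

    @Bean
    public Step multiThreadedStep(StepBuilderFactory steps,
                                  SynchronizedItemStreamReader<Person> reader,
                                  ItemWriter<Person> writer,      // assumed to exist elsewhere
                                  TaskExecutor taskExecutor) {    // assumed to exist elsewhere
        return steps.get("multiThreadedStep")
                .<Person, Person>chunk(50)
                .reader(reader)
                .writer(writer)
                .taskExecutor(taskExecutor)
                .throttleLimit(8) // arbitrary value; the default is 4
                .build();
    }
}
```

With this arrangement the chunk size, the executor's pool size and the throttle limit are the main knobs. Reading is serialized, so the gain comes from processing and writing chunks in parallel.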

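Since you mention that you want to process a flat file and store the records in a database, you could use remote chunking, assuming your reader is not heavy (that is, reading is cheap compared with processing and writing).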

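A Partitioner creates a separate execution for each subset of the data that you can divide logically. With local partitioning these run as separate step executions on threads in the same JVM; with remote partitioning they run in separate processes.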
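
The logical division mentioned here can be sketched in plain Java. In Spring Batch, the Partitioner interface's partition(int gridSize) method returns a Map<String, ExecutionContext>, and each entry becomes its own step execution of the worker step. The sketch below only illustrates the range-splitting logic such a method might contain; the class name LineRangePartitionSketch, the "partitionN" keys and the {start, end} array layout are assumptions made for illustration, and a real implementation would store the bounds in an ExecutionContext instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LineRangePartitionSketch {

    /**
     * Splits items [0, totalItems) into at most gridSize contiguous ranges.
     * Each value is {startInclusive, endExclusive}.
     */
    public static Map<String, long[]> partition(long totalItems, int gridSize) {
        if (gridSize <= 0) {
            throw new IllegalArgumentException("gridSize must be positive");
        }
        Map<String, long[]> partitions = new LinkedHashMap<>();
        long base = totalItems / gridSize;       // minimum size of each range
        long remainder = totalItems % gridSize;  // first 'remainder' ranges get one extra item
        long start = 0;
        for (int i = 0; i < gridSize && start < totalItems; i++) {
            long size = base + (i < remainder ? 1 : 0);
            partitions.put("partition" + i, new long[]{start, start + size});
            start += size;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Prints:
        // partition0 -> [0, 3)
        // partition1 -> [3, 6)
        // partition2 -> [6, 8)
        // partition3 -> [8, 10)
        partition(10, 4).forEach((name, range) ->
                System.out.println(name + " -> [" + range[0] + ", " + range[1] + ")"));
    }
}
```

Unlike a multi-threaded step, each range would be handled by its own reader instance, so the reader does not need to be shared between threads and each partition keeps its own restart state.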

Hope this helps.

Regarding "java - What is the difference between a multi-threaded step and local partitioning in Spring Batch?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57471959/
