gpt4 book ai didi

java - 如何使用 FlatFileItemReader 和异步处理器优化性能

转载 作者:太空宇宙 更新时间:2023-11-04 14:08:34 49 4
gpt4 key购买 nike

我有一个简单的 csv 文件,约有 400,000 行(仅一列)我花了很多时间来阅读记录并处理它们

处理器根据 couchbase 验证记录

作者 - 写入远程主题我花了大约30分钟。那太疯狂了。

我读到 flatfileItemreader 不是线程安全的。所以我的 block 值为 1。

我读到异步处理可以提供帮助。但我看不到任何改进。

这是我的代码:

@Configuration
@EnableBatchProcessing
public class NotificationFileProcessUploadedFileJob {


@Value("${expected.snid.header}")
public String snidHeader;

@Value("${num.of.processing.chunks.per.file}")
public int numOfProcessingChunksPerFile;

@Autowired
private InfrastructureConfigurationConfig infrastructureConfigurationConfig;

private static final String OVERRIDDEN_BY_EXPRESSION = null;


@Inject
private JobBuilderFactory jobs;

@Inject
private StepBuilderFactory stepBuilderFactory;

@Inject
ExecutionContextPromotionListener executionContextPromotionListener;


@Bean
public Job processUploadedFileJob() throws Exception {
return this.jobs.get("processUploadedFileJob").start((processSnidUploadedFileStep())).build();

}

@Bean
public Step processSnidUploadedFileStep() {
return stepBuilderFactory.get("processSnidFileStep")
.<PushItemDTO, PushItemDTO>chunk(numOfProcessingChunksPerFile)
.reader(snidFileReader(OVERRIDDEN_BY_EXPRESSION))
.processor(asyncItemProcessor())
.writer(asyncItemWriter())
// .throttleLimit(20)
// .taskJobExecutor(infrastructureConfigurationConfig.taskJobExecutor())


// .faultTolerant()
// .skipLimit(10) //default is set to 0
// .skip(MySQLIntegrityConstraintViolationException.class)
.build();
}

@Inject
ItemWriter writer;

@Bean
public AsyncItemWriter asyncItemWriter() {
AsyncItemWriter asyncItemWriter=new AsyncItemWriter();
asyncItemWriter.setDelegate(writer);
return asyncItemWriter;
}


@Bean
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
public ItemStreamReader<PushItemDTO> snidFileReader(@Value("#{jobParameters[filePath]}") String filePath) {
FlatFileItemReader<PushItemDTO> itemReader = new FlatFileItemReader<PushItemDTO>();
itemReader.setLineMapper(snidLineMapper());
itemReader.setLinesToSkip(1);
itemReader.setResource(new FileSystemResource(filePath));
return itemReader;
}


@Bean
public AsyncItemProcessor asyncItemProcessor() {

AsyncItemProcessor<PushItemDTO, PushItemDTO> asyncItemProcessor = new AsyncItemProcessor();

asyncItemProcessor.setDelegate(processor(OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION,
OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION));
asyncItemProcessor.setTaskExecutor(infrastructureConfigurationConfig.taskProcessingExecutor());

return asyncItemProcessor;

}

@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
@Bean
public ItemProcessor<PushItemDTO, PushItemDTO> processor(@Value("#{jobParameters[pushMessage]}") String pushMessage,
@Value("#{jobParameters[jobId]}") String jobId,
@Value("#{jobParameters[taskId]}") String taskId,
@Value("#{jobParameters[refId]}") String refId,
@Value("#{jobParameters[url]}") String url,
@Value("#{jobParameters[targetType]}") String targetType,
@Value("#{jobParameters[gameType]}") String gameType) {
return new PushItemProcessor(pushMessage, jobId, taskId, refId, url, targetType, gameType);
}

@Bean
public LineMapper<PushItemDTO> snidLineMapper() {
DefaultLineMapper<PushItemDTO> lineMapper = new DefaultLineMapper<PushItemDTO>();
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
lineTokenizer.setDelimiter(",");
lineTokenizer.setStrict(true);
lineTokenizer.setStrict(true);
String[] splittedHeader = snidHeader.split(",");
lineTokenizer.setNames(splittedHeader);
BeanWrapperFieldSetMapper<PushItemDTO> fieldSetMapper = new BeanWrapperFieldSetMapper<PushItemDTO>();
fieldSetMapper.setTargetType(PushItemDTO.class);

lineMapper.setLineTokenizer(lineTokenizer);
lineMapper.setFieldSetMapper(new PushItemFieldSetMapper());
return lineMapper;
}
}


@Bean
@Override
public SimpleAsyncTaskExecutor taskProcessingExecutor() {
SimpleAsyncTaskExecutor simpleAsyncTaskExecutor = new SimpleAsyncTaskExecutor();
simpleAsyncTaskExecutor.setConcurrencyLimit(300);
return simpleAsyncTaskExecutor;
}

您认为我可以如何提高处理性能并使其更快?谢谢

ItemWriter 代码:

 @Bean
public ItemWriter writer() {
return new KafkaWriter();
}


public class KafkaWriter implements ItemWriter<PushItemDTO> {


private static final Logger logger = LoggerFactory.getLogger(KafkaWriter.class);

@Autowired
KafkaProducer kafkaProducer;

@Override
public void write(List<? extends PushItemDTO> items) throws Exception {

for (PushItemDTO item : items) {
try {
logger.debug("Writing to kafka=" + item);
sendMessageToKafka(item);
} catch (Exception e) {
logger.error("Error writing item=" + item.toString(), e);
}
}
}

最佳答案

增加您的提交计数是我要开始的地方。请记住提交计数的含义。由于您将其设置为 1,因此您将对每个项目执行以下操作:

  1. 开始交易
  2. 阅读文章
  3. 处理该项目
  4. 写下该项目
  5. 更新作业存储库
  6. 提交交易

您的配置没有显示委托(delegate) ItemWriter 是什么,所以我无法判断,但至少您每个项目执行多个 SQL 语句来更新作业存储库。

您是正确的,FlatFileItemReader 不是线程安全的。但是,您没有使用多个线程来读取,而只是进行处理,因此据我所知,没有理由将提交计数设置为 1。

关于java - 如何使用 FlatFileItemReader 和异步处理器优化性能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/28606634/

49 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com