
Java: How to load all files from a directory into an ArrayList in chunks and process them


I have an application that needs to load all the files in a directory and upload them to S3. The directory holds roughly 100 million small XML files, about 15 GB in total.

This is how I currently list and upload them. It works fine when there are only a few files, but with a large number of files I run into OutOfMemoryError and it fails.

import java.io.File;
import java.util.ArrayList;

import org.apache.commons.io.FilenameUtils;
import org.apache.log4j.Logger;

public class FileProcessThreads {

    private static Logger _logger = Logger.getLogger(FileProcessThreads.class);

    public ArrayList<File> process(String fileLocation) {

        _logger.info("Calling process method of FileProcessThreads class");
        File dir = new File(fileLocation);
        File[] directoryListing = dir.listFiles();
        ArrayList<File> files = new ArrayList<File>();
        if (directoryListing != null) {
            for (File path : directoryListing) {
                String fileType = FilenameUtils.getExtension(path.getName());
                long fileSize = path.length();
                // keep only non-empty .gz files
                if (fileType.equals("gz") && fileSize > 0) {
                    files.add(path);
                }
            }
        }
        _logger.info("Exiting process method of FileProcessThreads class");
        return files;
    }
}

I think loading the files in chunks might work, but how do I do that? The files are always in the same single directory.

Can we increase the size of the array?
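As an aside, one way to read a huge directory in chunks, without File.listFiles() materializing all ~100 million entries in a single array, is java.nio.file.DirectoryStream, which iterates directory entries lazily on most filesystems. The following is only a minimal sketch under that assumption; CHUNK_SIZE and the process helper are hypothetical placeholders, not part of the original code:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChunkedDirectoryReader {

    // hypothetical chunk size; tune to the available heap
    private static final int CHUNK_SIZE = 1000;

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        List<Path> chunk = new ArrayList<Path>(CHUNK_SIZE);
        // the stream yields entries one at a time, so the full
        // directory listing is never held in memory at once
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.gz")) {
            for (Path entry : stream) {
                chunk.add(entry);
                if (chunk.size() == CHUNK_SIZE) {
                    process(chunk); // e.g. hand the chunk to the S3 uploader
                    chunk.clear();
                }
            }
        }
        if (!chunk.isEmpty()) {
            process(chunk); // leftover files smaller than one chunk
        }
    }

    // placeholder for the real upload logic
    private static void process(List<Path> batch) {
        System.out.println("Processing batch of " + batch.size() + " files");
    }
}

File.listFiles(), by contrast, has to build the whole File[] in one go, which is what exhausts the heap on a directory this large.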

I call this class from here:

import java.io.File;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.log4j.Logger;

import com.amazonaws.services.s3.AmazonS3Client;

public class UploadExecutor {
    private static Logger _logger = Logger.getLogger(UploadExecutor.class);

    public static void main(String[] args) {

        _logger.info("----------STARTING JAVA MAIN METHOD----------------- ");

        /*
         * Example arguments:
         * 3 C:\\Users\\u6034690\\Desktop\\TWOFILE\\xml
         * a205381-tr-fr-production-us-east-1-trf-auditabilty
         */
        while (true) {

            String strNoOfThreads = args[0];
            String strFileLocation = args[1];
            String strBucketName = args[2];

            int iNoOfThreads = Integer.parseInt(strNoOfThreads);
            S3ClientManager s3ClientObj = new S3ClientManager();
            AmazonS3Client s3Client = s3ClientObj.buildS3Client();

            try {
                FileProcessThreads fp = new FileProcessThreads();
                List<File> records = fp.process(strFileLocation);
                try {
                    // wait and re-scan so files written in the meantime are picked up
                    _logger.info("No records found, will wait for 10 seconds");
                    TimeUnit.SECONDS.sleep(10);
                    records = fp.process(strFileLocation);
                } catch (InterruptedException e) {
                    _logger.error("InterruptedException: " + e.toString());
                }
                _logger.info("Total no of Audit files = " + records.size());

                if (!records.isEmpty()) {
                    BuildThread buildThreadObj = new BuildThread();
                    buildThreadObj.buildThreadLogic(iNoOfThreads, s3Client, records, strFileLocation, strBucketName);
                }
            } catch (Throwable t) {
                _logger.error("Unexpected error: " + t.toString());
            }
        }
    }
}

Any help is appreciated.

I can't use the code below, because I need to hand the files to S3 as a list:

Iterator<File> it = FileUtils.iterateFiles(folder, null, true);
while (it.hasNext()) {
    File fileEntry = it.next();
}
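That said, the iterator could still yield a List<File> by buffering its entries into fixed-size lists and handing each list to the uploader. Below is a minimal sketch of that pattern; BATCH_SIZE and the uploadBatch helper are hypothetical placeholders, and note that older commons-io versions build the full file collection inside iterateFiles anyway, so this may not solve the memory problem on its own:

import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class IteratorBatcher {

    // hypothetical batch size
    private static final int BATCH_SIZE = 500;

    public static void uploadInBatches(File folder) {
        // only .gz files, non-recursive, matching the original filter
        Iterator<File> it = FileUtils.iterateFiles(folder, new String[] { "gz" }, false);
        List<File> batch = new ArrayList<File>(BATCH_SIZE);
        while (it.hasNext()) {
            batch.add(it.next());
            if (batch.size() == BATCH_SIZE) {
                uploadBatch(batch); // hand a complete list to the uploader
                batch = new ArrayList<File>(BATCH_SIZE);
            }
        }
        if (!batch.isEmpty()) {
            uploadBatch(batch); // leftover files
        }
    }

    // placeholder for the real S3 upload call
    private static void uploadBatch(List<File> batch) {
        System.out.println("Uploading " + batch.size() + " files");
    }
}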

Best Answer

You can return a File[] from the process method instead of an ArrayList, then iterate over the files in the main class and upload them in batches.

import java.io.File;

import org.apache.log4j.Logger;

public class FileProcessThreads {

    private static Logger _logger = Logger.getLogger(FileProcessThreads.class);

    public File[] getFiles(String fileLocation) {

        _logger.info("Calling getFiles method of FileProcessThreads class");
        File dir = new File(fileLocation);
        File[] directoryListing = dir.listFiles();
        if (directoryListing != null && directoryListing.length > 0) {
            return directoryListing;
        }
        _logger.info("No files found in " + fileLocation);
        return null;
    }
}

import java.io.File;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

import org.apache.commons.io.FilenameUtils;
import org.apache.log4j.Logger;

import com.amazonaws.services.s3.AmazonS3Client;

public class UploadExecutor {
    private static Logger _logger = Logger.getLogger(UploadExecutor.class);

    // number of files handed to the upload threads at a time
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) {

        _logger.info("----------STARTING JAVA MAIN METHOD----------------- ");

        /*
         * Example arguments:
         * 3 C:\\Users\\u6034690\\Desktop\\TWOFILE\\xml
         * a205381-tr-fr-production-us-east-1-trf-auditabilty
         */
        while (true) {

            String strNoOfThreads = args[0];
            String strFileLocation = args[1];
            String strBucketName = args[2];

            int iNoOfThreads = Integer.parseInt(strNoOfThreads);
            S3ClientManager s3ClientObj = new S3ClientManager();
            AmazonS3Client s3Client = s3ClientObj.buildS3Client();

            try {
                FileProcessThreads fp = new FileProcessThreads();
                File[] files = fp.getFiles(strFileLocation);
                try {
                    // wait and re-scan so files written in the meantime are picked up
                    _logger.info("No records found, will wait for 10 seconds");
                    TimeUnit.SECONDS.sleep(10);
                    files = fp.getFiles(strFileLocation);
                    ArrayList<File> batchFiles = new ArrayList<File>(BATCH_SIZE);
                    if (files != null) {
                        for (File path : files) {
                            String fileType = FilenameUtils.getExtension(path.getName());
                            long fileSize = path.length();
                            if (fileType.equals("gz") && fileSize > 0) {
                                batchFiles.add(path);
                            }
                            // upload as soon as a full batch has accumulated
                            if (batchFiles.size() == BATCH_SIZE) {
                                BuildThread buildThreadObj = new BuildThread();
                                // pass a copy so clearing the list does not affect the upload threads
                                buildThreadObj.buildThreadLogic(iNoOfThreads, s3Client, new ArrayList<File>(batchFiles), strFileLocation, strBucketName);
                                batchFiles.clear();
                            }
                        }
                        _logger.info("Total no of Audit files = " + files.length);
                    }
                    // upload whatever is left over (fewer than BATCH_SIZE files)
                    if (!batchFiles.isEmpty()) {
                        BuildThread buildThreadObj = new BuildThread();
                        buildThreadObj.buildThreadLogic(iNoOfThreads, s3Client, batchFiles, strFileLocation, strBucketName);
                        batchFiles.clear();
                    }
                } catch (InterruptedException e) {
                    _logger.error("InterruptedException: " + e.toString());
                }
            } catch (Throwable t) {
                _logger.error("Unexpected error: " + t.toString());
            }
        }
    }
}

Hope this helps.

For this question about loading all files from a directory into an ArrayList in chunks and processing them in Java, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/55863311/
