
java - Uploading large files with azure-sdk-for-java under a limited heap

Reposted · Author: 行者123 · Updated: 2023-12-03 17:09:50

We are developing a document microservice and need to use Azure as storage for file content. Azure Block Blob looks like a reasonable choice. The document service has a heap limit of 512 MB (-Xmx512m).

I have not managed to stream a file upload with a limited heap using azure-storage-blob:12.10.0-beta.1 (also tested on 12.9.0).

The following approaches were tried:

  1. Copy-pasted from the documentation, using BlockBlobClient:

     BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

     File file = new File("file");

     try (InputStream dataStream = new FileInputStream(file)) {
         blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
     }

     Result: java.io.IOException: mark/reset not supported. The SDK tries to use mark/reset even though the file input stream reports that it does not support them.

  2. Added a BufferedInputStream to mitigate the mark/reset problem (per this advice):

     BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

     File file = new File("file");

     try (InputStream dataStream = new BufferedInputStream(new FileInputStream(file))) {
         blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
     }

     Result: java.lang.OutOfMemoryError: Java heap space. I assume the SDK tried to load all 1.17 GB of file content into memory.

  3. Replaced BlockBlobClient with BlobClient and removed the heap size limit (-Xmx512m):

     BlobClient blobClient = blobContainerClient.getBlobClient("file");

     File file = new File("file");

     try (InputStream dataStream = new FileInputStream(file)) {
         blobClient.upload(dataStream, file.length(), true /* overwrite file */);
     }

     Result: 1.5 GB of heap was used; all of the file content was loaded into memory, plus some buffering on the Reactor side.

     Heap usage from VisualVM (screenshot)

  4. Switched to streaming via BlobOutputStream:

     long blockSize = DataSize.ofMegabytes(4L).toBytes();

     BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

     // create / erase blob
     blockBlobClient.commitBlockList(List.of(), true);

     BlockBlobOutputStreamOptions options = (new BlockBlobOutputStreamOptions()).setParallelTransferOptions(
         (new ParallelTransferOptions()).setBlockSizeLong(blockSize).setMaxConcurrency(1).setMaxSingleUploadSizeLong(blockSize));

     try (InputStream is = new FileInputStream("file")) {
         try (OutputStream os = blockBlobClient.getBlobOutputStream(options)) {
             IOUtils.copy(is, os); // uses an 8 KB buffer
         }
     }

     Result: the file was corrupted during upload. The Azure web portal shows 1.09 GB instead of the expected 1.17 GB. Manually downloading the file from the portal confirmed that the content was corrupted. The memory footprint dropped significantly, but the corruption is a showstopper.
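As an aside on attempt 1: mark/reset support can be checked directly on the stream, which confirms why a raw FileInputStream triggers the SDK's IOException while the BufferedInputStream wrapper from attempt 2 does not. A minimal, self-contained demo:

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("demo", ".bin");
        tmp.deleteOnExit();
        // A raw FileInputStream does not support mark/reset, hence the IOException.
        try (InputStream raw = new FileInputStream(tmp)) {
            System.out.println(raw.markSupported());      // false
        }
        // Wrapping it in a BufferedInputStream adds mark/reset support
        // (at the cost of buffering data for replay).
        try (InputStream buffered = new BufferedInputStream(new FileInputStream(tmp))) {
            System.out.println(buffered.markSupported()); // true
        }
    }
}
```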

Problem: I cannot come up with a working upload/download solution that has a small memory footprint.

Any help would be greatly appreciated!
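One avenue not tried above is manual block staging with stageBlock / commitBlockList, which holds at most one block in memory at a time. A rough sketch under the assumption that blockBlobClient is configured as in the attempts above; the staging loop is shown as a comment because it needs a live storage account, while the block-ID helper (IDs must be Base64 strings of equal length within one blob) is plain Java:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StagedBlockUpload {

    // Block IDs must be Base64-encoded and all the same length within a blob,
    // so we encode a fixed-width counter.
    static String blockId(int index) {
        return Base64.getEncoder()
                .encodeToString(String.format("block-%07d", index).getBytes(StandardCharsets.UTF_8));
    }

    /*
     * Hypothetical staging loop (requires azure-storage-blob on the classpath):
     *
     * try (InputStream in = new FileInputStream(file)) {
     *     List<String> ids = new ArrayList<>();
     *     byte[] buf = new byte[4 * 1024 * 1024];   // one 4 MB block in memory at a time
     *     int read, index = 0;
     *     while ((read = in.readNBytes(buf, 0, buf.length)) > 0) {
     *         String id = blockId(index++);
     *         blockBlobClient.stageBlock(id, new ByteArrayInputStream(buf, 0, read), read);
     *         ids.add(id);
     *     }
     *     blockBlobClient.commitBlockList(ids, true); // atomically assemble the blob
     * }
     */

    public static void main(String[] args) {
        System.out.println(blockId(0).length());                 // 20
        System.out.println(blockId(1).equals(blockId(2)));       // false
    }
}
```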

Best Answer

Please try the code below to upload/download large files; I have tested it with a .zip file of about 1.1 GB.

Upload a file:

public static void uploadFilesByChunk() {
    String connString = "<conn str>";
    String containerName = "<container name>";
    String blobName = "UploadOne.zip";
    String filePath = "D:/temp/" + blobName;

    BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
    BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
    long blockSize = 2 * 1024 * 1024; // 2 MB
    ParallelTransferOptions parallelTransferOptions = new ParallelTransferOptions()
            .setBlockSizeLong(blockSize).setMaxConcurrency(2)
            .setProgressReceiver(new ProgressReceiver() {
                @Override
                public void reportProgress(long bytesTransferred) {
                    System.out.println("uploaded:" + bytesTransferred);
                }
            });

    BlobHttpHeaders headers = new BlobHttpHeaders().setContentLanguage("en-US").setContentType("binary");

    blobClient.uploadFromFile(filePath, parallelTransferOptions, headers, null, AccessTier.HOT,
            new BlobRequestConditions(), Duration.ofMinutes(30));
}
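Given the size mismatch seen in attempt 4, it can help to sanity-check the transfer arithmetic: at the 2 MB block size above, a file of roughly 1.17 GB (the size from the question) splits into several hundred blocks, each transferred separately. A small, self-contained check:

```java
public class UploadCheck {
    // Number of blocks a file of `size` bytes needs at `blockSize` bytes per block
    // (ceiling division).
    static long blockCount(long size, long blockSize) {
        return (size + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 2L * 1024 * 1024;   // 2 MB, as in the answer above
        long fileSize = 1_170_000_000L;      // approximate 1.17 GB file from the question
        System.out.println(blockCount(fileSize, blockSize)); // 558
        // After the upload, the committed size can be compared to the local file
        // (needs a live account):
        // assert blobClient.getProperties().getBlobSize() == new File(filePath).length();
    }
}
```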

Memory footprint: (screenshot)

Download a file:

public static void downLoadFilesByChunk() {
    String connString = "<conn str>";
    String containerName = "<container name>";
    String blobName = "UploadOne.zip";

    String filePath = "D:/temp/" + "DownloadOne.zip";

    BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
    BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
    long blockSize = 2 * 1024 * 1024;
    com.azure.storage.common.ParallelTransferOptions parallelTransferOptions = new com.azure.storage.common.ParallelTransferOptions()
            .setBlockSizeLong(blockSize).setMaxConcurrency(2)
            .setProgressReceiver(new com.azure.storage.common.ProgressReceiver() {
                @Override
                public void reportProgress(long bytesTransferred) {
                    System.out.println("downloaded:" + bytesTransferred);
                }
            });

    BlobDownloadToFileOptions options = new BlobDownloadToFileOptions(filePath)
            .setParallelTransferOptions(parallelTransferOptions);
    blobClient.downloadToFileWithResponse(options, Duration.ofMinutes(30), null);
}
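To confirm that a large file round-trips intact, a streamed digest of the local copy can be compared with the blob's stored Content-MD5 (BlobProperties.getContentMd5(), which the service only records for some upload paths). A sketch; the Azure call is commented out since it requires a live account, and the digest is computed in small chunks so memory stays flat:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class DigestCheck {
    // Stream the file through MD5 in 8 KB chunks so memory use stays constant.
    static byte[] md5(Path path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("digest", ".bin");
        Files.write(tmp, "hello".getBytes());
        System.out.println(md5(tmp).length); // MD5 digests are always 16 bytes
        // Comparison against the blob (needs a live account; Content-MD5 may be null):
        // byte[] remote = blobClient.getProperties().getContentMd5();
        // assert java.util.Arrays.equals(remote, md5(Path.of(filePath)));
        Files.delete(tmp);
    }
}
```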

Memory footprint: (screenshot)

Result: (screenshot)

Regarding "java - uploading large files with azure-sdk-for-java under a limited heap", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65395726/
