
java - Uploading large files with azure-sdk-for-java under a limited heap

Reposted · Author: 行者123 · Updated: 2023-12-03 17:09:50

We are developing a document microservice and need to use Azure as storage for file content. Azure Block Blob looks like a reasonable choice. The document service has a heap limit of 512 MB (-Xmx512m).

I have not managed to stream a file upload with a limited heap using azure-storage-blob:12.10.0-beta.1 (also tested on 12.9.0).

The following approaches were tried:

  1. Copy-pasted from the documentation, using BlockBlobClient:

     BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

     File file = new File("file");

     try (InputStream dataStream = new FileInputStream(file)) {
         blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
     }

     Result: java.io.IOException: mark/reset not supported. The SDK tries to use mark/reset even though the file input stream reports that it does not support them.

  2. Added a BufferedInputStream to mitigate the mark/reset problem (per this advice):

     BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

     File file = new File("file");

     try (InputStream dataStream = new BufferedInputStream(new FileInputStream(file))) {
         blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
     }

     Result: java.lang.OutOfMemoryError: Java heap space. I assume the SDK tried to load all 1.17 GB of file content into memory.

  3. Replaced BlockBlobClient with BlobClient and removed the heap size limit (-Xmx512m):

     BlobClient blobClient = blobContainerClient.getBlobClient("file");

     File file = new File("file");

     try (InputStream dataStream = new FileInputStream(file)) {
         blobClient.upload(dataStream, file.length(), true /* overwrite file */);
     }

     Result: 1.5 GB of heap was used; all of the file content was loaded into memory, plus some buffering on the Reactor side.

     Heap usage from VisualVM (screenshot)

  4. Switched to streaming via BlobOutputStream:

     long blockSize = DataSize.ofMegabytes(4L).toBytes();

     BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

     // create / erase blob
     blockBlobClient.commitBlockList(List.of(), true);

     BlockBlobOutputStreamOptions options = (new BlockBlobOutputStreamOptions()).setParallelTransferOptions(
         (new ParallelTransferOptions()).setBlockSizeLong(blockSize).setMaxConcurrency(1).setMaxSingleUploadSizeLong(blockSize));

     try (InputStream is = new FileInputStream("file")) {
         try (OutputStream os = blockBlobClient.getBlobOutputStream(options)) {
             IOUtils.copy(is, os); // uses an 8 KB buffer
         }
     }

     Result: the file was corrupted during upload. The Azure web portal shows 1.09 GB instead of the expected 1.17 GB. Manually downloading the file from the portal confirmed that the content was corrupted. The memory footprint dropped significantly, but the corruption is a showstopper.
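As an aside on attempt 1: mark/reset support can be checked directly on the stream, which confirms why a raw FileInputStream triggers the SDK's IOException while the BufferedInputStream wrapper from attempt 2 does not. A minimal, self-contained demo:

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("demo", ".bin");
        tmp.deleteOnExit();
        // A raw FileInputStream does not support mark/reset, hence the IOException.
        try (InputStream raw = new FileInputStream(tmp)) {
            System.out.println(raw.markSupported());      // false
        }
        // Wrapping it in a BufferedInputStream adds mark/reset support
        // (at the cost of buffering data for replay).
        try (InputStream buffered = new BufferedInputStream(new FileInputStream(tmp))) {
            System.out.println(buffered.markSupported()); // true
        }
    }
}
```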

Problem: I cannot come up with a working upload/download solution that has a small memory footprint.

Any help would be greatly appreciated!
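One avenue not tried above is manual block staging with stageBlock / commitBlockList, which holds at most one block in memory at a time. A rough sketch under the assumption that blockBlobClient is configured as in the attempts above; the staging loop is shown as a comment because it needs a live storage account, while the block-ID helper (IDs must be Base64 strings of equal length within one blob) is plain Java:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StagedBlockUpload {

    // Block IDs must be Base64-encoded and all the same length within a blob,
    // so we encode a fixed-width counter.
    static String blockId(int index) {
        return Base64.getEncoder()
                .encodeToString(String.format("block-%07d", index).getBytes(StandardCharsets.UTF_8));
    }

    /*
     * Hypothetical staging loop (requires azure-storage-blob on the classpath):
     *
     * try (InputStream in = new FileInputStream(file)) {
     *     List<String> ids = new ArrayList<>();
     *     byte[] buf = new byte[4 * 1024 * 1024];   // one 4 MB block in memory at a time
     *     int read, index = 0;
     *     while ((read = in.readNBytes(buf, 0, buf.length)) > 0) {
     *         String id = blockId(index++);
     *         blockBlobClient.stageBlock(id, new ByteArrayInputStream(buf, 0, read), read);
     *         ids.add(id);
     *     }
     *     blockBlobClient.commitBlockList(ids, true); // atomically assemble the blob
     * }
     */

    public static void main(String[] args) {
        System.out.println(blockId(0).length());                 // 20
        System.out.println(blockId(1).equals(blockId(2)));       // false
    }
}
```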

Best Answer

Please try the code below to upload/download large files; I have tested it with a .zip file of about 1.1 GB.

Upload a file:

public static void uploadFilesByChunk() {
    String connString = "<conn str>";
    String containerName = "<container name>";
    String blobName = "UploadOne.zip";
    String filePath = "D:/temp/" + blobName;

    BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
    BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
    long blockSize = 2 * 1024 * 1024; // 2 MB
    ParallelTransferOptions parallelTransferOptions = new ParallelTransferOptions()
            .setBlockSizeLong(blockSize).setMaxConcurrency(2)
            .setProgressReceiver(new ProgressReceiver() {
                @Override
                public void reportProgress(long bytesTransferred) {
                    System.out.println("uploaded:" + bytesTransferred);
                }
            });

    BlobHttpHeaders headers = new BlobHttpHeaders().setContentLanguage("en-US").setContentType("binary");

    blobClient.uploadFromFile(filePath, parallelTransferOptions, headers, null, AccessTier.HOT,
            new BlobRequestConditions(), Duration.ofMinutes(30));
}
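Given the size mismatch seen in attempt 4, it can help to sanity-check the transfer arithmetic: at the 2 MB block size above, a file of roughly 1.17 GB (the size from the question) splits into several hundred blocks, each transferred separately. A small, self-contained check:

```java
public class UploadCheck {
    // Number of blocks a file of `size` bytes needs at `blockSize` bytes per block
    // (ceiling division).
    static long blockCount(long size, long blockSize) {
        return (size + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 2L * 1024 * 1024;   // 2 MB, as in the answer above
        long fileSize = 1_170_000_000L;      // approximate 1.17 GB file from the question
        System.out.println(blockCount(fileSize, blockSize)); // 558
        // After the upload, the committed size can be compared to the local file
        // (needs a live account):
        // assert blobClient.getProperties().getBlobSize() == new File(filePath).length();
    }
}
```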

Memory footprint: (screenshot)

Download a file:

public static void downLoadFilesByChunk() {
    String connString = "<conn str>";
    String containerName = "<container name>";
    String blobName = "UploadOne.zip";

    String filePath = "D:/temp/" + "DownloadOne.zip";

    BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
    BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
    long blockSize = 2 * 1024 * 1024;
    com.azure.storage.common.ParallelTransferOptions parallelTransferOptions = new com.azure.storage.common.ParallelTransferOptions()
            .setBlockSizeLong(blockSize).setMaxConcurrency(2)
            .setProgressReceiver(new com.azure.storage.common.ProgressReceiver() {
                @Override
                public void reportProgress(long bytesTransferred) {
                    System.out.println("downloaded:" + bytesTransferred);
                }
            });

    BlobDownloadToFileOptions options = new BlobDownloadToFileOptions(filePath)
            .setParallelTransferOptions(parallelTransferOptions);
    blobClient.downloadToFileWithResponse(options, Duration.ofMinutes(30), null);
}
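To confirm that a large file round-trips intact, a streamed digest of the local copy can be compared with the blob's stored Content-MD5 (BlobProperties.getContentMd5(), which the service only records for some upload paths). A sketch; the Azure call is commented out since it requires a live account, and the digest is computed in small chunks so memory stays flat:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class DigestCheck {
    // Stream the file through MD5 in 8 KB chunks so memory use stays constant.
    static byte[] md5(Path path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("digest", ".bin");
        Files.write(tmp, "hello".getBytes());
        System.out.println(md5(tmp).length); // MD5 digests are always 16 bytes
        // Comparison against the blob (needs a live account; Content-MD5 may be null):
        // byte[] remote = blobClient.getProperties().getContentMd5();
        // assert java.util.Arrays.equals(remote, md5(Path.of(filePath)));
        Files.delete(tmp);
    }
}
```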

Memory footprint: (screenshot)

Result: (screenshot)

Regarding "java - uploading large files with azure-sdk-for-java under a limited heap", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65395726/
