gpt4 book ai didi

file - 将近 40 万张图像传输到 S3 的最有效方法

转载 作者:行者123 更新时间:2023-12-05 00:04:48 25 4
gpt4 key购买 nike

我目前负责将一个站点从其当前服务器转移到 EC2,该项目的一部分已经完成并且很好,另一部分是我正在努力的部分,该站点目前有近 400K 图像,都在不同的文件夹中排序在主 userimg 文件夹中,客户端希望所有这些图像都存储在 S3 上 - 我遇到的主要问题是如何将近 400,000 张图像从服务器传输到 S3 - 我一直在使用 http://s3tools.org/s3cmd这很棒,但是如果我要使用 s3cmd 传输 userimg 文件夹,它将需要近 3 天的时间,如果连接中断或出现类似问题,我将在 s3 上有一些图像,有些则没有,无法继续过程……

任何人都可以提出解决方案,以前有没有人遇到过这样的问题?

最佳答案

我建议您编写(或让某人编写)一个简单的 Java 实用程序,它:

  • 读取客户端目录的结构(如果需要)
  • 对于每个图像,在 s3 上创建相应的 key (根据 1 中读取的文件结构)并使用 AWS SDK 或 jets3t API 并行启动分段上传。

  • 我为我们的客户做的。不到200行java代码,非常可靠。
    下面是进行分段上传的部分。读取文件结构的部分是微不足道的。

    /**
    * Uploads file to Amazon S3. Creates the specified bucket if it does not exist.
    * The upload is done in chunks of CHUNK_SIZE size (multi-part upload).
    * Attempts to handle upload exceptions gracefully up to MAX_RETRY times per single chunk.
    *
    * @param accessKey - Amazon account access key
    * @param secretKey - Amazon account secret key
    * @param directoryName - directory path where the file resides
    * @param keyName - the name of the file to upload
    * @param bucketName - the name of the bucket to upload to
    * @throws Exception - in case that something goes wrong
    */
    public void uploadFileToS3(String accessKey
    ,String secretKey
    ,String directoryName
    ,String keyName // that is the file name that will be created after upload completed
    ,String bucketName ) throws Exception {

    // Create a credentials object and service to access S3 account
    AWSCredentials myCredentials =
    new BasicAWSCredentials(accessKey, secretKey);

    String filePath = directoryName
    + System.getProperty("file.separator")
    + keyName;

    log.info("uploadFileToS3 is about to upload file [" + filePath + "]");

    AmazonS3 s3Client = new AmazonS3Client(myCredentials);
    // Create a list of UploadPartResponse objects. You get one of these
    // for each part upload.
    List<PartETag> partETags = new ArrayList<PartETag>();

    // make sure that the bucket exists
    createBucketIfNotExists(bucketName, accessKey, secretKey);

    // delete the file from bucket if it already exists there
    s3Client.deleteObject(bucketName, keyName);

    // Initialize.
    InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, keyName);
    InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest);

    File file = new File(filePath);

    long contentLength = file.length();
    long partSize = CHUNK_SIZE; // Set part size to 5 MB.
    int numOfParts = 1;
    if (contentLength > CHUNK_SIZE) {
    if (contentLength % CHUNK_SIZE != 0) {
    numOfParts = (int)((contentLength/partSize)+1.0);
    }
    else {
    numOfParts = (int)((contentLength/partSize));
    }
    }

    try {
    // Step 2: Upload parts.
    long filePosition = 0;
    for (int i = 1; filePosition < contentLength; i++) {
    // Last part can be less than 5 MB. Adjust part size.
    partSize = Math.min(partSize, (contentLength - filePosition));

    log.info("Start uploading part[" + i + "] of [" + numOfParts + "]");

    // Create request to upload a part.
    UploadPartRequest uploadRequest = new UploadPartRequest()
    .withBucketName(bucketName).withKey(keyName)
    .withUploadId(initResponse.getUploadId()).withPartNumber(i)
    .withFileOffset(filePosition)
    .withFile(file)
    .withPartSize(partSize);

    // repeat the upload until it succeeds or reaches the retry limit
    boolean anotherPass;
    int retryCount = 0;
    do {
    anotherPass = false; // assume everything is ok
    try {
    log.info("Uploading part[" + i + "]");
    // Upload part and add response to our list.
    partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
    log.info("Finished uploading part[" + i + "] of [" + numOfParts + "]");
    } catch (Exception e) {
    log.error("Failed uploading part[" + i + "] due to exception. Will retry... Exception: ", e);
    anotherPass = true; // repeat
    retryCount++;
    }
    }
    while (anotherPass && retryCount < CloudUtilsService.MAX_RETRY);

    filePosition += partSize;
    log.info("filePosition=[" + filePosition + "]");

    }
    log.info("Finished uploading file");

    // Complete.
    CompleteMultipartUploadRequest compRequest = new
    CompleteMultipartUploadRequest(
    bucketName,
    keyName,
    initResponse.getUploadId(),
    partETags);

    s3Client.completeMultipartUpload(compRequest);

    log.info("multipart upload completed.upload id=[" + initResponse.getUploadId() + "]");
    } catch (Exception e) {
    s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
    bucketName, keyName, initResponse.getUploadId()));

    log.error("Failed to upload due to Exception:", e);

    throw e;
    }
    }


    /**
    * Creates new bucket with the names specified if it does not exist.
    *
    * @param bucketName - the name of the bucket to retrieve or create
    * @param accessKey - Amazon account access key
    * @param secretKey - Amazon account secret key
    * @throws S3ServiceException - if something goes wrong
    */
    public void createBucketIfNotExists(String bucketName, String accessKey, String secretKey) throws S3ServiceException {
    try {
    // Create a credentials object and service to access S3 account
    org.jets3t.service.security.AWSCredentials myCredentials =
    new org.jets3t.service.security.AWSCredentials(accessKey, secretKey);
    S3Service service = new RestS3Service(myCredentials);

    // Create a new bucket named after a normalized directory path,
    // and include my Access Key ID to ensure the bucket name is unique
    S3Bucket zeBucket = service.getOrCreateBucket(bucketName);
    log.info("the bucket [" + zeBucket.getName() + "] was created (if it was not existing yet...)");
    } catch (S3ServiceException e) {
    log.error("Failed to get or create bucket[" + bucketName + "] due to exception:", e);
    throw e;
    }
    }

    关于file - 将近 40 万张图像传输到 S3 的最有效方法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5289333/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com