
java - Creating a Tar archive from a directory on S3 using AWS Lambda

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 03:25:23

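I need to extract a bunch of zip files stored on S3, add them to a tar archive, and store that archive on S3. The combined size of the zip files may exceed the 512 MB of local storage a Lambda function is allowed. I have a partial solution that gets the objects from S3, extracts them, and puts them into S3 objects without using Lambda's local storage.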

Extract object thread

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.S3Object;
import org.apache.commons.io.FilenameUtils;

// FileMimeType is the asker's own helper and is not shown in the question.
public class ExtractObject implements Runnable {

    private String objectName;
    private String uuid;
    private final byte[] buffer = new byte[1024];

    public ExtractObject(String name, String uuid) {
        this.objectName = name;
        this.uuid = uuid;
    }

    @Override
    public void run() {
        final String srcBucket = "my-bucket-name";
        final AmazonS3 s3Client = new AmazonS3Client();

        try {
            // Stream the zip straight from S3; nothing is written to local storage.
            S3Object s3Object = s3Client.getObject(new GetObjectRequest(srcBucket, objectName));
            ZipInputStream zis = new ZipInputStream(s3Object.getObjectContent());
            ZipEntry entry = zis.getNextEntry();

            while (entry != null) {
                String fileName = entry.getName();
                String mimeType = FileMimeType.fromExtension(FilenameUtils.getExtension(fileName)).mimeType();
                System.out.println("Extracting " + fileName + ", compressed: " + entry.getCompressedSize() + " bytes, extracted: " + entry.getSize() + " bytes, mimetype: " + mimeType);

                // Read the current entry fully into memory.
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                int len;
                while ((len = zis.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, len);
                }
                InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
                ObjectMetadata meta = new ObjectMetadata();
                meta.setContentLength(outputStream.size());
                meta.setContentType(mimeType);
                System.out.println("##### " + srcBucket + ", " + FilenameUtils.getFullPath(objectName) + "tmp" + File.separator + uuid + File.separator + fileName);

                // Add this to tar archive instead of putting back to s3
                s3Client.putObject(srcBucket, FilenameUtils.getFullPath(objectName) + "tmp" + File.separator + uuid + File.separator + fileName, is, meta);
                is.close();
                outputStream.close();
                entry = zis.getNextEntry();
            }
            zis.closeEntry();
            zis.close();
        } catch (IOException ioe) {
            System.out.println(ioe.getMessage());
        }
    }
}

This runs for each object that needs to be extracted, and saves the extracted files as S3 objects in the structure required for the tar file.

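I think that instead of putting the objects back to S3, I need to keep them in memory, add them to a tar archive, and upload that. But after a lot of looking around and trial and error, I have not managed to create a working tar file. The main problem is that I can't use the tmp directory in Lambda.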
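The "keep it in memory" step can be tried without S3. The following is a minimal sketch using only `java.util.zip` from the JDK; the class name `InMemoryUnzip` and the helper `sampleZip` are hypothetical names introduced for illustration, and the sample archive stands in for the stream that `s3Object.getObjectContent()` would supply.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class InMemoryUnzip {

    // Reads every file entry of a ZIP stream into a byte array, keyed by entry name.
    // Nothing is written to disk; memory use grows with the extracted size.
    static Map<String, byte[]> extract(InputStream zipStream) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int len;
                while ((len = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
                files.put(entry.getName(), out.toByteArray());
            }
        }
        return files;
    }

    // Builds a small ZIP in memory so the sketch can run without S3 (assumed sample data).
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("docs/a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("world!".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = extract(new ByteArrayInputStream(sampleZip()));
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().length + " bytes");
        }
    }
}
```

Each byte array could then be handed to a tar writer instead of `putObject`. Note that this trades local storage for heap, so the function's memory setting becomes the limit.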


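Edit: Should I be creating the tar file as I go instead of putting the objects to S3? (See the comment // Add this to tar archive instead of putting back to s3.) If so, how do I create a tar stream without storing it locally?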


Edit 2: Attempt to tar the files

ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
ListObjectsV2Result result;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
TarArchiveOutputStream tarOut = new TarArchiveOutputStream(baos);

do {
    result = s3Client.listObjectsV2(req);

    for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {

        if (objectSummary.getKey().startsWith("tmp/")) {
            System.out.printf(" - %s (size: %d)\n", objectSummary.getKey(), objectSummary.getSize());
            S3Object s3Object = s3Client.getObject(new GetObjectRequest(bucketName, objectSummary.getKey()));
            InputStream is = s3Object.getObjectContent();
            System.out.println("Pre Create entry");
            TarArchiveEntry archiveEntry = new TarArchiveEntry(IOUtils.toByteArray(is));
            // Getting following exception above
            // IllegalArgumentException: Invalid byte 111 at offset 7 in ' positio' len=8
            System.out.println("Pre put entry");
            tarOut.putArchiveEntry(archiveEntry);
            System.out.println("Post put entry");
        }
    }

    String token = result.getNextContinuationToken();
    System.out.println("Next Continuation Token: " + token);
    req.setContinuationToken(token);
} while (result.isTruncated());

ObjectMetadata metadata = new ObjectMetadata();
InputStream is = new ByteArrayInputStream(baos.toByteArray());
s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + "tar-file", is, metadata));
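The exception quoted in the comment is what one would expect when file contents are parsed as a tar header. In Commons Compress, the `TarArchiveEntry(byte[])` constructor treats its argument as raw header bytes, and numeric header fields such as the mode (8 bytes starting at offset 100) are stored as octal text. The sketch below is a simplified, JDK-only illustration of that kind of check; `TarHeaderFieldDemo` and `parseOctalField` are hypothetical names and this is not the library's actual parsing code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TarHeaderFieldDemo {

    static final int MODE_OFFSET = 100; // the mode field follows the 100-byte name field
    static final int MODE_LENGTH = 8;

    // Simplified illustration (not the Commons Compress implementation):
    // parses an octal number from a fixed-width tar header field.
    static long parseOctalField(byte[] header, int offset, int length) {
        long value = 0;
        for (int i = offset; i < offset + length; i++) {
            byte b = header[i];
            if (b == 0 || b == ' ') {
                continue; // NUL and space are padding
            }
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException(
                        "Invalid byte " + b + " at offset " + (i - offset));
            }
            value = value * 8 + (b - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        // A header-shaped buffer: 512 bytes with mode "0000644" at offset 100.
        byte[] header = new byte[512];
        byte[] mode = "0000644\0".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(mode, 0, header, MODE_OFFSET, mode.length);
        long parsed = parseOctalField(header, MODE_OFFSET, MODE_LENGTH);
        System.out.println("mode = " + Long.toOctalString(parsed));

        // Ordinary file content: bytes 100..107 are text, not octal digits.
        byte[] content = new byte[512];
        Arrays.fill(content, (byte) 'x');
        try {
            parseOctalField(content, MODE_OFFSET, MODE_LENGTH);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under that reading, the entry should be created from a name (a `String`) and the file bytes written to the tar stream afterwards, rather than passing the file bytes to the constructor.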

Best answer

I have found a solution, and it is very similar to my attempt in Edit 2 above.

private final String bucketName = "bucket-name";
private final String bucketFolder = "tmp/";
private final String tarKey = "tar-dir/tared-file.tar";

// s3Client is assumed to be an AmazonS3 field of the enclosing class.
private void createTar() throws IOException, ArchiveException {
    ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
    ListObjectsV2Result result;

    // The whole archive is built in memory, so nothing touches local storage.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    TarArchiveOutputStream tarOut = new TarArchiveOutputStream(baos);

    do {
        result = s3Client.listObjectsV2(req);

        for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {
            if (objectSummary.getKey().startsWith(bucketFolder)) {
                S3Object s3Object = s3Client.getObject(new GetObjectRequest(bucketName, objectSummary.getKey()));
                InputStream is = s3Object.getObjectContent();

                // Drop the leading "tmp/" segment to get the path inside the archive.
                String s3Key = objectSummary.getKey();
                String tarPath = s3Key.substring(s3Key.indexOf('/') + 1);

                byte[] ba = IOUtils.toByteArray(is);
                is.close();

                // Create the entry from its name; the size must be set before the data is written.
                TarArchiveEntry archiveEntry = new TarArchiveEntry(tarPath);
                archiveEntry.setSize(ba.length);
                tarOut.putArchiveEntry(archiveEntry);
                tarOut.write(ba);
                tarOut.closeArchiveEntry();
            }
        }

        String token = result.getNextContinuationToken();
        System.out.println("Next Continuation Token: " + token);
        req.setContinuationToken(token);
    } while (result.isTruncated());

    // Closing the tar stream writes the end-of-archive blocks into baos.
    tarOut.close();

    byte[] tarBytes = baos.toByteArray();
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(tarBytes.length);
    InputStream is = new ByteArrayInputStream(tarBytes);
    s3Client.putObject(new PutObjectRequest(bucketName, tarKey, is, metadata));
}
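To see why the entry size has to be known before the data is written, and why the archive has to be finished before upload, it helps to look at the tar layout itself: each file is a 512-byte header (which contains the size), followed by the data padded to a multiple of 512 bytes, and the archive ends with two zero-filled 512-byte blocks. The following JDK-only sketch writes that layout by hand. It is an illustration of the format under the ustar conventions, not how `TarArchiveOutputStream` is implemented and not a replacement for it; `MiniTarWriter` and its methods are hypothetical names, and long names, directories, and ownership metadata are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class MiniTarWriter {

    static final int BLOCK = 512;

    // Writes value as zero-padded octal digits into a field of `length` bytes;
    // the last byte of the field is a NUL terminator.
    static void putOctal(byte[] header, int offset, int length, long value) {
        String digits = Long.toOctalString(value);
        int width = length - 1;
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in field");
        }
        int pad = width - digits.length();
        for (int i = 0; i < width; i++) {
            header[offset + i] = (byte) (i < pad ? '0' : digits.charAt(i - pad));
        }
        header[offset + width] = 0;
    }

    // Writes one regular-file entry: header block, data, zero padding to a block boundary.
    static void writeEntry(OutputStream out, String name, byte[] data) throws IOException {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        if (nameBytes.length > 100) {
            throw new IllegalArgumentException("name longer than 100 bytes");
        }
        byte[] header = new byte[BLOCK];
        System.arraycopy(nameBytes, 0, header, 0, nameBytes.length);
        putOctal(header, 100, 8, 0644);         // mode
        putOctal(header, 108, 8, 0);            // uid
        putOctal(header, 116, 8, 0);            // gid
        putOctal(header, 124, 12, data.length); // size, stored ahead of the data
        putOctal(header, 136, 12, 0);           // mtime (epoch)
        header[156] = '0';                      // typeflag: regular file
        byte[] magic = "ustar\0".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(magic, 0, header, 257, magic.length);
        header[263] = '0';                      // version "00"
        header[264] = '0';

        // Checksum: sum of all header bytes with the checksum field filled with spaces.
        for (int i = 148; i < 156; i++) {
            header[i] = ' ';
        }
        int sum = 0;
        for (byte b : header) {
            sum += b & 0xFF;
        }
        putOctal(header, 148, 7, sum);          // six digits and NUL; byte 155 stays a space

        out.write(header);
        out.write(data);
        out.write(new byte[(BLOCK - data.length % BLOCK) % BLOCK]);
    }

    // Writes the end-of-archive marker: two zero-filled blocks.
    static void finish(OutputStream out) throws IOException {
        out.write(new byte[2 * BLOCK]);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream tar = new ByteArrayOutputStream();
        writeEntry(tar, "a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        writeEntry(tar, "dir/b.txt", new byte[600]);
        finish(tar);
        System.out.println("archive size: " + tar.size() + " bytes");
    }
}
```

Because the size sits in the header, a writer cannot emit an entry until it knows how many bytes follow, which is why the answer reads each object into a byte array and calls `setSize` first.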

Source: a similar question on Stack Overflow, "java - Creating a Tar archive from a directory on S3 using AWS Lambda": https://stackoverflow.com/questions/53265055/
