
java - Find out the MIME type of a compressed file downloaded from S3 in Java

Reposted. Author: 行者123. Updated: 2023-11-29 04:16:38

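The client is supposed to upload a compressed file into an S3 folder. We then download that file and decompress it in order to perform various operations on the files it contains. Initially we told the client to compress its files as a ZIP file, but that turned out to be too difficult for them. Instead, they submitted a RAR file with a ZIP extension... how clever. For obvious reasons, a RAR file cannot be decompressed with a ZIP decompression algorithm.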

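So, given that I am developing a Java project with the Amazon SDK on a Linux operating system, I am looking for a way to find out the file type of a file downloaded from S3. I will take care of how to decompress the file based on the detected type.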

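I have looked at many Stack Overflow questions, such as this one, but judging by them (and their comments), none of the answers seems to work 100% of the time.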

What is the best way to find out the type of a compressed file?

Best answer

TL;DR

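When a file is uploaded to Amazon S3 programmatically, the uploader can specify the object's Content-Type. If nothing is specified (as @Michael-bot clarified), the value assigned by default is binary/octet-stream. Alternatively, if the file is uploaded through the Amazon S3 GUI, the object gets its Content-Type from the file extension (unfortunately, not from its content). If you can trust whoever uploads the files to set the Content-Type correctly, go ahead and look at the ObjectMetadata; if you can't (like me), you need another solution.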

So, if you are looking for a solution that works for the most common compression formats, Files.probeContentType, Apache Tika and SimpleMagic seem to be acceptable choices.

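In the end I chose Files.probeContentType, because it requires no additional libraries and works fine on a Linux machine (as long as the file does not have the wrong extension, and even for that there is a workaround).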



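At first, one might think that the response object returned when downloading a file from Amazon S3 includes the file type. It does contain this information, but problems arise when the file's extension does not match its content.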

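import com.amazonaws.services.s3.model.S3Object;

final S3Object s3Object = ...;
// The Content-Type stored with the object at upload time; S3 does not derive it from the file's bytes
final String contentType = s3Object.getObjectMetadata().getContentType();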


Even though the file's content is a RAR file, this code returns application/zip. So this solution did not work for me.
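Because S3 simply stores whatever Content-Type the uploader sent, a cheap cross-check is to read the first few bytes of the downloaded file and compare them with the well-known signatures ("magic numbers") of the formats in question. The sketch below illustrates that idea; it is not part of the original test project, and the class name `ArchiveSniffer` is hypothetical. It is deliberately shallow: .tar.gz and .tar.xz are reported as gzip and xz (knowing that a tar is inside requires decompressing), and only ZIP files that start with a local file header (`PK\3\4`) are matched, so an empty ZIP archive would be missed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Minimal sketch: guess an archive format from its leading "magic" bytes.
final class ArchiveSniffer {

    private ArchiveSniffer() {
    }

    // Returns a MIME type guess, or null when no known signature matches.
    static String sniff(final byte[] head) {
        if (startsWith(head, 0x50, 0x4B, 0x03, 0x04)) {
            return "application/zip"; // "PK\3\4": ZIP local file header
        }
        if (startsWith(head, 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07)) {
            return "application/vnd.rar"; // "Rar!\x1A\x07": prefix shared by RAR 4.x and RAR 5
        }
        if (startsWith(head, 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C)) {
            return "application/x-7z-compressed";
        }
        if (startsWith(head, 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00)) {
            return "application/x-xz"; // a .tar.xz is only identifiable as xz at this level
        }
        if (startsWith(head, 0x1F, 0x8B)) {
            return "application/gzip"; // a .tar.gz is only identifiable as gzip at this level
        }
        return null;
    }

    // Reads just the first bytes of the file instead of loading the whole archive.
    static String sniff(final Path file) throws IOException {
        final byte[] buffer = new byte[8];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (read < buffer.length && (n = in.read(buffer, read, buffer.length - read)) != -1) {
                read += n;
            }
        }
        return sniff(Arrays.copyOf(buffer, read));
    }

    private static boolean startsWith(final byte[] data, final int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For the client's RAR file uploaded with a `.zip` name, `ArchiveSniffer.sniff(path)` returns `application/vnd.rar` regardless of the extension, which is the value one would branch on when choosing a decompressor.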

So I took the time to build a sample project that tests various scenarios using the different methods and libraries available. By the way, I am using Java 8.

The file types tested were:


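Zip file with a Zip extension and without an extension
Rar file with a Rar extension, a Zip extension and without an extension
7z file with a 7z extension, a Zip extension and without an extension
Tar.xz file with a Tar.xz extension, a Zip extension and without an extension
Tar.gz file with a Tar.gz extension, a Zip extension and without an extension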
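A harness along these lines could produce listings similar to the results shown for each method. This is a hypothetical sketch, not the author's actual test project: `DetectorHarness` and the sample directory are made up for illustration. Each detection approach is passed in as a `Function<Path, String>`, so the same sample files can be run through every method.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness: applies one detector to every sample file in a directory
// and formats each result as "<file name> is: <detected type>".
final class DetectorHarness {

    private DetectorHarness() {
    }

    static List<String> run(final Path sampleDir, final Function<Path, String> detector) throws IOException {
        final List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sampleDir)) {
            for (final Path file : files) {
                if (Files.isRegularFile(file)) {
                    lines.add(file.getFileName() + " is: " + detector.apply(file));
                }
            }
        }
        // Directory iteration order is unspecified, so sort for reproducible output
        lines.sort(null);
        return lines;
    }
}
```

For example, passing `path -> { try { return Files.probeContentType(path); } catch (IOException e) { return "<EXCEPTION: " + e.getMessage() + ">"; } }` would print one line per sample file for `Files.probeContentType`.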


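Please note that the implementations presented here are for testing purposes only. They are in no way endorsed for use in production code, since they do not take into account file-locking issues, among others that my imagination cannot come up with. =)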



MimetypesFileTypeMap

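Implementation

import java.io.File;
import javax.activation.MimetypesFileTypeMap;

static String detectWithMimetypesFileTypeMap(final String basePath, final String fileName) {
    final File file = new File(basePath, fileName);
    try {
        // Looks the type up by file name (extension) only; the file content is never read
        return MimetypesFileTypeMap.getDefaultFileTypeMap().getContentType(file);
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}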


Results

Rar with Rar extension is: application/octet-stream
Rar with Zip extension is: application/octet-stream
Zip with Zip extension is: application/octet-stream
7z with 7z extension is: application/octet-stream
7z with Zip extension is: application/octet-stream
Tar.xz with Tar.xz extension is: application/octet-stream
Tar.xz with Zip extension is: application/octet-stream
Tar.gz with Tar.gz extension is: application/octet-stream
Tar.gz with Zip extension is: application/octet-stream
Rar without extension is: application/octet-stream
Zip without extension is: application/octet-stream
7z without extension is: application/octet-stream
Tar.xz without extension is: application/octet-stream
Tar.gz without extension is: application/octet-stream


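Conclusion

When the file type cannot be identified, this method returns application/octet-stream. It failed in every scenario, so this approach should be discarded. It looks the type up by file extension only, and its default map apparently has no entry for any of these archive extensions, not even .zip.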



URLConnection.guessContentTypeFromStream

Implementation

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URLConnection;

static String detectWithGuessContentTypeFromStream(final String basePath, final String fileName) {
    final File file = new File(basePath, fileName);
    // BufferedInputStream supports mark/reset, which guessContentTypeFromStream needs to peek at the header;
    // try-with-resources closes the file once the guess has been made
    try (InputStream inputStream = new BufferedInputStream(new FileInputStream(file))) {
        return URLConnection.guessContentTypeFromStream(inputStream);
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}


Results

Rar with Rar extension is: null
Rar with Zip extension is: null
Zip with Zip extension is: null
7z with 7z extension is: null
7z with Zip extension is: null
Tar.xz with Tar.xz extension is: null
Tar.xz with Zip extension is: null
Tar.gz with Tar.gz extension is: null
Tar.gz with Zip extension is: null
Rar without extension is: null
Zip without extension is: null
7z without extension is: null
Tar.xz without extension is: null
Tar.gz without extension is: null


Conclusion

Again, this method fails in every case. Its support seems to be very limited.
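Looking at the JDK, the reason appears to be that `guessContentTypeFromStream` only checks a short hard-coded list of signatures (mostly images, audio, XML/HTML and Java class or serialized files) and has no entry for ZIP, RAR, 7z, xz or gzip. It also returns null outright for streams that do not support mark/reset, which is why the implementation above wraps the `FileInputStream` in a `BufferedInputStream`. The hypothetical `StreamGuessDemo` class below illustrates both points with in-memory headers.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Sketch: shows what URLConnection.guessContentTypeFromStream can and cannot see.
final class StreamGuessDemo {

    private StreamGuessDemo() {
    }

    // ByteArrayInputStream supports mark/reset, so the JDK can peek at the header.
    static String guess(final byte[] header) throws IOException {
        return URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(header));
    }

    // Same bytes, but behind a wrapper that reports no mark/reset support (like a raw FileInputStream).
    static String guessWithoutMark(final byte[] header) throws IOException {
        final InputStream noMark = new FilterInputStream(new ByteArrayInputStream(header)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
        return URLConnection.guessContentTypeFromStream(noMark);
    }
}
```

A GIF header (`GIF89a`) passed to `guess` comes back as `image/gif`, a ZIP header (`PK\3\4`) comes back as null, and the same GIF header passed to `guessWithoutMark` also comes back as null.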



Files.probeContentType

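Implementation

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

static String detectWithProbeContentType(final String basePath, final String fileName) {
    try {
        final Path path = Paths.get(basePath, fileName);
        // Asks the installed FileTypeDetectors; returns null if none of them can tell
        return Files.probeContentType(path);
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}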


Results

Rar with Rar extension is: application/vnd.rar
Rar with Zip extension is: application/zip
Zip with Zip extension is: application/zip
7z with 7z extension is: application/x-7z-compressed
7z with Zip extension is: application/zip
Tar.xz with Tar.xz extension is: application/x-xz-compressed-tar
Tar.xz with Zip extension is: application/zip
Tar.gz with Tar.gz extension is: application/x-compressed-tar
Tar.gz with Zip extension is: application/zip
Rar without extension is: application/vnd.rar
Zip without extension is: application/zip
7z without extension is: application/x-7z-compressed
Tar.xz without extension is: application/x-xz
Tar.gz without extension is: application/gzip


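Conclusion

This approach works surprisingly well, but don't be fooled: there is one situation in which it always fails. If the file has the wrong extension (one that does not match its content), it reports the type implied by the extension. This should not happen often, but if you are very picky, don't use this method.

Also, some warn that this approach doesn't work well on Windows.


  Workaround: if you manage to remove the extension from the file name, the correct value is returned for all of the given scenarios.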
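One way to apply this workaround without touching the original file is to copy it to a temporary name that has no extension and probe the copy. This is a sketch under that assumption; the class and method names are hypothetical. Copying costs a full read and write of the archive, so if you control the local file name when downloading from S3, saving the download without an extension in the first place should have the same effect more cheaply.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the "remove the extension" workaround for Files.probeContentType.
final class ExtensionlessProbe {

    private ExtensionlessProbe() {
    }

    // Probes a temporary, extension-less copy so the detectors cannot rely on the
    // (possibly wrong) extension; the original file is left untouched.
    static String probeIgnoringExtension(final Path file) throws IOException {
        // An empty suffix means no extension; a null suffix would default to ".tmp"
        final Path copy = Files.createTempFile("probe", "");
        try {
            Files.copy(file, copy, StandardCopyOption.REPLACE_EXISTING);
            return Files.probeContentType(copy);
        } finally {
            Files.deleteIfExists(copy);
        }
    }
}
```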




Apache Tika (tika-eval 1.18)

There seem to be many flavors of this library (app, server, eval, etc.), but many people on the web complain that it is somewhat "dependency heavy".

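Implementation

import java.io.File;
import org.apache.tika.Tika;

static String detectWithTika(final String basePath, final String fileName) {
    try {
        // Inspects the file's leading bytes (and uses its name as a hint)
        return new Tika().detect(new File(basePath, fileName));
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}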


Results

Rar with Rar extension is: application/x-rar-compressed
Rar with Zip extension is: application/x-rar-compressed
Zip with Zip extension is: application/zip
7z with 7z extension is: application/x-7z-compressed
7z with Zip extension is: application/x-7z-compressed
Tar.xz with Tar.xz extension is: application/x-xz
Tar.xz with Zip extension is: application/x-xz
Tar.gz with Tar.gz extension is: application/gzip
Tar.gz with Zip extension is: application/gzip
Rar without extension is: application/x-rar-compressed
Zip without extension is: application/zip
7z without extension is: application/x-7z-compressed
Tar.xz without extension is: application/x-xz
Tar.gz without extension is: application/gzip


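Conclusion

All files were identified correctly, but, as with anything, it has its pros and cons.

Pros:


Maintained by Apache.
Not fooled by extensions.


Cons:


Really heavy, especially if you only want to find out the file type: the tika-eval jar weighs more than 40 MB.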




URLConnection

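Implementation

import java.io.File;
import java.net.URL;
import java.net.URLConnection;

static String detectWithUrlConnection(final String basePath, final String fileName) {
    try {
        // toURI() builds a well-formed file: URL, escaping spaces and other special characters
        final URL url = new File(basePath, fileName).toURI().toURL();
        final URLConnection urlConnection = url.openConnection();
        final String contentType = urlConnection.getContentType();
        // For file: URLs, connecting opens the file, so close the stream to release it
        urlConnection.getInputStream().close();
        return contentType;
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}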


Results

Rar with Rar extension is: content/unknown
Rar with Zip extension is: application/zip
Zip with Zip extension is: application/zip
7z with 7z extension is: content/unknown
7z with Zip extension is: application/zip
Tar.xz with Tar.xz extension is: content/unknown
Tar.xz with Zip extension is: application/zip
Tar.gz with Tar.gz extension is: application/octet-stream
Tar.gz with Zip extension is: application/zip
Rar without extension is: content/unknown
Zip without extension is: content/unknown
7z without extension is: content/unknown
Tar.xz without extension is: content/unknown
Tar.gz without extension is: content/unknown


Conclusion

It barely recognizes any compression format, and it is guided by the file extension rather than by the content.
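The results above look consistent with the type being taken from the file name alone, through the JDK's file-name map (`URLConnection.getFileNameMap()`), which would explain why a RAR file named `.zip` comes back as `application/zip`. The minimal sketch below (class name hypothetical) performs only that name lookup; the file does not even need to exist.

```java
import java.net.FileNameMap;
import java.net.URLConnection;

// Sketch: a type lookup that only consults the JDK's file-name map, never the file's bytes.
final class NameOnlyGuess {

    private NameOnlyGuess() {
    }

    static String guess(final String fileName) {
        final FileNameMap map = URLConnection.getFileNameMap();
        // Returns null when the extension is not in the table
        return map.getContentTypeFor(fileName);
    }
}
```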



SimpleMagic 1.14

The project seems to be updated at least once a year.

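Implementation

import com.j256.simplemagic.ContentInfo;
import com.j256.simplemagic.ContentInfoUtil;

static String detectWithSimpleMagic(final String basePath, final String fileName) {
    try {
        final ContentInfoUtil util = new ContentInfoUtil();
        // findMatch returns null when no magic entry matches; calling getMimeType() on that null
        // throws a NullPointerException, whose message is null on Java 8
        final ContentInfo info = util.findMatch(basePath + "/" + fileName);

        return info.getMimeType();
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}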


Results

Rar with Rar extension is: application/x-rar
Rar with Zip extension is: application/x-rar
Zip with Zip extension is: application/zip
7z with 7z extension is: application/x-7z-compressed
7z with Zip extension is: application/x-7z-compressed
Tar.xz with Tar.xz extension is: <EXCEPTION: null>
Tar.xz with Zip extension is: <EXCEPTION: null>
Tar.gz with Tar.gz extension is: application/x-gzip
Tar.gz with Zip extension is: application/x-gzip
Rar without extension is: application/x-rar
Zip without extension is: application/zip
7z without extension is: application/x-7z-compressed
Tar.xz without extension is: <EXCEPTION: null>
Tar.gz without extension is: application/x-gzip


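Conclusion

It works for almost all of our cases, but it seems unable to detect the more "obscure" compression formats such as Tar.xz. The <EXCEPTION: null> entries are most likely a NullPointerException raised by the test code itself: findMatch returns null when none of its magic entries match, and the snippet then calls getMimeType() on that null.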



MimeUtil 2.1.3

This project has not been modified since 2010, so don't expect support or updates. It is listed here only for the sake of completeness.

Implementation

import eu.medsea.mimeutil.MimeUtil2;

static String detectWithMimeUtil(final String basePath, final String fileName) {
    try {
        final MimeUtil2 mimeUtil = new MimeUtil2();
        // Register the magic-number (content based) detector rather than the extension detector
        mimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");

        return MimeUtil2.getMostSpecificMimeType(mimeUtil.getMimeTypes(basePath + "/" + fileName)).toString();
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}


Results

Rar with Rar extension is: application/x-rar
Rar with Zip extension is: application/x-rar
Zip with Zip extension is: application/zip
7z with 7z extension is: application/octet-stream
7z with Zip extension is: application/octet-stream
Tar.xz with Tar.xz extension is: application/octet-stream
Tar.xz with Zip extension is: application/octet-stream
Tar.gz with Tar.gz extension is: application/x-gzip
Tar.gz with Zip extension is: application/x-gzip
Rar without extension is: application/x-rar
Zip without extension is: application/zip
7z without extension is: application/octet-stream
Tar.xz without extension is: application/octet-stream
Tar.gz without extension is: application/x-gzip


Conclusion

It recognizes some of the most popular file types, but fails for Tar.xz and 7z.



file - command line

Not the prettiest solution, but it had to be tried: the Ubuntu file command

Implementation

import java.io.BufferedReader;
import java.io.InputStreamReader;

static String detectWithFileCommand(final String basePath, final String fileName) {
    try {
        // Pass each argument separately so that paths containing spaces are not split;
        // --brief drops the "<path>: " prefix, leaving only the MIME type
        final Process process = new ProcessBuilder("file", "--brief", "--mime-type", basePath + "/" + fileName).start();

        try (BufferedReader stdInput = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            final String text = stdInput.readLine();
            process.waitFor();
            return text;
        }
    } catch (final Exception exception) {
        return "<EXCEPTION: " + exception.getMessage() + ">";
    }
}


Results

Rar with Rar extension is: application/x-rar
Rar with Zip extension is: application/x-rar
Zip with Zip extension is: application/zip
7z with 7z extension is: application/x-7z-compressed
7z with Zip extension is: application/x-7z-compressed
Tar.xz with Tar.xz extension is: application/x-xz
Tar.xz with Zip extension is: application/x-xz
Tar.gz with Tar.gz extension is: application/gzip
Tar.gz with Zip extension is: application/gzip
Rar without extension is: application/x-rar
Zip without extension is: application/zip
7z without extension is: application/x-7z-compressed
Tar.xz without extension is: application/x-xz
Tar.gz without extension is: application/gzip


Conclusion

It works for all of our scenarios, but, again, it depends on the file command being present on the system running the code.
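If the file command is used anyway, it can be treated as an optional detector: return nothing when the binary is missing or the output looks wrong, and let the caller fall back to another method. The sketch below assumes the `file` utility with its `--brief` and `--mime-type` options is what is installed; the `FileCommand` class is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Sketch: treat the external `file` command as an optional detector.
final class FileCommand {

    private FileCommand() {
    }

    // Returns the MIME type reported by `file --brief --mime-type`, or empty when the path is
    // not a regular file, the command cannot be started, or its output is not "type/subtype".
    static Optional<String> mimeType(final Path path) {
        if (!Files.isRegularFile(path)) {
            return Optional.empty();
        }
        try {
            final Process process = new ProcessBuilder("file", "--brief", "--mime-type", path.toString())
                    .redirectErrorStream(true)
                    .start();
            final String line;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                line = reader.readLine();
            }
            final int exitCode = process.waitFor();
            // Accept only output shaped like "type/subtype", e.g. "application/x-rar"
            if (exitCode != 0 || line == null || !line.trim().matches("[\\w.+-]+/[\\w.+-]+")) {
                return Optional.empty();
            }
            return Optional.of(line.trim());
        } catch (final IOException exception) {
            // Typically means the `file` binary is not installed or not on the PATH
            return Optional.empty();
        } catch (final InterruptedException exception) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```

A caller could try `FileCommand.mimeType(path)` first and, when it is empty, fall back to `Files.probeContentType` or a library such as Apache Tika.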

For this question about finding out the MIME type of a compressed file downloaded from S3 in Java, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51994837/
