
java - Detecting multipart/signed with Tika

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 20:02:15

I am using Tika to auto-detect the content type of documents being pushed into a DMS. Almost everything works well, except for email.

I have to distinguish standard mail messages (MIME type message/rfc822) from signed mail messages (MIME type multipart/signed), but every email is detected as message/rfc822.

A signed mail that is not detected correctly has the following Content-Type header:

Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; micalg=sha1; boundary="----4898E6D8BDE1929CA602BE94D115EF4C"
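To make the distinction concrete, here is a minimal sketch that splits a header value like the one above into its media type and parameters using only the standard library. `ContentTypeValue` is a hypothetical helper, not a Tika or JavaMail class, and the parsing is deliberately simplified: it assumes no `;` appears inside a quoted parameter value.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: splits a Content-Type header value into media type and parameters. */
final class ContentTypeValue {
    final String mediaType;               // lower-cased, e.g. "multipart/signed"
    final Map<String, String> parameters; // lower-cased names, values with quotes removed

    private ContentTypeValue(String mediaType, Map<String, String> parameters) {
        this.mediaType = mediaType;
        this.parameters = parameters;
    }

    /** Simplified parsing: assumes no ';' inside quoted parameter values. */
    static ContentTypeValue parse(String headerValue) {
        String[] parts = headerValue.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // skip fragments that are not name=value pairs
            }
            String name = parts[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = parts[i].substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(name, value);
        }
        return new ContentTypeValue(mediaType, params);
    }

    /** True when the media type is exactly multipart/signed. */
    boolean isSignedMultipart() {
        return mediaType.equals("multipart/signed");
    }
}
```

Calling `ContentTypeValue.parse(...)` on the value of the header shown above yields the media type `multipart/signed` plus the `protocol`, `micalg` and `boundary` parameters, which is the information needed to tell a signed message from a plain one.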

The Java code I use for detection is:

Detector detector;
List<Detector> detectors = new ArrayList<Detector>();
detectors.add(new ZipContainerDetector());
detectors.add(new POIFSContainerDetector());
detectors.add(MimeTypes.getDefaultMimeTypes());
detector = new CompositeDetector(detectors);
String mimetype = detector.detect(TikaInputStream.get(new File(args[0])), new Metadata()).toString();

I am referencing the core library and tika-parsers in order to detect PDF and MS Word documents. Am I missing anything else?

Best answer

I solved my problem by writing a custom detector that implements the Detector interface:

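import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import javax.mail.Session;
import javax.mail.internet.MimeMessage;

import org.apache.tika.detect.Detector;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TemporaryResources;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

public class MultipartSignedDetector implements Detector {

    @Override
    public MediaType detect(InputStream is, Metadata metadata) throws IOException {
        // Wrap the stream and mark it so it can be rewound for the next detector.
        TemporaryResources tmp = new TemporaryResources();
        TikaInputStream tis = TikaInputStream.get(is, tmp);
        tis.mark(Integer.MAX_VALUE);

        try {
            // The message is only parsed, never sent, so an empty Properties
            // object is enough and the global system properties stay untouched.
            Session session = Session.getInstance(new Properties());
            MimeMessage mimeMessage = new MimeMessage(session, tis);

            String contentType = mimeMessage.getContentType();
            // Require a Message-ID in addition to the signed content type.
            if (contentType != null && mimeMessage.getMessageID() != null
                    && contentType.toLowerCase().contains("multipart/signed")) {
                return new MediaType("multipart", "signed");
            }
            return MediaType.OCTET_STREAM;
        } catch (Exception e) {
            // Not parseable as a MIME message: leave the decision to the other detectors.
            return MediaType.OCTET_STREAM;
        } finally {
            try {
                tis.reset();
                tmp.dispose();
            } catch (TikaException e) {
                // ignore
            }
        }
    }
}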

Then add the custom detector to the composite detector, just before the default detector:

Detector detector;
List<Detector> detectors = new ArrayList<Detector>();
detectors.add(new ZipContainerDetector());
detectors.add(new POIFSContainerDetector());

detectors.add(new MultipartSignedDetector());

detectors.add(MimeTypes.getDefaultMimeTypes());
detector = new CompositeDetector(detectors);
String mimetype = detector.detect(TikaInputStream.get(new File(args[0])), new Metadata()).toString();
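If pulling in JavaMail only for this check is undesirable, the same idea can be sketched with the standard library alone: read the top-level header block, unfold the Content-Type header, and test its media type. This is an illustrative alternative under stated assumptions, not the approach from the answer above; `SignedMailSniffer` and its methods are hypothetical names, and the sketch does not require a Message-ID header the way the JavaMail version does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical, dependency-free check; not part of Tika or JavaMail. */
final class SignedMailSniffer {

    /** Returns the unfolded top-level Content-Type value, or null if the header block has none. */
    static String topLevelContentType(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        StringBuilder current = null; // non-null once the Content-Type header has started
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation) {
                // Folded header line: append it only if it belongs to Content-Type.
                if (current != null) {
                    current.append(' ').append(line.trim());
                }
                continue;
            }
            if (current != null) {
                break; // the next header started, so Content-Type is complete
            }
            if (line.toLowerCase(Locale.ROOT).startsWith("content-type:")) {
                current = new StringBuilder(line.substring("content-type:".length()).trim());
            }
        }
        return current == null ? null : current.toString();
    }

    /** True when the top-level Content-Type starts with multipart/signed. */
    static boolean isMultipartSigned(InputStream in) throws IOException {
        String value = topLevelContentType(in);
        return value != null && value.toLowerCase(Locale.ROOT).startsWith("multipart/signed");
    }
}
```

Note that this sketch consumes the stream and reads ahead through a `BufferedReader`. Inside a Tika `Detector` the stream would still need the mark and reset handling shown in `MultipartSignedDetector`, so that the detectors that run afterwards see the input from the beginning.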

Regarding java - Detecting multipart/signed with Tika, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27397807/
