Usage and code examples for the org.apache.tika.parser.pkg.ZipContainerDetector class

Reposted by: 知者. Updated: 2024-03-13 12:59:21

This article collects code examples for the Java class org.apache.tika.parser.pkg.ZipContainerDetector and shows how the class is used in practice. The examples come mainly from GitHub, Stack Overflow, and Maven, and were extracted from selected projects, so they should be useful as reference. Details of the class:
Package path: org.apache.tika.parser.pkg.ZipContainerDetector
Class name: ZipContainerDetector

About ZipContainerDetector

A detector that works on Zip documents and other archive and compression formats to figure out exactly what the file is.
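
To make "figure out exactly what the file is" concrete, here is a minimal sketch that uses only the JDK, not Tika. It lists the entries of a zip stream and looks for well-known marker entries. The class name `ZipEntrySniffer`, its methods, and the choice of two markers are illustrative assumptions; Tika's own checks are more thorough.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Tika.
class ZipEntrySniffer {

  /** Collects the names of all entries in a zip stream. */
  static Set<String> entryNames(InputStream in) throws IOException {
    Set<String> names = new HashSet<>();
    try (ZipInputStream zin = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zin.getNextEntry()) != null) {
        names.add(entry.getName());
      }
    }
    return names;
  }

  /** Guesses a container type from marker entries (simplified assumption). */
  static String sniff(Set<String> names) {
    if (names.contains("[Content_Types].xml")) {
      return "application/x-tika-ooxml"; // OPC-based package such as Office Open XML
    }
    if (names.contains("META-INF/MANIFEST.MF")) {
      return "application/java-archive";
    }
    return "application/zip"; // a zip with no recognised marker
  }

  /** Builds a small zip in memory so the sketch needs no files on disk. */
  static byte[] zipOf(String... entryNames) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
      for (String name : entryNames) {
        zout.putNextEntry(new ZipEntry(name));
        zout.write("x".getBytes(StandardCharsets.UTF_8));
        zout.closeEntry();
      }
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] jar = zipOf("META-INF/MANIFEST.MF", "com/example/Main.class");
    // prints "application/java-archive"
    System.out.println(sniff(entryNames(new ByteArrayInputStream(jar))));
  }
}
```

The point of the design is that the zip signature alone only says "this is a zip"; the specific format is decided by what the archive contains.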

Code examples

Code example source: origin: apache/tika

public MediaType detect(InputStream input, Metadata metadata)
    throws IOException {
  // Check if we have access to the document
  if (input == null) {
    return MediaType.OCTET_STREAM;
  }
  TemporaryResources tmp = new TemporaryResources();
  try {
    TikaInputStream tis = TikaInputStream.get(input, tmp);
    byte[] prefix = new byte[1024]; // enough for all known formats
    int length = tis.peek(prefix);
    MediaType type = detectArchiveFormat(prefix, length);
    if (type == TIFF) {
      return TIFF;
    } else if (PackageParser.isZipArchive(type)
          && TikaInputStream.isTikaInputStream(input)) {
      return detectZipFormat(tis);
    } else if (!type.equals(MediaType.OCTET_STREAM)) {
      return type;
    } else {
      return detectCompressorFormat(prefix, length);
    }
  } finally {
    try {
      tmp.dispose();
    } catch (TikaException e) {
      // ignore
    }
  }
}
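
The method above peeks at the first bytes of the stream and classifies them before doing any deeper inspection. The following JDK-only sketch shows the same idea with `mark`/`reset` and a few magic numbers. `PrefixSniffer` and its methods are hypothetical names, and the signature list is a small illustrative subset, not the set Tika recognises.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

// Hypothetical helper for illustration; not part of Tika.
class PrefixSniffer {

  /** Fills prefix with leading bytes, then rewinds so nothing is consumed. */
  static int peek(BufferedInputStream in, byte[] prefix) throws IOException {
    in.mark(prefix.length);
    int total = 0;
    int n;
    while (total < prefix.length
        && (n = in.read(prefix, total, prefix.length - total)) != -1) {
      total += n;
    }
    in.reset();
    return total;
  }

  /** Maps a byte prefix to a media type using a few magic numbers. */
  static String detect(byte[] prefix, int length) {
    if (startsWith(prefix, length, 'P', 'K', 0x03, 0x04)) {
      return "application/zip";
    }
    if (startsWith(prefix, length, 0x1f, 0x8b)) {
      return "application/gzip";
    }
    if (startsWith(prefix, length, 'B', 'Z', 'h')) {
      return "application/x-bzip2";
    }
    return "application/octet-stream";
  }

  /** True if the first magic.length bytes of data match magic. */
  static boolean startsWith(byte[] data, int length, int... magic) {
    if (length < magic.length) {
      return false;
    }
    for (int i = 0; i < magic.length; i++) {
      if ((data[i] & 0xff) != magic[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {'P', 'K', 0x03, 0x04, 0x14, 0x00};
    BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
    byte[] prefix = new byte[1024];
    int length = peek(in, prefix);
    System.out.println(detect(prefix, length)); // prints "application/zip"
    System.out.println((char) in.read());       // prints "P": the stream was rewound
  }
}
```

Because `peek` rewinds the stream, a parser that runs after detection still sees the document from its first byte.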

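Code example source: origin: apache/tika

// Excerpt from the OPC detection path; "// ..." marks lines missing from the excerpt.
try {
  zipEntrySource = new ZipFileZipEntrySource(new ZipFile(stream.getFile()));
} catch (IOException e) {
  // Opening the zip file failed: detect from the stream instead
  return tryStreamingDetection(stream);
}
try {
  pkg = OPCPackage.open(zipEntrySource);
} catch (SecurityException e) {
  closeQuietly(zipEntrySource);
  // ...
}
// ... (a handler that closes zipEntrySource and returns null is cut off here)
type = detectOfficeOpenXML(pkg);
if (type == null) type = detectXPSOPC(pkg);
if (type == null) type = detectAutoCADOPC(pkg);
// ... (the error handling around these calls again closes zipEntrySource and returns null)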

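Code example source: origin: apache/tika

// Try OPC-based formats first, then the other zip-based formats
MediaType type = detectOPCBased(tis);
if (type != null) {
  return type;
}
// ... (the line that opens `zip` is missing from the excerpt)
type = detectOpenDocument(zip);
// Try each remaining detector only while no type has been found
if (type == null) type = detectIWork13(zip);
if (type == null) type = detectIWork(zip);
if (type == null) type = detectJar(zip);
if (type == null) type = detectKmz(zip);
if (type == null) type = detectIpa(zip);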

Code example source: origin: apache/tika

ZipContainerDetector detector = new ZipContainerDetector();
MediaType type = null;
try {
  type = detector.detect(stream, new Metadata());
} catch (Exception e) {
  // Record the failure in the parent document's metadata instead of aborting
  EmbeddedDocumentUtil.recordEmbeddedStreamException(e, parentMetadata);
}

Code example source: origin: com.github.lafa.tikaNoExternal/tika-parsers

MediaType type = detectOfficeOpenXML(pkg);
if (type != null) return type;
type = detectXPSOPC(pkg);
if (type != null) return type;
type = detectAutoCADOPC(pkg);
if (type != null) return type;

Code example source: origin: stackoverflow.com

Detector detector;
List<Detector> detectors = new ArrayList<Detector>();
detectors.add(new ZipContainerDetector());
detectors.add(new POIFSContainerDetector());

detectors.add(new MultipartSignedDetector());

detectors.add(MimeTypes.getDefaultMimeTypes());
detector = new CompositeDetector(detectors);
String mimetype = detector.detect(TikaInputStream.get(new File(args[0])), new Metadata()).toString();
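
The snippet above combines several detectors into one. The JDK-only sketch below illustrates that composition with a deliberately simple rule: the first detector that returns something more specific than `application/octet-stream` wins. Tika's `CompositeDetector` applies its own rules for choosing among results, so treat `DetectorChain` and its behavior as an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical chain of detectors for illustration; not Tika's CompositeDetector.
class DetectorChain {
  static final String OCTET_STREAM = "application/octet-stream";

  private final List<Function<byte[], String>> detectors = new ArrayList<>();

  /** Registers a detector; detectors are consulted in insertion order. */
  DetectorChain add(Function<byte[], String> detector) {
    detectors.add(detector);
    return this;
  }

  /** Returns the first specific answer, or the generic type if none is found. */
  String detect(byte[] prefix) {
    for (Function<byte[], String> detector : detectors) {
      String type = detector.apply(prefix);
      if (type != null && !OCTET_STREAM.equals(type)) {
        return type;
      }
    }
    return OCTET_STREAM;
  }

  public static void main(String[] args) {
    DetectorChain chain = new DetectorChain()
        .add(p -> p.length >= 2 && p[0] == 'P' && p[1] == 'K'
            ? "application/zip" : OCTET_STREAM)
        .add(p -> p.length >= 4 && p[0] == '%' && p[1] == 'P' && p[2] == 'D' && p[3] == 'F'
            ? "application/pdf" : OCTET_STREAM);
    byte[] pdf = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
    System.out.println(chain.detect(pdf)); // prints "application/pdf"
  }
}
```

With a first-match rule, the order of registration matters, which is one reason to put container detectors ahead of a general magic-number detector.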

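Code example source: origin: com.github.lafa.tikaNoExternal/tika-parsers

ZipFile zip = new ZipFile(tis.getFile()); // TODO: hasFile()?
try {
  MediaType type = detectOpenDocument(zip);
  // Try each remaining detector only while no type has been found
  if (type == null) type = detectOPCBased(zip, tis);
  if (type == null) type = detectIWork13(zip);
  if (type == null) type = detectIWork(zip);
  if (type == null) type = detectJar(zip);
  if (type == null) type = detectKmz(zip);
  if (type == null) type = detectIpa(zip);
  // ... (the excerpt ends here; the rest of the try block is not shown)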

Code example source: origin: com.github.lafa.tikaNoExternal/tika-parsers

public MediaType detect(InputStream input, Metadata metadata)
    throws IOException {
  // Check if we have access to the document
  if (input == null) {
    return MediaType.OCTET_STREAM;
  }
  TemporaryResources tmp = new TemporaryResources();
  try {
    TikaInputStream tis = TikaInputStream.get(input, tmp);
    byte[] prefix = new byte[1024]; // enough for all known formats
    int length = tis.peek(prefix);
    MediaType type = detectArchiveFormat(prefix, length);
    if (PackageParser.isZipArchive(type)
        && TikaInputStream.isTikaInputStream(input)) {
      return detectZipFormat(tis);
    } else if (!type.equals(MediaType.OCTET_STREAM)) {
      return type;
    } else {
      return detectCompressorFormat(prefix, length);
    }
  } finally {
    try {
      tmp.dispose();
    } catch (TikaException e) {
      // ignore
    }
  }
}
