
apache-tika - How do I read large files with Tika?

Reposted  Author: 行者123  Updated: 2023-12-03 04:45:14

I am using Tika to parse large PDF and Word documents, but I get the following error message.

Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

How do I raise the limit?

Best answer

Assuming you are basically following the Tika example for extracting to plain text, all you need to do is create your BodyContentHandler with a write limit of -1, which disables the write limit, as explained in the javadocs.

Your code would look something like this (inspired by the example):

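// Imports this fragment needs:
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

// Inside a method that returns String; -1 disables the write limit.
BodyContentHandler handler = new BodyContentHandler(-1);

InputStream stream = ContentHandlerExample.class.getResourceAsStream("test.doc");
AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
try {
    parser.parse(stream, handler, metadata);
    return handler.toString();
} finally {
    stream.close();
}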
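
To make the idea of a write limit more concrete, here is a minimal sketch that uses only the JDK's SAX classes, not Tika itself. Tika parsers hand text to a ContentHandler as SAX events, and a limited handler stops accepting characters once its budget is used up. The class LimitedTextHandler and its extract helper are hypothetical names invented for this illustration; this is not how Tika implements BodyContentHandler internally, only an approximation of the behaviour described in the error message (text up to the limit is kept, then parsing is aborted). As a simplification, the sketch treats any negative limit as "unlimited", whereas the answer above relies specifically on -1.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only; not a Tika class.
class LimitedTextHandler extends DefaultHandler {
    private final int writeLimit;            // negative (e.g. -1) means no limit in this sketch
    private final StringBuilder text = new StringBuilder();
    private boolean limitReached = false;

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit < 0 || text.length() + length <= writeLimit) {
            text.append(ch, start, length);
            return;
        }
        // Keep only what still fits, then abort the parse.
        text.append(ch, start, writeLimit - text.length());
        limitReached = true;
        throw new SAXException("write limit of " + writeLimit + " characters reached");
    }

    @Override
    public String toString() {
        return text.toString();
    }

    // Parses a small XML string with the JDK SAX parser and returns the collected text.
    static String extract(String xml, int writeLimit) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Swallow the exception only when it signals our own limit.
            if (!handler.limitReached) {
                throw new IllegalArgumentException("SAX parsing failed", e);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.toString();
    }
}
```

Calling LimitedTextHandler.extract("<p>hello world</p>", 5) keeps only the first five characters, while passing -1 returns the whole text. With a limit of -1 all extracted text is held in memory, so for very large documents it may be worth checking that the heap is large enough.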

About apache-tika - How do I read large files with Tika?: a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31079433/
