java - Exception org.apache.poi.poifs.filesystem.OfficeXmlFileException - Apache POI 4.0.0

Reposted. Author: 行者123. Updated: 2023-12-02 01:23:57

I am trying to read a Microsoft Word 2016 document, but I can't...

private String readDoc(String path) {
    String content = "";
    try {
        File file = new File(path);
        FileInputStream fis = new FileInputStream(file.getAbsolutePath());

        HWPFDocument doc = new HWPFDocument(fis);

        WordExtractor we = new WordExtractor(doc);
        String[] paragraphs = we.getParagraphText();
        for (String para : paragraphs) {
            content += para.toString();
        }
        fis.close();
        return content;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return content;
}

Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

I don't understand why it throws this exception, since I am not using any XSSF (I think).

Best answer

Try this:

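// A .docx file is Office 2007+ XML, so it needs XWPF rather than HWPF.
// try-with-resources closes the stream and the document when done.
try (FileInputStream fis = new FileInputStream("test.docx");
     XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis))) {
    XWPFWordExtractor extractor = new XWPFWordExtractor(xdoc);
    System.out.println(extractor.getText());
}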

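It helps to understand the following. HWPF handles only the older binary .doc format, while a Word 2016 .docx file is Office 2007+ XML and needs XWPF: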

POIFS (Poor Obfuscation Implementation File System) − The pure-Java implementation of the OLE2 compound document format. It underlies the components for the older binary formats (HSSF, HWPF, HSLF and so on) and can also be used to read such files directly.

HSSF (Horrible SpreadSheet Format) − It is used to read and write .xls format of MS-Excel files.

XSSF (XML SpreadSheet Format) − It is used for .xlsx file format of MS-Excel.

HPSF (Horrible Property Set Format) − It is used to extract property sets of the MS-Office files.

HWPF (Horrible Word Processor Format) − It is used to read and write .doc extension files of MS-Word.

XWPF (XML Word Processor Format) − It is used to read and write .docx extension files of MS-Word.

HSLF (Horrible Slide Layout Format) − It is used to read, create, and edit PowerPoint presentations in the binary .ppt format.

HDGF (Horrible DiaGram Format) − It contains classes and methods for MS-Visio binary files.

HPBF (Horrible PuBlisher Format) − It is used to read MS-Publisher files, mainly for text extraction.
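
As a rough illustration of why the H* and X* components are not interchangeable, the sketch below guesses the container format from a file's first bytes. It uses only the JDK. The class FormatSniffer and its Kind enum are hypothetical names invented for this example and are not part of POI. The two signatures are the standard OLE2 header and the ZIP local-file header. A ZIP signature only suggests OOXML, since any ZIP archive starts the same way. POI itself ships a FileMagic class for this kind of check, which is probably the better choice in real code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses the container format from the leading bytes. */
public class FormatSniffer {

    /** OLE2 = older binary formats (HWPF, HSSF, ...); OOXML_ZIP = ZIP-based formats (XWPF, XSSF, ...). */
    public enum Kind { OLE2, OOXML_ZIP, UNKNOWN }

    // OLE2 compound documents (.doc, .xls, .ppt) start with this 8-byte signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // OOXML files (.docx, .xlsx, .pptx) are ZIP archives, which start with "PK" 0x03 0x04.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a header that has already been read into memory. */
    public static Kind detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.OOXML_ZIP;
        return Kind.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    public static Kind detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) break; // file is shorter than 8 bytes
                n += r;
            }
        }
        return detect(Arrays.copyOf(header, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java FormatSniffer <file>");
            return;
        }
        switch (detect(Paths.get(args[0]))) {
            case OLE2:
                System.out.println("OLE2 container: use HWPF / HSSF / HSLF");
                break;
            case OOXML_ZIP:
                System.out.println("ZIP container (likely OOXML): use XWPF / XSSF");
                break;
            default:
                System.out.println("Unrecognized format");
        }
    }
}
```

Run against the file from the question, a check like this would report a ZIP container. That matches the exception message: the document is .docx, so HWPFDocument cannot open it.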

Regarding "java - Exception org.apache.poi.poifs.filesystem.OfficeXmlFileException - Apache POI 4.0.0", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57218288/
