
java - How to get images and tables from a .docx document using Apache POI?

Reposted. Author: 行者123. Updated: 2023-12-04 02:04:52

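Hi all, I am trying to load an entire .docx document into a text area in Java, but I only get the text, without any images or tables. Any suggestions? Thanks in advance.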

My code is:

try{
JFileChooser chooser = new JFileChooser();
chooser.showOpenDialog(null);
XWPFDocument doc = new XWPFDocument(new
FileInputStream(chooser.getSelectedFile()));
XWPFWordExtractor extract = new XWPFWordExtractor(doc);
content.setText(extract.getText());
content.setFont(new Font("Serif", Font.ITALIC, 16));
content.setLineWrap(true);
content.setWrapStyleWord(true);
content.setBackground(Color.white);

} catch(Exception e){
JOptionPane.showMessageDialog(null, e);
}
}

Best answer

To extract tables, use List<XWPFTable> table = doc.getTables().

See the example below:

// Requires java.io.FileInputStream, java.io.IOException, and
// org.apache.poi.xwpf.usermodel.{XWPFDocument, XWPFTable, XWPFTableRow, XWPFTableCell}
public static void readWordDocument() {
    String fileName = "C:\\sample.docx";

    // XWPFDocument reads only the OOXML (.docx) format, so a legacy .doc file is rejected here.
    if (!fileName.endsWith(".docx")) {
        System.err.println("Not a .docx file: " + fileName);
        return;
    }

    // try-with-resources closes both the stream and the document
    try (FileInputStream in = new FileInputStream(fileName);
         XWPFDocument doc = new XWPFDocument(in)) {

        for (XWPFTable table : doc.getTables()) {
            for (XWPFTableRow row : table.getRows()) {
                for (XWPFTableCell cell : row.getTableCells()) {
                    System.out.println(cell.getText());

                    // Tables nested one level deep inside this cell
                    for (XWPFTable innerTable : cell.getTables()) {
                        for (XWPFTableRow innerRow : innerTable.getRows()) {
                            for (XWPFTableCell innerCell : innerRow.getTableCells()) {
                                System.out.println(innerCell.getText());
                            }
                        }
                    }
                }
            }
        }
    } catch (IOException e) {
        // Also covers FileNotFoundException
        e.printStackTrace();
    }
}

To extract images, use List<XWPFPictureData> piclist = docx.getAllPictures().

See the example below:

// Requires java.awt.image.BufferedImage, java.io.*, java.util.List, javax.imageio.ImageIO, and
// org.apache.poi.xwpf.usermodel.{XWPFDocument, XWPFPictureData}
public static void extractImages(String src) {
    // try-with-resources closes both the input stream and the document
    try (FileInputStream fs = new FileInputStream(src);
         XWPFDocument docx = new XWPFDocument(fs)) {

        // get all images from the document and store them in the list piclist
        List<XWPFPictureData> piclist = docx.getAllPictures();

        // traverse the list and write each image to a file
        int i = 0;
        for (XWPFPictureData pic : piclist) {
            byte[] bytepic = pic.getData();
            // ImageIO.read returns null for formats it cannot decode (for example EMF or WMF)
            BufferedImage imag = ImageIO.read(new ByteArrayInputStream(bytepic));
            if (imag != null) {
                // PNG instead of JPEG: the JPEG writer may refuse images with an alpha channel
                ImageIO.write(imag, "png", new File("D:/imagefromword" + i + ".png"));
            }
            i++;
        }
    } catch (IOException e) {
        // Report the failure instead of silently exiting the JVM
        e.printStackTrace();
    }
}
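
As a fallback when decoding through ImageIO is not wanted, it may help to remember that a .docx file is a ZIP archive in which Word stores embedded media under word/media/. The sketch below is not part of the original answer and does not use Apache POI; the class name DocxMediaExtractor and the method extractMedia are hypothetical names chosen for illustration. It copies each media entry out unchanged, so every image keeps its original format and extension.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxMediaExtractor {

    // Hypothetical helper: copies every entry under word/media/ into outDir
    // and returns the paths that were written.
    public static List<Path> extractMedia(Path docx, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();

        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("word/media/")) {
                    continue;
                }
                // Keep only the last path segment so an entry cannot escape outDir
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (fileName.isEmpty() || fileName.equals("..")) {
                    continue;
                }
                Path target = outDir.resolve(fileName);
                // Files.copy reads until the end of the current ZIP entry
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                written.add(target);
            }
        }
        return written;
    }
}
```

A call such as DocxMediaExtractor.extractMedia(Paths.get("C:\\sample.docx"), Paths.get("D:/images")) would then write the files under their stored names. This approach gives only the raw files; it says nothing about where in the document each picture appears, which is what the POI object model is for.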

Regarding "java - How to get images and tables from a .docx document using Apache POI?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44280677/
