
java - Converting a 10,000-row Word document to an Excel table in Java runs out of heap space

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:18:49

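I have a large Word document (more than 10,000 lines) containing tables of information that must be converted to Excel using Java. I am using Apache POI to extract the tables and save them to Excel. The code below runs on a subset of the rows on an iMac, but on the full document it fails with a heap space exception: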

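// Imports assume Apache POI 5.x; package names differ in older releases.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.extractor.POITextExtractor;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

public class WordExtractor {
    public static void main(String[] args) {
        try {
            File inputFile = new File("table.docx");
            POITextExtractor extractor = ExtractorFactory.createExtractor(inputFile);

            // The whole document text is materialized as one String here.
            String text = extractor.getText();
            BufferedReader reader = new BufferedReader(new StringReader(text));
            String line = null;
            boolean breakRead = false;
            int rowCount = 0;
            HSSFWorkbook workbook = new HSSFWorkbook();
            HSSFSheet sheet = workbook.createSheet("sheet1");
            while (!breakRead) {
                line = reader.readLine();
                if (line != null) {
                    // One spreadsheet row per text line, one cell per tab-separated token.
                    Row row = sheet.createRow(rowCount);
                    StringTokenizer st = new StringTokenizer(line, "\t");
                    int cellnum = 0;
                    while (st.hasMoreTokens()) {
                        Cell cell = row.createCell(cellnum++);
                        String token = st.nextToken();
                        System.out.println(" = " + token);
                        cell.setCellValue(token);
                    }
                } else {
                    breakRead = true;
                }
                rowCount++;
            }

            try {
                FileOutputStream out =
                        new FileOutputStream(new File("new.xls"));
                workbook.write(out);
                out.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}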

Best answer

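Thanks to the suggestions in the comments, I was able to resolve this by removing the unnecessary String object creation on each line. In any case, what got it working was putting System.gc() at the end of the main while loop. I also updated the VM arguments to give the application more memory at run time, using the following settings: -d64 -Xms512m -Xmx4g. Finally, I explicitly closed the extractor and the file reader objects before creating the Excel file.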
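
To confirm that VM arguments such as the ones above actually reached the process, the JVM can be asked for its heap limits at startup. The following is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` and its helper methods are hypothetical and not part of the original answer. Note also that, as far as I know, the `-d64` flag is only accepted by older JVMs and newer JDK releases reject it, so it may need to be dropped there.

```java
import java.util.Locale;

/** Hypothetical helper: reports the heap limits the running JVM received. */
final class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Formats the heap figures reported by Runtime at this moment. */
    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // upper bound the heap may grow to (-Xmx)
        long committed = rt.totalMemory();  // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();
        return String.format(Locale.ROOT,
                "max=%d MiB, committed=%d MiB, used=%d MiB",
                toMiB(max), toMiB(committed), toMiB(used));
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

Running it with the same flags as the converter, for example `java -Xms512m -Xmx4g HeapCheck`, should report a maximum in the region of 4096 MiB; the exact figure can be somewhat lower depending on the garbage collector in use.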

Here is the updated code:

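// Imports assume Apache POI 5.x; package names differ in older releases.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.extractor.POITextExtractor;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

public class WordExtractor {
    public static void main(String[] args) {
        try {
            File inputFile = new File("table.docx");
            POITextExtractor extractor = ExtractorFactory.createExtractor(inputFile);
            String text = extractor.getText();
            BufferedReader reader = new BufferedReader(new StringReader(text));
            String line = null;
            boolean breakRead = false;
            int rowCount = 0;
            HSSFWorkbook workbook = new HSSFWorkbook();
            HSSFSheet sheet = workbook.createSheet("sheet1");
            while (!breakRead) {
                line = reader.readLine();
                if (line != null) {
                    // One spreadsheet row per text line, one cell per tab-separated token.
                    Row row = sheet.createRow(rowCount);
                    StringTokenizer st = new StringTokenizer(line, "\t");
                    int cellnum = 0;
                    while (st.hasMoreTokens()) {
                        Cell cell = row.createCell(cellnum++);
                        String token = st.nextToken();
                        cell.setCellValue(token);
                    }
                } else {
                    breakRead = true;
                }
                rowCount++;
                if (rowCount % 100 == 0) {
                    // Request a collection every 100 rows (a hint the JVM may ignore).
                    System.gc();
                }
            }
            // Release the reader and the extractor before writing the workbook.
            reader.close();
            extractor.close();
            System.gc();
            try {
                FileOutputStream out =
                        new FileOutputStream(new File("new.xls"));
                workbook.write(out);
                out.close();
                System.out.println("Excel written successfully..");
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}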
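
One detail of the loop above that is separate from the heap problem: `StringTokenizer` skips empty tokens, so a table row with an empty cell would have its remaining cells shifted one column to the left. The sketch below, which uses only the standard library, contrasts that with `String.split("\t", -1)`, which keeps empty cells, and also shows the usual read-until-null loop in place of the `breakRead` flag. The class `TabSplitDemo` and its method names are hypothetical, and the sketch assumes, as the original code does, that the extracted text separates cells with tabs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical demo: tab splitting that preserves empty table cells. */
final class TabSplitDemo {

    /** Splits on tabs; the -1 limit keeps empty cells, including trailing ones. */
    static String[] splitCells(String line) {
        return line.split("\t", -1);
    }

    /** Same line through StringTokenizer, which silently drops empty tokens. */
    static List<String> tokenize(String line) {
        List<String> cells = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "\t");
        while (st.hasMoreTokens()) {
            cells.add(st.nextToken());
        }
        return cells;
    }

    /** Reads the text line by line until readLine() returns null. */
    static List<String[]> parseRows(String text) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(splitCells(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "id\tname\tnote\n1\t\tmissing name\n2\tBob\t\n";
        for (String[] row : parseRows(text)) {
            System.out.println(row.length + " cells: " + String.join("|", row));
        }
        // StringTokenizer yields only two tokens for the second line.
        System.out.println(tokenize("1\t\tmissing name"));
    }
}
```

With the sample text, every row comes back with three cells, while the tokenizer returns only two tokens for the row whose middle cell is empty. Whether this matters depends on whether the source tables contain empty cells.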

For more on java - Converting a 10,000-row Word document to an Excel table in Java runs out of heap space, see the similar question on Stack Overflow: https://stackoverflow.com/questions/19607900/
