
java - Reading a 10 MB file in Apache POI

Reposted. Author: 行者123. Updated: 2023-12-01 10:19:58

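The project I'm working on tries to read a very large Excel file (a few hundred columns and about 3,000 rows) and identify patterns in a series of letters. It works fine on smaller files, but when I run it on this file I get a java.lang.OutOfMemoryError: Java heap space error, even though I am only trying to analyze the first few rows. The error appears to occur at the line Workbook wb = WorkbookFactory.create(new File(filepath));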

I have already tried some of the solutions on this site, without any success. My code is below:

import java.awt.List;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.poi.EncryptedDocumentException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelReader {

    public int Reader(File file) throws IOException, EncryptedDocumentException, InvalidFormatException {
        String filepath = file.getPath();
        Workbook wb = WorkbookFactory.create(new File(filepath));
        XSSFSheet sheet = (XSSFSheet) wb.getSheetAt(0);
        XSSFRow row;
        XSSFCell cell;
        ArrayList<Integer> list = new ArrayList<Integer>();

        int rows;
        int cols = 0;
        int temp = 0;
        rows = sheet.getPhysicalNumberOfRows();

        for (int i = 0; i <= 1; i++) {
            row = sheet.getRow(i);
            if (row != null) {
                temp = sheet.getRow(i).getPhysicalNumberOfCells();
                if (temp > cols)
                    cols = temp;
            }
        }
        for (int r = 0; r <= 60; r++) {
            row = sheet.getRow(r);
            if (row != null) {
                for (int c = 0; c <= cols; c++) {
                    int numblanks = 0;
                    cell = row.getCell((short) c);
                    if (cell != null) {
                        //System.out.print(cell + "\t\t");
                    } else {
                        //System.out.print("\t\t");
                    }
                    if (cell != null && cell.getCellType() == XSSFCell.CELL_TYPE_STRING) {
                        if ("N".equals(cell.getStringCellValue())) {
                            for (int k = c; k <= cols; k++) {
                                if ("-".equals(row.getCell(k).getStringCellValue())) {
                                    numblanks++;
                                    continue;
                                }
                                if ("S".equals(row.getCell(c + 2 + numblanks).getStringCellValue())
                                        || "T".equals(row.getCell(c + 2 + numblanks).getStringCellValue())) {
                                    list.add((int) sheet.getRow(1).getCell(c).getNumericCellValue());
                                    break;
                                }
                            }
                        }
                    }
                }
                System.out.println();
            }
        }
        System.out.println();
        System.out.println("Rows: " + rows);
        System.out.println("Columns: " + cols);
        System.out.println(list);
        return temp;
    }
}

Thanks for any help you can give me!

Best answer

I have solved this problem before. My case was reading a 23 MB Excel file containing 230k rows.

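Increasing the maximum heap size is not a good solution. Apache POI's usermodel API (WorkbookFactory, XSSFWorkbook) has no streaming mode for reading: it loads the whole workbook into memory, which is what consumes so much of the heap.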

My solution was to read the sheet's underlying XML and parse it with an XMLReader.

Please check the following sample code:

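// Imports needed by this fragment. The enclosing class header and the
// application-specific Entity type are not shown in the original answer.
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

    protected List<Entity> parseData(InputStream in) throws Exception {
        OPCPackage pkg = OPCPackage.open(in);
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);
        XSSFReader.SheetIterator sheets = (XSSFReader.SheetIterator) r.getSheetsData();

        while (sheets.hasNext()) {
            // Each sheet is exposed as a raw XML stream, so it is never
            // materialized as a full in-memory workbook model.
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
            break; // if only need to process one sheet.
        }
        return SheetHandler.getRawData();
    }

    private XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    private static class SheetHandler extends DefaultHandler {

        // Declared static because getRawData() is called statically above;
        // the original fragment used this field without declaring it.
        private static List<Entity> rawData;

        private SharedStringsTable sst;
        // Initialized to "" so that characters() does not prepend "null".
        private String lastContents = "";
        private boolean nextIsString;
        private boolean nextIsInlineString;
        private boolean nextIsNull;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
            rawData = new ArrayList<Entity>();
        }

        public static List<Entity> getRawData() {
            return rawData;
        }

        @Override
        public void startElement(String uri, String localName, String name,
                Attributes attributes) throws SAXException {
            // Body left empty in the original answer.
        }

        @Override
        public void endElement(String uri, String localName, String name)
                throws SAXException {
            // Body left empty in the original answer.
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            // SAX may deliver one text node in several chunks, so append.
            lastContents += new String(ch, start, length);
        }
    }
} // closes the enclosing class, whose header is not shown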
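
The handler above leaves startElement and endElement empty, so here is a minimal sketch of what such callbacks might look like. It is not the answer author's code. To keep it runnable with the JDK alone it makes several assumptions: the sheet XML is supplied as a plain InputStream (with POI it would come from the sheet iterator), the shared strings are a simple List<String> standing in for the SharedStringsTable, and the names SheetXmlSketch, RowHandler and parseRows are hypothetical. It handles only <v> values, ignores inline strings, and does not use the cell reference attribute, so skipped cells are not padded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetXmlSketch {

    /** Hypothetical handler: turns <row>/<c>/<v> events into lists of strings. */
    static class RowHandler extends DefaultHandler {
        private final List<String> sharedStrings; // stand-in for POI's shared strings table
        private final List<List<String>> rows = new ArrayList<>();
        private final StringBuilder contents = new StringBuilder();
        private List<String> currentRow;
        private boolean cellIsSharedString;
        private boolean insideValue;

        RowHandler(List<String> sharedStrings) {
            this.sharedStrings = sharedStrings;
        }

        List<List<String>> getRows() {
            return rows;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("row".equals(qName)) {
                currentRow = new ArrayList<>();
            } else if ("c".equals(qName)) {
                // t="s" means the <v> text is an index into the shared strings.
                cellIsSharedString = "s".equals(attrs.getValue("t"));
            } else if ("v".equals(qName)) {
                contents.setLength(0);
                insideValue = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text can arrive in several chunks, so accumulate it.
            if (insideValue) {
                contents.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                insideValue = false;
                String raw = contents.toString();
                currentRow.add(cellIsSharedString
                        ? sharedStrings.get(Integer.parseInt(raw))
                        : raw);
            } else if ("row".equals(qName)) {
                // A real handler could process the row here instead of storing it.
                rows.add(currentRow);
            }
        }
    }

    static List<List<String>> parseRows(InputStream sheetXml, List<String> sharedStrings)
            throws Exception {
        RowHandler handler = new RowHandler(sharedStrings);
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        return handler.getRows();
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written stand-in for a worksheet part (namespace omitted).
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\"><v>42</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\" t=\"s\"><v>1</v></c></row>"
                + "</sheetData></worksheet>";
        List<List<String>> rows = parseRows(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                Arrays.asList("N", "S"));
        System.out.println(rows); // prints [[N, 42], [S]]
    }
}
```

Because only the current row is held while parsing, a check such as the question's search for "N" followed by "S" or "T" could run inside endElement for each row, and the stored list could be dropped entirely for very large sheets.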

Regarding java - Reading a 10 MB file in Apache POI, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35659863/
