gpt4 book ai didi

java - 将许多记录写入 excel 文件会变得非常慢

转载 作者:塔克拉玛干 更新时间:2023-11-01 21:53:02 25 4
gpt4 key购买 nike

我有一个包含 1,300,000 条记录的数组。每条记录本身就是一个数组。我读取了数组的每条记录,并将该记录的每个存储桶插入到 Excel 工作表的一行的单元格中,最后,我将该 Excel 工作表写入 Excel 文件。写入 100k 条记录后,它变得越来越慢,然后在最后中断。我使用 POI apache 来完成它,这是我的代码,我不确定是什么原因导致写入过程变慢了那么多。有什么提示吗?

try {
//save to excel file
FileOutputStream out = new FileOutputStream(new File(path));
XSSFWorkbook resultWorkBook = new XSSFWorkbook();
XSSFSheet sheet = resultWorkBook.createSheet("Comparison_result");
int sizeOfOriginalTermMain = 0;
int sizeOfOriginalTermMatch = 0;
//blue cell style
CellStyle blueStyle = resultWorkBook.createCellStyle();
XSSFFont cellFont = resultWorkBook.createFont();
cellFont.setColor(IndexedColors.BLUE.getIndex());
blueStyle.setFont(cellFont);

//yellow bg cell style
CellStyle GreenStyle = resultWorkBook.createCellStyle();
GreenStyle.setFillBackgroundColor(IndexedColors.GREEN.getIndex());


//create heading
Row heading = sheet.createRow(0);
heading.createCell(0).setCellValue("Main List ID");
heading.createCell(1).setCellValue("Match number > 0");
heading.createCell(2).setCellValue("Found Match ID");
heading.createCell(3).setCellValue("Source list: 2");
heading.createCell(4).setCellValue("Matched Trems");

for(int i=0; i<5;i++) {
CellStyle styleRowHeading = resultWorkBook.createCellStyle();
XSSFFont font = resultWorkBook.createFont();
font.setBold(true);
font.setFontName(XSSFFont.DEFAULT_FONT_NAME);
font.setFontHeightInPoints((short)11);
styleRowHeading.setFont(font);
heading.getCell(i).setCellStyle(styleRowHeading);
}


ArrayList<Object> currentList = new ArrayList<Object>();
RecordId mainRecordId = new RecordId();
String mainRecordIdValue = "";
LinkedHashSet<String> commonStrings = new LinkedHashSet<String>();
int numberOfMatch=0;
RecordId matchRecordId = new RecordId();
String matchRecordIdValue = "";
int size = processResult.size();
int matchRecordIdListNumber = 0;
String concatenatedMatchTerms = "";
ArrayList<String> OrininalTemrsInMainList = new ArrayList<String>();
ArrayList<String> OrininalTemrsInMatchList = new ArrayList<String>();
//adding value to each row of the excel sheet

int q= 0;
for (int i = 0; i < size; i++) {
currentList = processResult.get(i);
Row row = sheet.createRow(i+1);
//object ppmsID column
Cell mainIdCell = row.createCell(0);
mainRecordId = (RecordId)(currentList.get(0));
mainRecordIdValue = mainRecordId.getIdValue();
mainIdCell.setCellValue(mainRecordIdValue);
mainIdCell.setCellStyle(blueStyle);

//productDB column
Cell matchNumberCell = row.createCell(1);
commonStrings = (LinkedHashSet<String>)(currentList.get(2));
numberOfMatch = commonStrings.size();
matchNumberCell.setCellValue(Integer.toString(numberOfMatch));

//match record Id
Cell matchIdCell = row.createCell(2);
matchRecordId = (RecordId)(currentList.get(1));
matchRecordIdValue = matchRecordId.getIdValue();
matchRecordIdListNumber = matchRecordId.getListNumber();
matchIdCell.setCellValue(matchRecordIdValue);


Cell sourceListNumber = row.createCell(3);
sourceListNumber.setCellValue(Integer.toString(matchRecordIdListNumber));

//terms of match
Cell matchTerms = row.createCell(4);
concatenatedMatchTerms = getConcatenatedStringFromList(commonStrings);
matchTerms.setCellValue(concatenatedMatchTerms);

OrininalTemrsInMainList = (ArrayList<String>) currentList.get(3);
sizeOfOriginalTermMain = OrininalTemrsInMainList.size();
OrininalTemrsInMatchList = (ArrayList<String>) currentList.get(4);
sizeOfOriginalTermMatch = OrininalTemrsInMatchList.size();
for (int k = 0; k<sizeOfOriginalTermMain;k++) {
Cell newCell = row.createCell(5+k);
newCell.setCellValue(OrininalTemrsInMainList.get(k));
newCell.setCellStyle(blueStyle);

}
Cell emptyCell = row.createCell(5+sizeOfOriginalTermMain);
emptyCell.setCellValue("emptyCell");
emptyCell.setCellStyle(GreenStyle);
for (int n = 0; n<OrininalTemrsInMatchList.size();n++) {
Cell newCell = row.createCell(5+sizeOfOriginalTermMain+1+n);
newCell.setCellValue(OrininalTemrsInMatchList.get(n));
}

}

resultWorkBook.write(out);
out.close();
resultWorkBook.close();


}catch(Exception e) {
System.out.println(e.getMessage());
}

最佳答案

不要使用 XSSF 创建包含这么多单元格的电子表格。
XSSF 依赖于消耗大量内存的对象。

改为使用 SXSSF那是一个 Streaming Usermodel API。

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

将使用 XSSF 的代码更新为使用 SXSSF 是小菜一碟。

两件重要的事情:

窗口大小(内存中可访问的行数):使用默认值或在合适的情况下显式配置

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.

清理要求

SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.

它应该被调用:

SXSSFWorkbook.dispose();

所以你应该这样写:

SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
// write rows ...
...
// dispose of temporary files backing this workbook on disk
wb.dispose();

关于 SXSSF 限制:

Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:

  • Only a limited number of rows are accessible at a point in time.

  • Sheet.clone() is not supported.

  • Formula evaluation is not supported

关于您损坏的文件:

According to official SXSSF limitations ,如果您不依赖公式计算,则损坏的 excel 文件的原因可能与 SXSSF 模型无关。

在尝试任何操作之前,您可以更新到最新的稳定 POI 版本。

然后,很难给出具体的指示,但作为一般建议,隔离事物以尝试理解到底发生了什么。
您可以从减少生成的行数并仅处理一些特定的列开始,看看是否能解决问题。
如果不行,你也可以使用默认样式进行测试。

关于java - 将许多记录写入 excel 文件会变得非常慢,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48104804/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com