gpt4 book ai didi

java - 有没有一种简单的方法可以将 Excel 内联字符串转换为 Java 中的共享字符串表?

转载 作者:行者123 更新时间:2023-12-01 17:53:00 25 4
gpt4 key购买 nike

我正在尝试创建一个简单的 Java 程序,将 Excel 文件从内联字符串转换为共享字符串表,以减小文件大小。

我知道 Apache POI 有一个 SXSSFWorkbook类可以完成这项工作,但使用 SAX XML 解析器读取带有内联字符串的大型 xlsx 文件仍然可能会崩溃。例如 150,000 行 x 50 列单元格。

是否有一个不使用 Apache POI 库来完成简单工作的简单解决方案?有人知道吗?

最佳答案

尽管首先创建其中包含内联字符串的工作表,然后用共享字符串替换这些内联字符串的效率非常低,但我将提供如何完成此操作的问题的答案。

需求是:循环遍历工作表的 XML 文件中的所有单元格以获取内联字符串。然后查找 sharedStrings.xml 该字符串是否已经存在。如果是,则获取 ID,否则在 sharedStrings.xml 中创建一个新字符串并获取 ID。然后将 ID 放入工作表 XML 文件的单元格中,而不是内联字符串值。

下面的代码就是这样做的。如果 TestInlineStrings.xlsx 在第一个工作表中有内联字符串,那么在该代码运行后,这些内联字符串将被替换为共享字符串。

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;

import org.apache.poi.xssf.model.SharedStringsTable;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

import javax.xml.namespace.QName;

import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.Iterator;

class StaxReplaceInlineStrings {

public static void main(String[] args) {
try {

File file = new File("TestInlineStrings.xlsx");
OPCPackage opcpackage = OPCPackage.open(file);

//if there are strings in the sheet data, we need the SharedStringsTable
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);

XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());

XMLEventFactory eventFactory = XMLEventFactory.newInstance();


while(reader.hasNext()){ //loop over all XML in sheet1.xml

boolean cellReplaced = false; //marker whether cell having inline string was replaced by cell having shared string

XMLEvent event = (XMLEvent)reader.next();
if(event.isStartElement()){
StartElement startElement = (StartElement)event;
QName startElementName = startElement.getName();
if (startElementName.getLocalPart().equalsIgnoreCase("c")) { //start element of cell
Attribute attribute;
StartElement cellStart = startElement; //remember cell start
Iterator attributeIterator = cellStart.getAttributes(); //get cell's attributes
while (attributeIterator.hasNext()) {
attribute = (Attribute)attributeIterator.next();
if ("t".equals(attribute.getName().getLocalPart())) { //cell has type attribute
String tvalue = attribute.getValue();
if ("inlineStr".equals(tvalue)) { //cell type is inline string
String inlineString = "";
startElement = (StartElement)(XMLEvent)reader.next(); //read next start element - error if is not a start element
startElementName = startElement.getName();
if (startElementName.getLocalPart().equalsIgnoreCase("is")) { //start element of inline string
startElement = (StartElement)(XMLEvent)reader.next(); //read next start element - error if is not a start element
startElementName = startElement.getName();
if (startElementName.getLocalPart().equalsIgnoreCase("t")) { //start element of text
Characters characters = (Characters)(XMLEvent)reader.next(); //read next characters element - error if is not a characters element
inlineString = characters.getData(); //get text data
System.out.println(inlineString);
}
}

//create shared string in shared strings table
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT(inlineString);
int sRef = sharedstringstable.addEntry(ctstr);

//we are replacing the cell element so skip elements until end element of cell
while(reader.hasNext()) {
event = (XMLEvent)reader.next();
if(event.isEndElement()){
EndElement endElement = (EndElement)event;
QName endElementName = endElement.getName();
if (endElementName.getLocalPart().equalsIgnoreCase("c")) { //end element of cell
break;
}
}
}

//create the new cell element having the shared string
Attribute r = cellStart.getAttributeByName(new QName("r"));
Attribute s = cellStart.getAttributeByName(new QName("s"));
Attribute t = eventFactory.createAttribute("t", "s");
List attributeList = Arrays.asList(new Attribute[]{t});
if (r != null && s != null) {
attributeList = Arrays.asList(new Attribute[]{r, s, t});
} else if (r != null) {
attributeList = Arrays.asList(new Attribute[]{r, t});
} else if (s != null) {
attributeList = Arrays.asList(new Attribute[]{s, t});
}
System.out.println(attributeList);
StartElement newCellStart = eventFactory.createStartElement(new QName("c"), attributeList.iterator(), null);
writer.add(newCellStart);
StartElement newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
writer.add(newCellValue);
Characters value = eventFactory.createCharacters(Integer.toString(sRef));
writer.add(value);
EndElement newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
writer.add(newCellValueEnd);
EndElement newCellEnd = eventFactory.createEndElement(new QName("c"), null);
writer.add(newCellEnd);

cellReplaced = true; // mark that cell was replaced
break;
}
}
}
}
}
if (!cellReplaced) {
writer.add(event); //by default write each read event, except cell was replaced
}
}
writer.flush();

//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();

opcpackage.close();

} catch (Exception ex) {
ex.printStackTrace();
}
}
}

关于java - 有没有一种简单的方法可以将 Excel 内联字符串转换为 Java 中的共享字符串表?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47899750/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com