
java - Adding LaTeX-type equations to Word (.docx) using Apache POI

Reposted · Author: 行者123 · Updated: 2023-12-02 12:09:54

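I am trying to create MS Word (.docx) files automatically using Apache POI. The input to the Java program contains text, images, and LaTeX-style equations (enclosed in $$ ... $$ or \[ ... \]).

My question is how to add these LaTeX-style equations to Word so that, when the .docx file is edited in MS Word, the equations are recognized as native Word equations (OMML).

Note: I think the LaTeX equations should be converted to MathML. If so, how can MathML be added to a .docx file?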
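Before any conversion, the input has to be split into plain text and equations. A minimal sketch, assuming equations are delimited only by $$...$$ or \[...\] and are never nested or escaped (inline $...$ is not handled); the class and method names LatexSplitter and split are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatexSplitter {

    // One piece of input: plain text, or the LaTeX found between math delimiters.
    public static final class Segment {
        public final boolean isMath;
        public final String content;

        Segment(boolean isMath, String content) {
            this.isMath = isMath;
            this.content = content;
        }
    }

    // Group 1: body of $$...$$, group 2: body of \[...\]; DOTALL lets equations span lines.
    private static final Pattern DISPLAY_MATH =
        Pattern.compile("\\$\\$(.+?)\\$\\$|\\\\\\[(.+?)\\\\\\]", Pattern.DOTALL);

    public static List<Segment> split(String input) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = DISPLAY_MATH.matcher(input);
        int last = 0;
        while (m.find()) {
            // Text between the previous equation and this one
            if (m.start() > last) {
                segments.add(new Segment(false, input.substring(last, m.start())));
            }
            String latex = m.group(1) != null ? m.group(1) : m.group(2);
            segments.add(new Segment(true, latex.trim()));
            last = m.end();
        }
        // Trailing text after the last equation
        if (last < input.length()) {
            segments.add(new Segment(false, input.substring(last)));
        }
        return segments;
    }

    public static void main(String[] args) {
        String input = "Pythagoras: $$a^2 + b^2 = c^2$$ and \\[x=\\frac{-b}{2a}\\] done.";
        for (Segment s : split(input)) {
            System.out.println((s.isMath ? "MATH: " : "TEXT: ") + s.content);
        }
    }
}
```

Text segments would become ordinary XWPFRun text; each math segment could be wrapped as "$" + content + "$" (the form used in the LaTeX examples of the answer) before conversion.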

Best answer

Microsoft provides XSLT stylesheets for converting OMML to MathML (OMML2MML.XSL) and MathML to OMML (MML2OMML.XSL).

If you have Microsoft Office installed, you will find these files in the Office program directory. On my system:

[Screenshot: the Office program directory containing the .XSL stylesheets]
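Rather than hard-coding the location, the stylesheet can be searched for under the Office installation directory. A sketch using java.nio.file.Files.walkFileTree; the two root paths in main are assumptions about a default Windows installation, and the subfolder that holds the file varies between Office versions. The class name StylesheetLocator is hypothetical:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;

public class StylesheetLocator {

    // Walks the tree under root and returns the first file named MML2OMML.XSL (case-insensitive).
    public static Optional<Path> find(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return Optional.empty();
        }
        final Path[] found = new Path[1];
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().equalsIgnoreCase("MML2OMML.XSL")) {
                    found[0] = file;
                    return FileVisitResult.TERMINATE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // skip unreadable entries instead of aborting
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                return FileVisitResult.CONTINUE; // ignore errors while listing a directory
            }
        });
        return Optional.ofNullable(found[0]);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default install roots on Windows; adjust for your installation.
        Path[] roots = {
            Paths.get("C:\\Program Files\\Microsoft Office"),
            Paths.get("C:\\Program Files (x86)\\Microsoft Office")
        };
        for (Path root : roots) {
            Optional<Path> hit = find(root);
            if (hit.isPresent()) {
                System.out.println(hit.get());
                return;
            }
        }
        System.out.println("MML2OMML.XSL not found");
    }
}
```

The found path could then be passed to new StreamSource(path.toFile()) instead of new File("MML2OMML.XSL").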

With this stylesheet, we can use XSLT to convert MathML into OMML.

Example:

```java
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.xmlbeans.XmlCursor;

/*
needs the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
*/

public class CreateWordFormulaFromMathML {

    // Microsoft's MathML -> OMML stylesheet, expected in the working directory
    static File stylesheet = new File("MML2OMML.XSL");
    static TransformerFactory tFactory = TransformerFactory.newInstance();
    static StreamSource stylesource = new StreamSource(stylesheet);

    // Transforms a MathML string into OMML and returns the first m:oMath element
    static CTOMath getOMML(String mathML) throws Exception {
        Transformer transformer = tFactory.newTransformer(stylesource);

        StringReader stringreader = new StringReader(mathML);
        StreamSource source = new StreamSource(stringreader);

        StringWriter stringwriter = new StringWriter();
        StreamResult result = new StreamResult(stringwriter);
        transformer.transform(source, result);

        String ooML = stringwriter.toString();
        stringwriter.close();

        // The stylesheet produces an m:oMathPara root; take its first m:oMath child
        CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
        CTOMath ctOMath = ctOMathPara.getOMathArray(0);

        // For this to work with Word 2007 as well, explicit font settings are necessary
        XmlCursor xmlcursor = ctOMath.newCursor();
        while (xmlcursor.hasNextToken()) {
            XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
            if (tokentype.isStart()) {
                if (xmlcursor.getObject() instanceof CTR) {
                    CTR cTR = (CTR) xmlcursor.getObject();
                    cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
                    cTR.getRPr2().getRFonts().setHAnsi("Cambria Math"); // up to apache poi 4.1.2
                    //cTR.getRPr2().getRFontsArray(0).setHAnsi("Cambria Math"); // since apache poi 5.0.0
                }
            }
        }

        return ctOMath;
    }

    public static void main(String[] args) throws Exception {

        XWPFDocument document = new XWPFDocument();

        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText("The Pythagorean theorem: ");

        String mathML =
            "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
            + "<mrow>"
            + "<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo><msup><mi>b</mi><mn>2</mn></msup><mo>=</mo><msup><mi>c</mi><mn>2</mn></msup>"
            + "</mrow>"
            + "</math>";

        CTOMath ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        // Put the equation into the paragraph (replacing any m:oMath it already had)
        CTP ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("The Quadratic Formula: ");

        // \u2062 is INVISIBLE TIMES, MathML's marker for implicit multiplication (4ac, 2a)
        mathML =
            "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
            + "<mrow>"
            + "<mi>x</mi><mo>=</mo><mfrac><mrow><mrow><mo>-</mo><mi>b</mi></mrow><mo>±</mo><msqrt><mrow><msup><mi>b</mi><mn>2</mn></msup><mo>-</mo><mrow><mn>4</mn><mo>\u2062</mo><mi>a</mi><mo>\u2062</mo><mi>c</mi></mrow></mrow></msqrt></mrow><mrow><mn>2</mn><mo>\u2062</mo><mi>a</mi></mrow></mfrac>"
            + "</mrow>"
            + "</math>";

        ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        try (FileOutputStream out = new FileOutputStream("CreateWordFormulaFromMathML.docx")) {
            document.write(out);
        }
        document.close();
    }
}
```

Note that this code requires the full ooxml-schemas-*.jar or poi-ooxml-full-5.0.0.jar, as described in https://poi.apache.org/faq.html#faq-N10025.
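getOMML calls tFactory.newTransformer(stylesource) on every call, which compiles the stylesheet again each time. When many equations are converted, the stylesheet can instead be compiled once into a javax.xml.transform.Templates object, which is thread-safe, with a cheap Transformer created per call. A minimal sketch; the class name CachedXslt is hypothetical:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CachedXslt {

    // Compiled stylesheet; Templates may be shared across threads.
    private final Templates templates;

    public CachedXslt(Source stylesheet) throws TransformerException {
        this.templates = TransformerFactory.newInstance().newTemplates(stylesheet);
    }

    // Transformer instances are not thread-safe, so a new one is created per call.
    public String transform(String xml) throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

In the example above, one would build a single CachedXslt from new StreamSource(new File("MML2OMML.XSL")) and pass the string returned by transform(mathML) to CTOMathPara.Factory.parse.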

---

There are, of course, Java libraries for converting LaTeX to MathML, for example: http://www.fmath.info/java/download.jsp

After downloading fmath-mathml-java-test-project-b1124.zip and putting /lib/fmath-mathml-java.jar and /lib/jdom-2.0.6.jar on the classpath, the following works:

```java
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTR;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.xmlbeans.XmlCursor;

/*
needs the full ooxml-schemas-1.3.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
*/

public class CreateWordFormulaFromLaTeX {

    static File stylesheet = new File("MML2OMML.XSL");
    static TransformerFactory tFactory = TransformerFactory.newInstance();
    static StreamSource stylesource = new StreamSource(stylesheet);

    static CTOMath getOMML(String mathML) throws Exception {
        Transformer transformer = tFactory.newTransformer(stylesource);

        StringReader stringreader = new StringReader(mathML);
        StreamSource source = new StreamSource(stringreader);

        StringWriter stringwriter = new StringWriter();
        StreamResult result = new StreamResult(stringwriter);
        transformer.transform(source, result);

        String ooML = stringwriter.toString();
        stringwriter.close();

        CTOMathPara ctOMathPara = CTOMathPara.Factory.parse(ooML);
        CTOMath ctOMath = ctOMathPara.getOMathArray(0);

        // For this to work with Word 2007 as well, explicit font settings are necessary
        XmlCursor xmlcursor = ctOMath.newCursor();
        while (xmlcursor.hasNextToken()) {
            XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
            if (tokentype.isStart()) {
                if (xmlcursor.getObject() instanceof CTR) {
                    CTR cTR = (CTR) xmlcursor.getObject();
                    cTR.addNewRPr2().addNewRFonts().setAscii("Cambria Math");
                    cTR.getRPr2().getRFonts().setHAnsi("Cambria Math"); // up to apache poi 4.1.2
                    //cTR.getRPr2().getRFontsArray(0).setHAnsi("Cambria Math"); // since apache poi 5.0.0
                }
            }
        }

        return ctOMath;
    }

    public static void main(String[] args) throws Exception {

        XWPFDocument document = new XWPFDocument();

        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText("The Pythagorean theorem: ");

        String latex = "$a^2 + b^2 = c^2$";

        // fmath emits <math> without the MathML namespace, which MML2OMML.XSL needs
        String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
        mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
        System.out.println(mathML);

        CTOMath ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        CTP ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        paragraph = document.createParagraph();
        run = paragraph.createRun();
        run.setText("The Quadratic Formula: ");

        latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";

        mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
        mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
        // MML2OMML.XSL does not know HTML entities such as &plusmn;
        mathML = mathML.replaceAll("&plusmn;", "±");
        System.out.println(mathML);

        ctOMath = getOMML(mathML);
        System.out.println(ctOMath);

        ctp = paragraph.getCTP();
        ctp.setOMathArray(new CTOMath[]{ctOMath});

        // try-with-resources closes the output stream after writing
        try (FileOutputStream out = new FileOutputStream("CreateWordFormulaFromLaTeX.docx")) {
            document.write(out);
        }
        document.close();
    }
}
```

However, every conversion step can introduce errors, so LaTeX -> MathML -> OMML is more error-prone than MathML -> OMML alone.
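Because each step can fail, it can help to check the intermediate MathML before handing it to the stylesheet, so that a bad conversion is reported with a parser message rather than a less obvious XSLT or XmlBeans error. A sketch using the JDK's DOM parser; the class and method names are hypothetical, and it only checks XML well-formedness, not that the markup is valid MathML:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MathMLCheck {

    // Returns the parser's error message if the string is not well-formed XML, else empty.
    public static Optional<String> wellFormednessError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Throw on errors instead of using the default handler, which also prints to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, MathML still containing &plusmn; is rejected here, because an XML document without a DTD does not declare that entity.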

In this case, fmath.conversion.ConvertFromLatexToMathML.convertToMathML generates MathML without a namespace. Since the XSLT requires the namespace, it has to be added manually.
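The replaceFirst("<math ", ...) call in the example relies on the root tag having at least one attribute: it misses a bare <math>, and it would produce a duplicate xmlns attribute if the converter ever emitted one. A slightly more defensive sketch, assuming an unprefixed <math> root element; MathMLNamespace and ensureNamespace are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MathMLNamespace {

    public static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    // Opening <math> tag with or without attributes; the lookahead rejects e.g. <mathx>.
    private static final Pattern MATH_OPEN = Pattern.compile("<math(?=[\\s>/])([^>]*)>");

    public static String ensureNamespace(String mathML) {
        Matcher m = MATH_OPEN.matcher(mathML);
        if (!m.find()) {
            return mathML; // no <math> root found; leave unchanged
        }
        String attrs = m.group(1);
        if (attrs.contains("xmlns=")) {
            return mathML; // a default namespace is already declared
        }
        String fixed = "<math xmlns=\"" + MATHML_NS + "\"" + attrs + ">";
        return mathML.substring(0, m.start()) + fixed + mathML.substring(m.end());
    }
}
```

It would replace the replaceFirst line: mathML = MathMLNamespace.ensureNamespace(mathML);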

In addition, fmath.conversion.ConvertFromLatexToMathML.convertToMathML uses HTML entities, which MML2OMML.XSL does not know. That is why, in the example, &plusmn; has to be replaced by ±.
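Instead of one replaceAll per entity, HTML named entities can be mapped to characters in a single pass. A sketch; which entities fmath actually emits is an assumption here (only &plusmn; is confirmed by the example above), so the map is a small hypothetical selection. Unknown entities are left untouched, as are XML's predefined ones such as &lt; and &amp;:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlEntities {

    // Hypothetical selection of HTML named entities that XML does not predefine.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("plusmn", "\u00B1"); // plus-minus sign
        ENTITIES.put("times", "\u00D7");  // multiplication sign
        ENTITIES.put("divide", "\u00F7"); // division sign
        ENTITIES.put("minus", "\u2212");  // minus sign
        ENTITIES.put("middot", "\u00B7"); // middle dot
        ENTITIES.put("infin", "\u221E");  // infinity
        ENTITIES.put("le", "\u2264");     // less-than or equal to
        ENTITIES.put("ge", "\u2265");     // greater-than or equal to
        ENTITIES.put("ne", "\u2260");     // not equal to
        ENTITIES.put("nbsp", "\u00A0");   // no-break space
    }

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known HTML named entities with the characters they stand for.
    public static String replace(String markup) {
        Matcher m = NAMED_ENTITY.matcher(markup);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown names (including amp, lt, gt, quot, apos) keep their original text
            String replacement = ENTITIES.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In the LaTeX example, mathML = HtmlEntities.replace(mathML); would take the place of the replaceAll("&plusmn;", "±") line.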

---

Perhaps SnuggleTeX would be a better library?

After downloading it and adding snuggletex-core-1.2.2.jar to the classpath, the following changes to my previous example work:

```java
...
String latex = "$a^2 + b^2 = c^2$";

uk.ac.ed.ph.snuggletex.SnuggleEngine engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
uk.ac.ed.ph.snuggletex.SnuggleSession session = engine.createSession();
uk.ac.ed.ph.snuggletex.SnuggleInput input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
session.parseInput(input);
String mathML = session.buildXMLString();
System.out.println(mathML);

/*
String mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
System.out.println(mathML);
*/

CTOMath ctOMath = getOMML(mathML);
System.out.println(ctOMath);

...

latex = "$x=\\frac{-b\\pm\\sqrt{b^2-4ac}}{2a}$";

engine = new uk.ac.ed.ph.snuggletex.SnuggleEngine();
session = engine.createSession();
input = new uk.ac.ed.ph.snuggletex.SnuggleInput(latex);
session.parseInput(input);
mathML = session.buildXMLString();
System.out.println(mathML);

/*
mathML = fmath.conversion.ConvertFromLatexToMathML.convertToMathML(latex);
mathML = mathML.replaceFirst("<math ", "<math xmlns=\"http://www.w3.org/1998/Math/MathML\" ");
mathML = mathML.replaceAll("&plusmn;", "±");
System.out.println(mathML);
*/

ctOMath = getOMML(mathML);
System.out.println(ctOMath);
...
```

No manual intervention is needed, at least not with the given LaTeX examples.

A similar question about adding LaTeX-type equations to Word (.docx) with Apache POI can be found on Stack Overflow: https://stackoverflow.com/questions/46623554/
