
c# - How to convert an MS Word file with images and equations to HTML in C# using a MemoryStream

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:36:00

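I'm using the code below and it works fine: it converts a Word file to an HTML file, images included.

The problem is equations: I can't get the equations in the MS Word file converted to HTML.

Can anyone help?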

using System.Drawing.Imaging;
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Save the uploaded .docx on the server and keep its path for later.
string sourceDocumentFileName = Server.MapPath(FileUpload1.FileName);
FileUpload1.SaveAs(sourceDocumentFileName);
FileInfo fileInfo = new FileInfo(sourceDocumentFileName);

// Extracted images go into a "<file name>_files" folder.
string imageDirectoryName = FileUpload1.FileName + "_files";
DirectoryInfo dirInfo = new DirectoryInfo(Server.MapPath(imageDirectoryName));

if (dirInfo.Exists)
{
    // Delete the directory and files left over from a previous conversion.
    foreach (var f in dirInfo.GetFiles())
        f.Delete();
    dirInfo.Delete();
}

int imageCounter = 0;

byte[] byteArray = File.ReadAllBytes(sourceDocumentFileName);

using (MemoryStream memoryStream = new MemoryStream())
{
    // Work on an in-memory copy so the uploaded file is not modified.
    memoryStream.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument doc =
        WordprocessingDocument.Open(memoryStream, true))
    {
        HtmlConverterSettings settings = new HtmlConverterSettings()
        {
            //PageTitle = "Test Title",
            //ConvertFormatting = false,
        };
        XElement html = HtmlConverter.ConvertToHtml(doc, settings,
            imageInfo =>
            {
                DirectoryInfo localDirInfo = new DirectoryInfo(Server.MapPath(imageDirectoryName));
                if (!localDirInfo.Exists)
                    localDirInfo.Create();
                ++imageCounter;
                string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                ImageFormat imageFormat = null;
                if (extension == "png")
                {
                    // Convert the .png file to a .jpeg file.
                    extension = "jpeg";
                    imageFormat = ImageFormat.Jpeg;
                }
                else if (extension == "bmp")
                    imageFormat = ImageFormat.Bmp;
                else if (extension == "jpeg")
                    imageFormat = ImageFormat.Jpeg;
                else if (extension == "tiff")
                    imageFormat = ImageFormat.Tiff;
                else if (extension == "wmf" || extension == "x-wmf")
                {
                    // WMF parts use the content type "image/x-wmf"; save them as JPEG
                    // and give the file a matching extension.
                    extension = "jpeg";
                    imageFormat = ImageFormat.Jpeg;
                }

                // If the image format is not one that you expect, ignore it,
                // and do not return markup for the link.
                if (imageFormat == null)
                    return null;

                string imageFileName = imageDirectoryName + "/image" +
                    imageCounter.ToString() + "." + extension;
                try
                {
                    imageInfo.Bitmap.Save(Server.MapPath(imageFileName), imageFormat);
                }
                catch (System.Runtime.InteropServices.ExternalException)
                {
                    return null;
                }
                XElement img = new XElement(Xhtml.img,
                    new XAttribute(NoNamespace.src, imageFileName),
                    imageInfo.ImgStyleAttribute,
                    imageInfo.AltText != null ?
                        new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                return img;
            });

        // Write "<document name>.html" next to the uploaded file.
        File.WriteAllText(
            Path.Combine(fileInfo.Directory.FullName,
                Path.GetFileNameWithoutExtension(fileInfo.Name) + ".html"),
            html.ToStringNewLineOnAttributes());
    }
}

Best answer

Step 1 - See here for how to get the Math objects in a Word file.
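The link in step 1 was not preserved. As a rough substitute, and only a sketch: a .docx file is a ZIP package, and Word stores equations in word/document.xml as `<m:oMath>` elements in the OMML namespace. This version swaps the Open XML SDK used in the answer's code for plain `System.IO.Compression` and LINQ to XML, reads only the main document part (not headers, footers, or footnotes), and uses hypothetical names (`OmmlExtractor`, `ExtractOfficeMath`).

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class OmmlExtractor
{
    // Namespace of Office Math Markup Language (OMML) elements such as <m:oMath>.
    private static readonly XNamespace M =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // Opens the .docx as a ZIP archive, loads word/document.xml and returns the
    // serialized XML of every <m:oMath> element in document order.
    public static List<string> ExtractOfficeMath(Stream docxStream)
    {
        // leaveOpen: true so the caller's stream is not disposed with the archive.
        using var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found.");

        using Stream entryStream = entry.Open();
        XDocument document = XDocument.Load(entryStream);

        return document.Descendants(M + "oMath")
                       .Select(e => e.ToString(SaveOptions.DisableFormatting))
                       .ToList();
    }
}
```

Each returned string is an OMML fragment that can be fed to an OMML-to-MathML stylesheet, in the same way the answer's code feeds `ele.OuterXml`.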

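Step 2 - Iterate over the paragraphs of the Word file, pick out the OfficeMath objects in them, and convert each one to MathML (see step 1). If you want, you can then convert the MathML to LaTeX (I think LaTeX is friendlier to work with in HTML).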

Note: converting the MathML produced by OMML2MML in step 1 to LaTeX works in a similar way; see here to get the file.
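The answer's code calls a `MathML2Latex` helper that it does not show. Assuming the LaTeX converter from the note is also an XSLT stylesheet (the note only says it works "in a similar way"), one generic helper can run both conversions. This is a sketch; `XsltHelper` and `ApplyXslt` are hypothetical names. It disposes the `XmlWriter` before reading the result, so no output is left in the writer's buffer.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltHelper
{
    // Runs a compiled XSLT stylesheet over an XML string and returns the output.
    // Intended for OMML -> MathML (OMML2MML.XSL) and, assuming the LaTeX converter
    // is also XSLT, for MathML -> LaTeX.
    public static string ApplyXslt(string xml, XslCompiledTransform xslt)
    {
        // Start from the stylesheet's own <xsl:output> settings (method, indent, ...).
        XmlWriterSettings settings = xslt.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        using var reader = XmlReader.Create(new StringReader(xml));
        using var output = new StringWriter();

        // Disposing the writer flushes everything into the StringWriter.
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            xslt.Transform(reader, writer);
        }
        return output.ToString();
    }
}
```

Usage, with hypothetical variables `omml2mml` and `mml2tex` holding `XslCompiledTransform` instances loaded from the two stylesheet files: `string latex = XsltHelper.ApplyXslt(XsltHelper.ApplyXslt(ele.OuterXml, omml2mml), mml2tex);`. Loading each stylesheet once and reusing it avoids recompiling it for every equation.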

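Step 3 - Before or after each math object from step 2, insert a text object (a run) containing the MathML/LaTeX from step 2. This step is needed because HtmlConverter.ConvertToHtml drops the math objects in the Word content; text inserted before or after a math object, however, does make it into the HTML.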

Here is my code:

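using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// officeMathMLSchemaFilePath: path to the OMML -> MathML stylesheet (OMML2MML.XSL).
// MathML2Latex: the answerer's own MathML -> LaTeX helper (not shown here).
var officeMLFormulas = new List<string>();

// Load the stylesheet once instead of once per equation.
XslCompiledTransform xslTransform = new XslCompiledTransform();
xslTransform.Load(officeMathMLSchemaFilePath);

using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, true))
{
    // ToList() so that inserting runs does not change the tree while it is being enumerated.
    foreach (var paragraph in doc.MainDocumentPart.RootElement.Descendants<Paragraph>().ToList())
    {
        foreach (var ele in paragraph.Descendants<DocumentFormat.OpenXml.Math.OfficeMath>().ToList())
        {
            string wordDocXml = ele.OuterXml;
            var result = "";

            // Load the OMML of this equation.
            using (TextReader tr = new StringReader(wordDocXml))
            using (XmlReader reader = XmlReader.Create(tr))
            using (MemoryStream ms = new MemoryStream())
            {
                XmlWriterSettings settings = xslTransform.OutputSettings.Clone();

                // Configure xml writer to omit xml declaration and keep ms open.
                settings.ConformanceLevel = ConformanceLevel.Fragment;
                settings.OmitXmlDeclaration = true;
                settings.CloseOutput = false;

                // Transform our OfficeMathML to MathML; disposing the writer flushes it into ms.
                using (XmlWriter xw = XmlWriter.Create(ms, settings))
                {
                    xslTransform.Transform(reader, xw);
                }
                ms.Seek(0, SeekOrigin.Begin);

                using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                {
                    result = MathML2Latex(sr.ReadToEnd());
                    officeMLFormulas.Add(result);
                }
            }

            // Insert the formula as plain text before the equation so that it
            // survives HtmlConverter.ConvertToHtml, which drops the math object.
            Run run = new Run();
            run.Append(new Text(result));
            ele.InsertBeforeSelf(run);
        }
    }
}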

Regarding "c# - How to convert an MS Word file with images and equations to HTML in C# using a MemoryStream", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31870304/
