
C# OpenXml: shorthand for getting the WordStyle property of a DOCX

Reposted. Author: 行者123. Updated: 2023-11-30 20:46:39

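Just curious whether there is a more streamlined way to check whether a given body element has the "Heading3" style applied. I wrote the sample C# code below while learning the OpenXML library. To be clear, I'm only asking: given a body element, how do I determine which Word style is applied to it? I will eventually have to write a program that processes a large number of .DOCX files, working through each one from top to bottom.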

using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace docxparsing
{
    class Program
    {
        static void Main()
        {
            string file_to_parse = @"C:\temp\sample.docx";

            // Open read-only; the using block closes the package when done.
            using (WordprocessingDocument doc = WordprocessingDocument.Open(file_to_parse, false))
            {
                Body body = doc.MainDocumentPart.Document.Body;

                string fooStr;
                // Iterate over the body's top-level children (paragraphs, tables, ...).
                foreach (var foo in body)
                {
                    fooStr = foo.InnerXml;

                    /*
                    The two lines below are two different XML snippets taken from 'fooStr'. The only way
                    I figured out how to get the Word style is by reading this XML and checking for values.
                    I don't know of any other approach that uses the body element to check for the
                    applied Word style.

                    <w:pPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:pStyle w:val="Heading2" />
                    <w:pPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:pStyle w:val="Heading3" />
                    */

                    bool hasHeading3 = fooStr.Contains("pStyle w:val=\"Heading3\"");

                    if (hasHeading3)
                    {
                        Console.WriteLine("heading3 found");
                    }
                }
            }
        }
    }
}
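The comment in the code above reasons that the only way to get the style is to read the XML text. If you do want to work from the XML, parsing it is sturdier than substring matching. The sketch below is a hypothetical helper (PStyleFromXml and GetStyleId are illustrative names, not SDK APIs) that uses only System.Xml.Linq and reads w:val from the paragraph's own w:pPr/w:pStyle. It assumes it is given an element's OuterXml, which the SDK serializes with its xmlns:w declaration, as the snippets above show for InnerXml.

```csharp
using System.Xml.Linq;

// Hypothetical helper (not part of the Open XML SDK): reads a paragraph's style ID
// by parsing its XML instead of searching the raw string.
public static class PStyleFromXml
{
    // WordprocessingML main namespace, the same URI that appears in the snippets above.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the style ID from <w:p><w:pPr><w:pStyle w:val="..."/>, or "" when the
    // element is not a paragraph or has no directly applied paragraph style.
    public static string GetStyleId(string elementOuterXml)
    {
        XElement root = XElement.Parse(elementOuterXml);
        if (root.Name != W + "p")
        {
            return ""; // tables, section properties, etc.
        }
        XAttribute val = root.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val");
        return val?.Value ?? "";
    }
}
```

In the question's loop this would read `bool hasHeading3 = PStyleFromXml.GetStyleId(foo.OuterXml) == "Heading3";`. It takes OuterXml rather than InnerXml because a paragraph's InnerXml usually holds several sibling elements (w:pPr, w:r, ...), which XElement.Parse rejects since it is not a single root element.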

//------------------------------------------ ----------------------------------

EDIT

Here is updated code showing one way to do this. I'm still not happy with it overall, but it does work. The function to look at is getWordStyleValue(string x).

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.IO;

namespace docxparsing
{
    class Program
    {
        // ************************************************
        // grab the word style value
        // ************************************************
        // Returns the value of the first w:pStyle w:val="..." in x, or "" if none is found.
        static string getWordStyleValue(string x)
        {
            const string marker = "w:pStyle w:val=\"";
            int start = x.IndexOf(marker, StringComparison.Ordinal);
            if (start == -1)
            {
                return "";
            }
            start += marker.Length;              // first character of the style ID
            int end = x.IndexOf('"', start);     // closing quote
            if (end == -1)
            {
                return "";                       // malformed input: no closing quote
            }
            return x.Substring(start, end - start);
        }


        // ************************************************
        // Main
        // ************************************************
        static void Main(string[] args)
        {
            string theFile = @"C:\temp\sample.docx";

            using (WordprocessingDocument doc = WordprocessingDocument.Open(theFile, false))
            using (StreamWriter sw1 = new StreamWriter("paragraphs.log"))
            {
                Body body = doc.MainDocumentPart.Document.Body;

                // Top-level body children, in document order.
                foreach (var b in body)
                {
                    if (b is Paragraph)
                    {
                        string str = getWordStyleValue(b.InnerXml);

                        // Skip unstyled paragraphs and front-matter styles.
                        if (str == "" || str == "HeadingNon-TOC" || str == "TOC1" || str == "TOC2" || str == "TableofFigures" || str == "AcronymList")
                        {
                            continue;
                        }

                        sw1.WriteLine(str + "," + b.InnerText);
                    }
                    else if (b is Table)
                    {
                        // sw1.WriteLine("Table:\n{0}", b.InnerText);
                    }
                }
            }
        }
    }
}
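The same top-to-bottom loop can read the style through the SDK's typed properties (Paragraph.ParagraphProperties, ParagraphStyleId) instead of scanning InnerXml, and can tell paragraphs from tables with type patterns instead of comparing ToString() results. This is a sketch assuming the DocumentFormat.OpenXml NuGet package; the class name BodyWalker and the "style,text" / "Table,text" output lines are illustrative, not part of the original code.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: walks a Body's top-level children in document order.
public static class BodyWalker
{
    // Style ID set directly on the paragraph (w:pStyle), or "" when none is set.
    public static string GetStyleId(Paragraph p) =>
        p.ParagraphProperties?.ParagraphStyleId?.Val?.Value ?? "";

    // Yields one "style,text" line per paragraph and one "Table,text" line per table.
    public static IEnumerable<string> Describe(Body body)
    {
        foreach (OpenXmlElement child in body.ChildElements)
        {
            switch (child)
            {
                case Paragraph p:
                    yield return GetStyleId(p) + "," + p.InnerText;
                    break;
                case Table t:
                    yield return "Table," + t.InnerText;
                    break;
                // Other children (e.g. SectionProperties) are skipped.
            }
        }
    }
}
```

The exclusion list from Main above can then be applied to GetStyleId(p) directly, with no string parsing involved.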

Best answer

Yes. You can do it like this:

bool ContainsHeading3 = body.Descendants<ParagraphStyleId>().Any(psId => psId.Val == "Heading3"); // Any() needs: using System.Linq;

This looks at every ParagraphStyleId element (w:pStyle in the XML) and checks whether any of them has a Val of Heading3.
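The one-liner only reports whether some Heading3 exists. When the text of each matching paragraph is needed, the same Descendants idea can be applied to Paragraph and filtered on its style. A minimal sketch assuming the DocumentFormat.OpenXml package; HeadingFinder, FindByStyle and FindInFile are hypothetical names. Like the answer, it only matches a style set directly on the paragraph through w:pStyle, and because Descendants walks the whole tree it also returns paragraphs nested inside tables, in document order.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper built on the answer's Descendants approach.
public static class HeadingFinder
{
    // Text of every paragraph whose directly applied style ID equals styleId,
    // in document order (including paragraphs inside table cells).
    public static List<string> FindByStyle(Body body, string styleId) =>
        body.Descendants<Paragraph>()
            .Where(p => p.ParagraphProperties?.ParagraphStyleId?.Val?.Value == styleId)
            .Select(p => p.InnerText)
            .ToList();

    // Opens a .docx read-only and applies FindByStyle to its main document body.
    public static List<string> FindInFile(string path, string styleId)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            return FindByStyle(doc.MainDocumentPart.Document.Body, styleId);
        }
    }
}
```

For the batch job described in the question, FindInFile can be called once per .docx path.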

For more on "C# OpenXml: shorthand for getting the WordStyle property of a DOCX", see the similar question on Stack Overflow: https://stackoverflow.com/questions/26593445/
