OpenXML - Applying bookmarks to paragraphs in a Word document

Reposted by: 行者123. Updated: 2023-12-03 17:32:01

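The code below uses Open XML (ASP.NET) and works fine: it prints the paragraphs of a Word document that use the Heading2 style. How can we apply a bookmark to a specific paragraph?

What we are trying to do is extract the section between two headings. We would like to know how to apply bookmarks and how to extract the text between two bookmarks.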

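// Namespaces used by this snippet.
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

const string fileName = @"D:\DocFiles\Scan.docx";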
const string documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
const string stylesRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";
const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = wordmlNamespace;
XDocument xDoc = null;
XDocument styleDoc = null;

using (Package wdPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
{
    PackageRelationship docPackageRelationship =
      wdPackage
      .GetRelationshipsByType(documentRelationshipType)
      .FirstOrDefault();
    if (docPackageRelationship != null)
    {
        Uri documentUri =
            PackUriHelper
            .ResolvePartUri(
               new Uri("/", UriKind.Relative),
                     docPackageRelationship.TargetUri);
        PackagePart documentPart =
            wdPackage.GetPart(documentUri);

        // Load the document XML in the part into an XDocument instance,
        // disposing the part stream once the XML has been read.
        using (Stream documentStream = documentPart.GetStream())
        {
            xDoc = XDocument.Load(documentStream);
        }

        //  Find the styles part. There will only be one.  
        PackageRelationship styleRelation =
          documentPart.GetRelationshipsByType(stylesRelationshipType)
          .FirstOrDefault();
        if (styleRelation != null)
        {
            Uri styleUri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri);
            PackagePart stylePart = wdPackage.GetPart(styleUri);

            // Load the style XML in the part into an XDocument instance,
            // disposing the part stream once the XML has been read.
            using (Stream styleStream = stylePart.GetStream())
            {
                styleDoc = XDocument.Load(styleStream);
            }
        }
    }
}

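// Look up the default paragraph style. styleDoc stays null when the
// document has no styles part, so guard against that case here.
string defaultStyle =
    (string)(
        from style in styleDoc?.Root.Elements(w + "style") ?? Enumerable.Empty<XElement>()
        where (string)style.Attribute(w + "type") == "paragraph" &&
              (string)style.Attribute(w + "default") == "1"
        select style
    ).FirstOrDefault()?.Attribute(w + "styleId");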

// Find all paragraphs in the document.  
var paragraphs =
    from para in xDoc
                 .Root
                 .Element(w + "body")
                 .Descendants(w + "p")
    let styleNode = para
                    .Elements(w + "pPr")
                    .Elements(w + "pStyle")
                    .FirstOrDefault()
    select new
    {
        ParagraphNode = para,
        StyleName = styleNode != null ?
            (string)styleNode.Attribute(w + "val") :
            defaultStyle
    };

// Retrieve the text of each paragraph.  
var paraWithText =
    from para in paragraphs
    select new
    {
        ParagraphNode = para.ParagraphNode,
        StyleName = para.StyleName,
        Text = ParagraphText(para.ParagraphNode)
    };

foreach (var p in paraWithText)
{
    if (p.StyleName=="Heading2")
    {
        Response.Write(p.StyleName + " -" + p.Text);
        Response.Write("<br />");
    }
}
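
The snippet above calls ParagraphText, which the question does not show. The following is a minimal sketch of such a helper, assuming it does nothing more than concatenate the text of all w:t elements in the paragraph. The ParagraphHelpers class name is hypothetical, and tabs, line breaks, and field codes are ignored.

```csharp
using System.Text;
using System.Xml.Linq;

public static class ParagraphHelpers
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stand-in for the ParagraphText helper used in the question:
    // concatenates the text of every w:t element inside the paragraph.
    public static string ParagraphText(XElement paragraph)
    {
        var text = new StringBuilder();
        foreach (XElement t in paragraph.Descendants(W + "t"))
        {
            text.Append((string)t);
        }
        return text.ToString();
    }
}
```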

Best answer

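Here is a sample Bookmark class that I created to demonstrate how to work with bookmarks. It finds a matching pair of w:bookmarkStart and w:bookmarkEnd elements and shows how to get the w:r elements between those two markers. Based on that, you can process the text, for example as shown in the GetValue() method.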

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using OpenXmlPowerTools;

namespace CodeSnippets.OpenXml.Wordprocessing
{
    /// <summary>
    /// Represents a corresponding pair of w:bookmarkStart and w:bookmarkEnd elements.
    /// </summary>
    public class Bookmark
    {
        private Bookmark(XElement root, string bookmarkName)
        {
            Root = root;

            BookmarkStart = new XElement(W.bookmarkStart,
                new XAttribute(W.id, -1),
                new XAttribute(W.name, bookmarkName));

            BookmarkEnd = new XElement(W.bookmarkEnd,
                new XAttribute(W.id, -1));
        }

        private Bookmark(XElement root, XElement bookmarkStart, XElement bookmarkEnd)
        {
            Root = root;
            BookmarkStart = bookmarkStart;
            BookmarkEnd = bookmarkEnd;
        }

        /// <summary>
        /// The root element containing both <see cref="BookmarkStart"/> and
        /// <see cref="BookmarkEnd"/>.
        /// </summary>
        public XElement Root { get; }

        /// <summary>
        /// The w:bookmarkStart element.
        /// </summary>
        public XElement BookmarkStart { get; }

        /// <summary>
        /// The w:bookmarkEnd element.
        /// </summary>
        public XElement BookmarkEnd { get; }

        /// <summary>
        /// Finds a pair of w:bookmarkStart and w:bookmarkEnd elements in the given
        /// <paramref name="root"/> element, where the w:name attribute value of the
        /// w:bookmarkStart element is equal to <paramref name="bookmarkName"/>.
        /// </summary>
        /// <param name="root">The root <see cref="XElement"/>.</param>
        /// <param name="bookmarkName">The bookmark name.</param>
        /// <returns>A new <see cref="Bookmark"/> instance representing the bookmark.</returns>
        public static Bookmark Find(XElement root, string bookmarkName)
        {
            XElement bookmarkStart = root
                .Descendants(W.bookmarkStart)
                .FirstOrDefault(e => (string) e.Attribute(W.name) == bookmarkName);

            string id = bookmarkStart?.Attribute(W.id)?.Value;
            if (id == null) return new Bookmark(root, bookmarkName);

            XElement bookmarkEnd = root
                .Descendants(W.bookmarkEnd)
                .FirstOrDefault(e => (string) e.Attribute(W.id) == id);

            return bookmarkEnd != null
                ? new Bookmark(root, bookmarkStart, bookmarkEnd)
                : new Bookmark(root, bookmarkName);
        }

        /// <summary>
        /// Gets all w:r elements between the bookmark's w:bookmarkStart and
        /// w:bookmarkEnd elements.
        /// </summary>
        /// <returns>A collection of w:r elements.</returns>
        public IEnumerable<XElement> GetRuns()
        {
            return Root
                .Descendants()
                .SkipWhile(d => d != BookmarkStart)
                .Skip(1)
                .TakeWhile(d => d != BookmarkEnd)
                .Where(d => d.Name == W.r);
        }

        /// <summary>
        /// Gets the concatenated inner text of all runs between the bookmark's
        /// w:bookmarkStart and w:bookmarkEnd elements, ignoring paragraph marks
        /// and page breaks.
        /// </summary>
        /// <remarks>
        /// The output of this method can be compared to the output of the
        /// <see cref="XElement.Value"/> property.
        /// </remarks>
        /// <returns>The concatenated inner text.</returns>
        public string GetValue()
        {
            return GetRuns().Select(UnicodeMapper.RunToString).StringConcatenate();
        }
    }
}
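
The Bookmark class above only reads bookmarks that already exist. For the other half of the question, applying a bookmark to a specific paragraph, here is a minimal sketch that is not part of the original answer. It uses plain LINQ to XML. The BookmarkWriter class, the AddBookmark method, and the way the next w:id is chosen (one more than the highest existing w:bookmarkStart id) are assumptions made for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class BookmarkWriter
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: wraps the content of one w:p element in a bookmark.
    public static void AddBookmark(XElement root, XElement paragraph, string bookmarkName)
    {
        // Assumption: an id one higher than any existing w:bookmarkStart id is free.
        int nextId = root
            .Descendants(W + "bookmarkStart")
            .Select(e => (int?)e.Attribute(W + "id") ?? 0)
            .DefaultIfEmpty(0)
            .Max() + 1;

        var start = new XElement(W + "bookmarkStart",
            new XAttribute(W + "id", nextId),
            new XAttribute(W + "name", bookmarkName));
        var end = new XElement(W + "bookmarkEnd",
            new XAttribute(W + "id", nextId));

        // Keep w:pPr, when present, as the first child of the paragraph.
        XElement properties = paragraph.Element(W + "pPr");
        if (properties != null)
        {
            properties.AddAfterSelf(start);
        }
        else
        {
            paragraph.AddFirst(start);
        }

        // Close the bookmark after the last run of the paragraph.
        paragraph.Add(end);
    }
}
```

After a call such as BookmarkWriter.AddBookmark(document, paragraph, "_Bm002"), the modified XML still has to be written back to the main document part before the bookmark shows up in Word.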

The class above works on documents like the following (a very simple test document):

<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>First</w:t>
      </w:r>
    </w:p>
    <w:bookmarkStart w:id="1" w:name="_Bm001" />
    <w:p>
      <w:r>
        <w:t>Second</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Third</w:t>
      </w:r>
    </w:p>
    <w:bookmarkEnd w:id="1" />
    <w:p>
      <w:r>
        <w:t>Fourth</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

The document above is created by the following unit tests, which also demonstrate how to use the Bookmark class:

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using CodeSnippets.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
using Xunit;

namespace CodeSnippets.Tests.OpenXml.Wordprocessing
{
    public class BookmarkTests
    {
        /// <summary>
        /// The w:name value of our bookmark.
        /// </summary>
        private const string BookmarkName = "_Bm001";

        /// <summary>
        /// The w:id value of our bookmark.
        /// </summary>
        private const int BookmarkId = 1;

        /// <summary>
        /// The test w:document with our bookmark, which encloses the two runs
        /// with inner texts "Second" and "Third".
        /// </summary>
        private static readonly XElement Document =
            new XElement(W.document,
                new XAttribute(XNamespace.Xmlns + "w", W.w.NamespaceName),
                new XElement(W.body,
                    new XElement(W.p,
                        new XElement(W.r,
                            new XElement(W.t, "First"))),
                    new XElement(W.bookmarkStart,
                        new XAttribute(W.id, BookmarkId),
                        new XAttribute(W.name, BookmarkName)),
                    new XElement(W.p,
                        new XElement(W.r,
                            new XElement(W.t, "Second"))),
                    new XElement(W.p,
                        new XElement(W.r,
                            new XElement(W.t, "Third"))),
                    new XElement(W.bookmarkEnd,
                        new XAttribute(W.id, BookmarkId)),
                    new XElement(W.p,
                        new XElement(W.r,
                            new XElement(W.t, "Fourth")))
                )
            );

        /// <summary>
        /// Creates a <see cref="WordprocessingDocument"/> on a <see cref="MemoryStream"/>
        /// for testing purposes, using the given <paramref name="document"/> as the
        /// w:document root element of the main document part.
        /// </summary>
        /// <param name="document">The w:document root element.</param>
        /// <returns>The <see cref="MemoryStream"/> containing the <see cref="WordprocessingDocument"/>.</returns>
        private static MemoryStream CreateWordprocessingDocument(XElement document)
        {
            var stream = new MemoryStream();
            const WordprocessingDocumentType type = WordprocessingDocumentType.Document;

            using (WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream, type))
            {
                MainDocumentPart part = wordDocument.AddMainDocumentPart();
                part.PutXDocument(new XDocument(document));
            }

            return stream;
        }

        [Fact]
        public void GetRuns_WordprocessingDocumentWithBookmarks_CorrectRunsReturned()
        {
            // Arrange.
            // Create a new Word document on a Stream, using the test w:document
            // as the main document part.
            Stream stream = CreateWordprocessingDocument(Document);

            // Open the WordprocessingDocument on the Stream, using the Open XML SDK.
            using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true);

            // Get the w:document element from the main document part and find
            // our bookmark.
            XElement document = wordDocument.MainDocumentPart.GetXElement();
            Bookmark bookmark = Bookmark.Find(document, BookmarkName);

            // Act, getting the bookmarked runs.
            IEnumerable<XElement> runs = bookmark.GetRuns();

            // Assert.
            Assert.Equal(new[] {"Second", "Third"}, runs.Select(run => run.Value));
        }

        [Fact]
        public void GetValue_WordprocessingDocumentWithBookmarks_CorrectTextReturned()
        {
            // Arrange.
            // Create a new Word document on a Stream, using the test w:document
            // as the main document part.
            Stream stream = CreateWordprocessingDocument(Document);

            // Open the WordprocessingDocument on the Stream, using the Open XML SDK.
            using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true);

            // Get the w:document element from the main document part and find
            // our bookmark.
            XElement document = wordDocument.MainDocumentPart.GetXElement();
            Bookmark bookmark = Bookmark.Find(document, BookmarkName);

            // Act, getting the concatenated text contents of the bookmarked runs.
            string text = bookmark.GetValue();

            // Assert.
            Assert.Equal("SecondThird", text);
        }
    }
}

You can find the complete code example in my CodeSnippets GitHub repository. Look for the Bookmark and BookmarkTests classes, and note that I am using Open-Xml-PowerTools.

Obviously, you can do more complex things with these Open XML elements. This is just a simple example.
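
As one such extension, the question's goal of extracting the section between two headings can be approached with the same SkipWhile/TakeWhile idea that GetRuns() uses, without bookmarks at all. This is a minimal sketch and not part of the original answer. The HeadingSections class and the ParagraphsUnderHeading method are hypothetical names, and the sketch assumes that headings are body-level w:p elements whose w:pStyle value is "Heading2". Paragraphs inside tables are not considered.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class HeadingSections
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True when the paragraph's w:pStyle value equals the given style id.
    private static bool IsHeading(XElement paragraph, string styleId) =>
        (string)paragraph
            .Elements(W + "pPr")
            .Elements(W + "pStyle")
            .Attributes(W + "val")
            .FirstOrDefault() == styleId;

    // Concatenated text of all w:t elements in the paragraph.
    private static string Text(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => (string)t));

    // Hypothetical helper: returns the body-level paragraphs that follow the
    // first heading whose text equals headingText, stopping at the next
    // paragraph that uses the same heading style.
    public static IEnumerable<XElement> ParagraphsUnderHeading(
        XElement body, string headingText, string styleId = "Heading2")
    {
        return body
            .Elements(W + "p")
            .SkipWhile(p => !(IsHeading(p, styleId) && Text(p) == headingText))
            .Skip(1)
            .TakeWhile(p => !IsHeading(p, styleId));
    }
}
```

The returned paragraphs could then be wrapped in a pair of w:bookmarkStart and w:bookmarkEnd elements, so that a later pass can find the section by bookmark name instead of by heading text.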

For more on "OpenXML - Applying bookmarks to paragraphs in a Word document", see the similar question on Stack Overflow: https://stackoverflow.com/questions/52202400/
