excel - How can I more reliably remove strikethrough text from Excel cells using XML parsing?

Reposted. Author: 行者123. Updated: 2023-12-04 19:50:28

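I have a complex spreadsheet with many text cells, each containing a random mix of normal and struck-through text. Before scanning a cell for useful information, I have to remove the struck-through text. I initially did this with the Characters object (in VBA), but it was far too slow to be practical for business use. I was then given some code (on this site) that parses the cell's XML encoding instead. That is about 1000 times faster, but it occasionally fails with the following error: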

"The parameter Node is not a child of this Node."

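So far it has only happened in heavily loaded cells (1000 characters); otherwise it works fine. I can't see anything wrong with the code or with the XML structure of the problem cells, although I am new to XML. From the VBA debugger I know the error occurs on the call to RemoveChild(), usually after it has already removed several struck-through sections of the cell text without error.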

Is there a way to make the code below more robust?

Public Sub ParseCellForItems(TargetCell As Excel.Range, ItemsInCell() As String)

    Dim XMLDocObj As MSXML2.DOMDocument60
    Dim x As MSXML2.IXMLDOMNode
    Dim s As MSXML2.IXMLDOMNode

    Dim CleanedCellText As String

    On Error GoTo ErrorHandler

    Call UnstrikeLineBreakCharsInCell(TargetCell)

    Set XMLDocObj = New MSXML2.DOMDocument60

    'Add some namespaces.
    XMLDocObj.SetProperty "SelectionNamespaces", "xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet' " & _
        "xmlns:ht='http://www.w3.org/TR/REC-html40'"

    'Load the cell data as XML into XMLDocObj.
    If XMLDocObj.LoadXML(TargetCell.Value(xlRangeValueXMLSpreadsheet)) Then

        Set x = XMLDocObj.SelectSingleNode("//ss:Data") 'Cell content.

        If Not x Is Nothing Then

            Set s = x.SelectSingleNode("//ht:S") 'Struck through cell content.

            Do While Not s Is Nothing
                x.RemoveChild s
                Set s = x.SelectSingleNode("//ht:S")
            Loop

            CleanedCellText = XMLDocObj.Text

            'Parse CleanedCellText for useful information.
            '...

        End If

    End If

    Set XMLDocObj = Nothing
    'Presumably don't have to 'destroy' x and s as well, as they were pointing to elements of XMLDocObj.

    Exit Sub

ErrorHandler:

    Call RaiseError(Err.Number, Err.Source, "ParseCellForItems()", Err.Description, Erl)

End Sub

Public Sub UnstrikeLineBreakCharsInCell(TargetCell As Excel.Range)

    Dim mc As MatchCollection
    Dim RegExObj1 As RegExp
    Dim Match As Variant

    On Error GoTo ErrorHandler

    Set RegExObj1 = New RegExp
    RegExObj1.Global = True
    RegExObj1.IgnoreCase = True
    RegExObj1.Pattern = "\n" 'Line feed. Equivalent to vbLf.

    Set mc = RegExObj1.Execute(TargetCell.Value)

    For Each Match In mc
        TargetCell.Characters(Match.FirstIndex + 1, 1).Font.Strikethrough = False
    Next Match

    Set mc = Nothing
    Set RegExObj1 = Nothing

    Exit Sub

ErrorHandler:

    Call RaiseError(Err.Number, Err.Source, "UnstrikeLineBreakCharsInCell()", Err.Description, Erl)

End Sub

Best answer

Yes. As Tim Williams suggested in his comment, making sure RemoveChild() is called on the node's immediate parent solves the problem:

Set s = x.SelectSingleNode("//ht:S")
Do While Not s Is Nothing
    s.ParentNode.RemoveChild s
    Set s = x.SelectSingleNode("//ht:S")
Loop
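
Why this helps: in XPath, an expression that starts with "//" is evaluated from the document root, whatever node SelectSingleNode is called on. The match can therefore be an ht:S element nested inside another formatting element rather than a direct child of x, and RemoveChild() only accepts direct children. The following is a minimal sketch of the same fix wrapped in a function, not the asker's or answerer's actual code. The name TextWithoutStrikethrough, the late-bound CreateObject call, the ".//" relative path, and returning the text of the ss:Data node instead of the whole document are all assumptions made for illustration.

```vba
' Hypothetical helper, not part of the original question or answer.
' Returns the text of TargetCell with every struck-through run removed.
Public Function TextWithoutStrikethrough(TargetCell As Excel.Range) As String

    Dim doc As Object        ' MSXML2.DOMDocument60, late-bound so no project reference is needed
    Dim dataNode As Object   ' The ss:Data element that holds the cell content
    Dim struck As Object     ' One ht:S (strikethrough) element

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    doc.SetProperty "SelectionNamespaces", _
        "xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet' " & _
        "xmlns:ht='http://www.w3.org/TR/REC-html40'"

    ' Returns an empty string if the cell XML cannot be loaded.
    If Not doc.LoadXML(TargetCell.Value(xlRangeValueXMLSpreadsheet)) Then Exit Function

    Set dataNode = doc.SelectSingleNode("//ss:Data")
    If dataNode Is Nothing Then Exit Function

    ' ".//" keeps the search inside dataNode; a bare "//" would restart at the document root.
    Set struck = dataNode.SelectSingleNode(".//ht:S")
    Do While Not struck Is Nothing
        ' The match may be nested inside another element, so remove it via its own parent.
        struck.ParentNode.RemoveChild struck
        Set struck = dataNode.SelectSingleNode(".//ht:S")
    Loop

    TextWithoutStrikethrough = dataNode.Text

End Function
```

A call such as Debug.Print TextWithoutStrikethrough(ActiveSheet.Range("A1")) in the Immediate window should print the cell text without its struck-through parts, assuming A1 holds a single text cell.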

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/61846271/
