gpt4 book ai didi

xml - 如何提高 VBA 中 XML 解析的速度

转载 作者:数据小太阳 更新时间:2023-10-29 02:22:49 27 4
gpt4 key购买 nike

我有一个大型 XML 文件需要在 VBA(excel 2003 和 2007)中进行解析。 xml 文件中可能有超过 11,000 行数据,每行有 10 到 20 列。仅解析和获取数据(5 - 7 分钟)就成了一项艰巨的任务。我尝试读取 xml 并将每个“行”放入字典(键 = 行号,值 = 行属性),但这需要同样长的时间。

遍历 DOM 需要很长时间。有没有更有效的方法?

Dim XMLDict
Sub ParseXML(ByRef RootNode As IXMLDOMNode)
Dim Counter As Long
Dim RowList As IXMLDOMNodeList
Dim ColumnList As IXMLDOMNodeList
Dim RowNode As IXMLDOMNode
Dim ColumnNode As IXMLDOMNode
Counter = 1
Set RowList = RootNode.SelectNodes("Row")

For Each RowNode In RowList
Set ColumnList = RowNode.SelectNodes("Col")
Dim NodeValues As String
For Each ColumnNode In ColumnList
NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
Next ColumnNode
XMLDICT.Add Counter, NodeValues
Counter = Counter + 1
Next RowNode
End Sub

最佳答案

您可以尝试使用 SAX 而不是 DOM。当您所做的只是解析文档并且文档的大小非常大时,SAX 应该会更快。 MSXML 中 SAX2 实现的引用是 here

对于 Excel 中的大多数 XML 解析,我通常直接使用 DOM,但 SAX 在某些情况下似乎具有优势。空头比较here可能有助于解释它们之间的差异。

这是一个组合在一起的示例(部分基于 this ),仅使用 Debug.Print 进行输出:

通过“工具”>“引用”添加对“Microsoft XML,v6.0”的引用

在普通模块中添加这段代码

Option Explicit

Sub main()

Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl

Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl

Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"

Set saxReader = Nothing

End Sub

添加一个类模块,命名为ContentHandlerImpl并添加如下代码

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

使用模块顶部的左侧下拉菜单选择“IVBSAXContentHandler”,然后使用右侧下拉菜单依次为每个事件添加 stub (来自 charactersstartPrefixMapping)

如下在部分 stub 中添加代码

显式设置计数器和标志来显示我们此时是否要读取文本数据

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

每次启动新元素时,检查元素的名称并采取适当的操作

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select

End Sub

检查我们是否对文本数据感兴趣,如果我们感兴趣,则切掉任何无关的空白并删除所有换行符(这可能需要也可能不需要,具体取决于您尝试解析的文档)

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

如果我们已经到达 Col 的末尾,则停止读取文本值;如果我们已经到达 Row 的末尾,则打印出节点值的字符串

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select

End Sub

为了使事情更清楚,这里是 ContentHandlerImpl 的完整版本,其中包含所有 stub 方法:

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)

End Sub

关于xml - 如何提高 VBA 中 XML 解析的速度,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5626653/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com