gpt4 book ai didi

excel - VBA - XMLHTTP 和 WinHttp 请求速度

转载 作者:行者123 更新时间:2023-12-02 22:51:08 26 4
gpt4 key购买 nike

下面是我在宏中实现的 3 个请求的声明变量。我在评论中列出了他们使用的库及其后期绑定(bind):

Dim XMLHTTP As New MSXML2.XMLHTTP 'Microsoft XML, v6.0 'Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
Dim ServerXMLHTTP As New MSXML2.ServerXMLHTTP 'Microsoft XML, v6.0 'Set ServerXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim http As New WinHttpRequest 'Microsoft WinHttp Services, version 5.1 'Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

我有一些旧的网页抓取宏,它们使用 Internet Explorer 自动化。我想清理编码并加快这些请求的速度。

不幸的是,我注意到,MSXML2.ServerXMLHTTPWinHttpRequest 在在线商店的 20 个产品测试(34 和 35 秒)上比带有图片和事件脚本的 IE 自动化要慢关闭(24 秒)! MSXML2.XMLHTTP 在 18 秒内执行。我曾经见过这 3 个请求中的某些请求比其他请求快/慢 2-3 倍的情况,因此我总是测试哪一个执行得最好,但以前从未有任何请求因 IE 自动化而丢失。

包含结果的主页如下,所有结果都在一页上,其中有 1500 多个结果,因此请求需要一些时间(如果粘贴到 MS Word 则需要 6500 页):

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

然后我从主结果页面打开各个链接:

http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/

我想知道这 3 个请求是否都是我必须从没有浏览器自动化的网站获取数据的选项。另外 - 浏览器自动化怎么可能击败其中一些请求?

更新

我已经使用 Robin Mackenzie 回答中提供的程序测试了主结果页面,在运行之前清除了 IE 缓存。至少在这个特定页面上,缓存似乎没有明显的 yield ,因为后续请求产生了类似的结果。 IE 禁用了事件脚本并且没有加载图像。

IE automation method, Document length: 7593346 chars, Processed in: 8 seconds

WinHTTP method, Document length: 7824059 chars, Processed in: 29 seconds

XML HTTP method, Document length: 7830217 chars, Processed in: 4 seconds

Server XML HTTP method, Document length: 7823958 chars, Processed in: 26 seconds

URL download file method, Document length: 7830346 chars, Processed in: 7 seconds

令我非常惊讶的是这些方法返回的字符数量的差异。

最佳答案

除了您提到的方法之外:

  • IE 自动化
  • WinHTTPRequest
  • XMLHTTP
  • 服务器XMLHTTP

您可以考虑其他两种方法:

  • 使用 MSHTML.HTMLDocument 对象的 CreateDocumentFromUrl 方法
  • 使用 Windows API 函数URLDownloadToFileA

还有一些我忽略的其他 Windows API,例如 InternetOpenInternetOpenUrl 等,因为猜测响应长度、缓冲响应的复杂性将超过潜在的性能,等等。

从Url创建文档

使用 CreateDocumentFromUrl 方法时,您的示例网站会出现问题,因为它尝试在不允许的框架中创建 HTMLDocument,并出现以下错误:

Framing Forbidden

To help protect the security of information you enter into this website, the publisher of this content does not allow it to be displayed in a frame.

所以我们不应该使用这种方法。

URL下载到文件A

我以为你需要 相当于 file_get_contents 并找到了这个方法。它很容易使用(检查 this link ),并且在处理大量请求时优于其他方法(例如,当您使用 >2000 个棒球棒时尝试它)。 XMLHTTP 方法还使用 URLMon 库,所以我想这种方式只是删除了一些中间人逻辑,显然有一个缺点,因为你必须做一些文件系统处理。

Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
ByVal lpfnCB As Long) As Long

Sub TestUrlDownloadFile(strUrl As String)

Dim dteStart As Date
Dim dteFinish As Date
Dim strTempFileName As String
Dim strResponse As String
Dim objFso As FileSystemObject

On Error GoTo ExitFunction

dteStart = Now
strTempFileName = "D:\foo.txt"
DownloadFile strUrl, strTempFileName
Set objFso = New FileSystemObject
With objFso.OpenTextFile(strTempFileName, ForReading)
strResponse = .ReadAll
.Close
End With
objFso.DeleteFile strTempFileName
dteFinish = Now

Debug.Print "URL download file method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine

ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If

End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
Dim lngRetVal As Long
lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
If lngRetVal = 0 Then DownloadFile = True
End Function

使用 URLDownloadToFileA 时,我需要大约 1-2 秒才能下载示例 URL,而使用 XMLHTTP 方法则需要 4-5 秒(完整代码如下)。

网址:

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

这是输出:

Testing...


XML HTTP method
Document length: 7869753 chars
Processed in: 4 seconds


URL download file method
Document length: 7869753 chars
Processed in: 1 seconds

代码

这包括讨论的所有方法,例如IE 自动化、WinHTTPRequest、XMLHTTP、ServerXMLHTTP、CreateDocumentFromURL 和 URLDownloadFile。

您需要项目中的所有这些引用:

enter image description here

这里是:

Option Explicit

Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
ByVal lpfnCB As Long) As Long

Sub Test()

Dim strUrl As String

strUrl = "http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400"

Debug.Print "Testing..."
Debug.Print VBA.vbNewLine

'TestIE strUrl
'TestWinHHTP strUrl
TestXMLHTTP strUrl
'TestServerXMLHTTP strUrl
'TestCreateDocumentFromUrl strUrl
TestUrlDownloadFile strUrl

End Sub

Sub TestIE(strUrl As String)

Dim dteStart As Date
Dim dteFinish As Date
Dim objIe As InternetExplorer
Dim objHtml As MSHTML.HTMLDocument
Dim strResponse As String

On Error GoTo ExitFunction

dteStart = Now
Set objIe = New SHDocVw.InternetExplorer
With objIe
.navigate strUrl
.Visible = False
While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objHtml = .document
strResponse = objHtml.DocumentElement.outerHTML
.Quit
End With
dteFinish = Now

Debug.Print "IE automation method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine

ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
If Not objIe Is Nothing Then
objIe.Quit
End If
Set objIe = Nothing

End Sub

Sub TestWinHHTP(strUrl As String)

Dim dteStart As Date
Dim dteFinish As Date
Dim objHttp As WinHttp.WinHttpRequest
Dim objDoc As HTMLDocument
Dim strResponse As String

On Error GoTo ExitFunction

dteStart = Now
Set objHttp = New WinHttp.WinHttpRequest
With objHttp
.Open "get", strUrl, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
.WaitForResponse
strResponse = .responseText
End With
dteFinish = Now

Debug.Print "WinHTTP method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine

ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc = Nothing
Set objHttp = Nothing

End Sub

Sub TestXMLHTTP(strUrl As String)

Dim dteStart As Date
Dim dteFinish As Date
Dim objXhr As MSXML2.XMLHTTP60
Dim objDoc As MSHTML.HTMLDocument
Dim strResponse As String

On Error GoTo ExitFunction

dteStart = Now
Set objXhr = New MSXML2.XMLHTTP60
With objXhr
.Open "get", strUrl, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
While .readyState <> 4
DoEvents
Wend
strResponse = .responseText
End With
dteFinish = Now

Debug.Print "XML HTTP method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine

ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc = Nothing
Set objXhr = Nothing

End Sub

Sub TestServerXMLHTTP(strUrl As String)

Dim dteStart As Date
Dim dteFinish As Date
Dim objXhr As MSXML2.ServerXMLHTTP60
Dim objDoc As MSHTML.HTMLDocument
Dim strResponse As String

On Error GoTo ExitFunction

dteStart = Now
Set objXhr = New MSXML2.ServerXMLHTTP60
With objXhr
.Open "get", strUrl, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
While .readyState <> 4
DoEvents
Wend
strResponse = .responseText
End With
dteFinish = Now

Debug.Print "Server XML HTTP method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine

ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc = Nothing
Set objXhr = Nothing

End Sub

Sub TestUrlDownloadFile(strUrl As String)

Dim dteStart As Date
Dim dteFinish As Date
Dim strTempFileName As String
Dim strResponse As String
Dim objFso As FileSystemObject

On Error GoTo ExitFunction

dteStart = Now
strTempFileName = "D:\foo.txt"
If DownloadFile(strUrl, strTempFileName) Then
Set objFso = New FileSystemObject
With objFso.OpenTextFile(strTempFileName, ForReading)
strResponse = .ReadAll
.Close
End With
objFso.DeleteFile strTempFileName
Else
Debug.Print "Error downloading file from URL: " & strUrl
GoTo ExitFunction
End If
dteFinish = Now

Debug.Print "URL download file method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine

ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If

End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
Dim lngRetVal As Long
lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
If lngRetVal = 0 Then
DownloadFile = True
Else
DownloadFile = False
End If
End Function

Sub TestCreateDocumentFromUrl(strUrl As String)

Dim dteStart As Date
Dim dteFinish As Date
Dim strResponse As String
Dim objDoc1 As HTMLDocument
Dim objDoc2 As HTMLDocument

On Error GoTo ExitFunction

dteStart = Now
Set objDoc1 = New HTMLDocument
Set objDoc2 = objDoc1.createDocumentFromUrl(strUrl, "null")
While objDoc2.readyState <> "complete"
DoEvents
Wend
strResponse = objDoc2.DocumentElement.outerHTML
Debug.Print strResponse
dteFinish = Now

Debug.Print "HTML Document Create from URL method"
Debug.Print "Document length: " & Len(strResponse) & " chars"
Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
Debug.Print VBA.vbNewLine

ExitFunction:
If Err.Number <> 0 Then
Debug.Print Err.Description
End If
Set objDoc2 = Nothing
Set objDoc1 = Nothing

End Sub

关于excel - VBA - XMLHTTP 和 WinHttp 请求速度,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41523223/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com