javascript - VBA - Handling JavaScript content in an XMLHTTP GET request

Reposted. Author: 行者123. Updated: 2023-11-28 05:22:51

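I want to extract content from a web page. However, the response text I receive contains JavaScript, and it is not processed the way it would be if the page were opened in a browser.

Can this method be used to get the HTML content, or will only browser emulation help? Or is there perhaps a different way to receive this content?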

' Requires references to "Microsoft XML" (MSXML2) and "Microsoft HTML Object Library"
Dim oXMLHTTP As New MSXML2.XMLHTTP
Dim htmlObj As New HTMLDocument

With oXMLHTTP
    .Open "GET", "http://www.manta.com/ic/mtqyfk0/ca/riverbend-holdings-inc", False
    .send

    If .ReadyState = 4 And .Status = 200 Then
        htmlObj.body.innerHTML = .responseText
        'do things
    End If

End With

Response text:

<!DOCTYPE html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_blocked.html?Ref=/ic/mtq599v/ca/45th-street-limited-partnership&amp;distil_RID=2115B138-A1BF-11E6-A957-C0595F6B962F&amp;distil_TID=20161103121454" />
<script type="text/javascript">
(function(window){
try {
if (typeof sessionStorage !== 'undefined'){
sessionStorage.setItem('distil_referrer', document.referrer);
}
} catch (e){}
})(window);
</script>
<script type="text/javascript" src="/ser-yrbwqfedrrwwvctvyavy.js" defer></script><style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#verxvaxcuczwcwecuxsx{display:none!important}</style></head>
<body>
<div id="distil_ident_block">&nbsp;</div>
</body>
</html>
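
Note that the response above does not look like the requested page at all: it carries a meta refresh pointing at /distil_r_blocked.html, and its body contains only a distil_ident_block element. A minimal sketch of checking for that before parsing follows. LooksLikeBlockPage is a hypothetical helper name, and the two marker strings are taken only from this one sample response, so they are an assumption rather than a reliable signature.

```vba
' Hypothetical helper: returns True when the response text contains the
' markers seen in the sample response above (assumed, not guaranteed, to
' identify the interstitial page).
Function LooksLikeBlockPage(strHtml As String) As Boolean
    LooksLikeBlockPage = _
        InStr(1, strHtml, "distil_r_blocked", vbTextCompare) > 0 Or _
        InStr(1, strHtml, "distil_ident_block", vbTextCompare) > 0
End Function
```

It could be called on .responseText right after the status check, skipping the parsing step when it returns True.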

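Best answer

No, because the JavaScript is actually part of the HTML, inside the <script> tags. XMLHTTP returns the markup as sent and does not execute any scripts, so you would have to post-process the response to remove those tags yourself.

Once you have received the page's HTML, you can use a function to remove the <script> nodes from the DOM: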

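Function RemoveScriptTags(objHTML As HTMLDocument) As String

    Dim colScripts As Object
    Dim i As Long

    ' Remove the <script> elements last-to-first, so that deleting a node
    ' does not shift the items of the collection still to be visited.
    Set colScripts = objHTML.getElementsByTagName("script")
    For i = colScripts.Length - 1 To 0 Step -1
        colScripts.Item(i).removeNode True
    Next i

    RemoveScriptTags = objHTML.DocumentElement.outerHTML

End Function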

This can be incorporated into your sample code as follows:

Option Explicit

Sub Test()

    Dim objXMLHTTP As New MSXML2.XMLHTTP
    Dim objHTML As Object
    Dim strUrl As String
    Dim strHtmlNoScriptTags As String

    strUrl = "http://www.manta.com/ic/mtqyfk0/ca/riverbend-holdings-inc"

    With objXMLHTTP
        .Open "GET", strUrl, False
        .send

        If .ReadyState = 4 And .Status = 200 Then
            Set objHTML = CreateObject("htmlfile")
            objHTML.Open
            objHTML.write objXMLHTTP.responseText
            objHTML.Close

            'do things
            strHtmlNoScriptTags = RemoveScriptTags(objHTML)
            Debug.Print strHtmlNoScriptTags

            'update html document with script-less document
            Set objHTML = CreateObject("htmlfile")
            objHTML.Open
            objHTML.write strHtmlNoScriptTags
            objHTML.Close

            'you can now operate on DOM of objHTML

        End If

    End With

End Sub
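
As an illustration of the final "operate on DOM" step, here is a minimal sketch that walks the parsed document and prints every link. PrintLinks is a hypothetical helper, not part of the original answer, and it assumes the document was built with CreateObject("htmlfile") as in the code above.

```vba
' Hypothetical helper: prints the visible text and target of each <a>
' element in an already parsed document.
Sub PrintLinks(objHTML As Object)

    Dim colLinks As Object
    Dim i As Long

    Set colLinks = objHTML.getElementsByTagName("a")
    For i = 0 To colLinks.Length - 1
        Debug.Print colLinks.Item(i).innerText, colLinks.Item(i).href
    Next i

End Sub
```

It could be called as PrintLinks objHTML at the point marked by the last comment in Test.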

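If the DOM round trip is not needed, the post-processing can also be done on the raw string. The sketch below is an alternative to the DOM approach in the answer, not the answer's own method: it uses the VBScript.RegExp object, and StripScriptBlocks is a hypothetical name. Regular expressions are a blunt tool for HTML, so this assumes reasonably well-formed <script>...</script> pairs.

```vba
' Hypothetical helper: removes every <script>...</script> block from an
' HTML string without building a DOM.
Function StripScriptBlocks(strHtml As String) As String

    Dim objRegex As Object

    Set objRegex = CreateObject("VBScript.RegExp")
    With objRegex
        .Global = True          ' replace all matches, not just the first
        .IgnoreCase = True      ' match <SCRIPT> as well as <script>
        .Pattern = "<script\b[\s\S]*?</script\s*>"
    End With

    StripScriptBlocks = objRegex.Replace(strHtml, "")

End Function
```

The non-greedy [\s\S]*? keeps each match to a single script block, including blocks that span several lines.
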
Regarding javascript - VBA - Handling JavaScript content in an XMLHTTP GET request, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40409746/
