gpt4 book ai didi

excel - VBA - 网页抓取无法获取 HTMLElement insideText

转载 作者:行者123 更新时间:2023-12-03 03:18:52 24 4
gpt4 key购买 nike

我正在尝试使用 Excel VBA 取消汇率,但无法获得我需要的innerText 值。我不明白为什么,因为同样的技术也适用于其他网站。

网址 - https://www.nbs.rs/export/sites/default/internet/english/scripts/kl_srednji.html

Sub GetCurr()

Dim tempHTMLDoc As New MSHTML.HTMLDocument
Dim HTMLCurrency As MSHTML.IHTMLElementCollection
Dim HTMLRows As MSHTML.IHTMLElementCollection
Dim HTMLDate As MSHTML.IHTMLElementCollection
Dim HTMLElem As MSHTML.IHTMLElement
Dim connectionTest As Boolean
Dim EUR, CZK, HRK, HUF, PLN, RON, RSD As String
Dim myURL As String
Dim i As Long

connectionTest = True
myURL = "https://www.nbs.rs/export/sites/default/internet/english/scripts/kl_srednji.html"

Call WebConnection(tempHTMLDoc, connectionTest, myURL)
If connectionTest = False Then Exit Sub

Set HTMLDate = tempHTMLDoc.getElementsByTagName("span")
'Debug.Print HTMLDate.Length

For Each HTMLElem In HTMLDate 'I am looking for which element contains the date (can not find)
Debug.Print HTMLElem.innerText
Next HTMLElem

'I am trying to get the necessary currencies
Set HTMLRows = tempHTMLDoc.getElementsByTagName("tr")

Debug.Print HTMLRows.Length

For i = 0 To HTMLRows.Length - 1 'If lenght > 0

Set HTMLCurrency = HTMLRows(i).getElementsByTagName("td")

If HTMLCurrency.Length > 4 Then 'each currency contains 5 "td" tags

Select Case HTMLCurrency(2).innerText
Case "EUR"
EUR = HTMLCurrency(4).innerText
Case "HRK"
HRK = HTMLCurrency(4).innerText
Case "HUF"
HUF = HTMLCurrency(4).innerText
Case "PLN"
PLN = HTMLCurrency(4).innerText
Case "RON"
RON = HTMLCurrency(4).innerText
Case "CZK"
CZK = HTMLCurrency(4).innerText
End Select

End If

Next i

Debug.Print "EUR - ", EUR; vbNewLine; "HRK - ", HRK; vbNewLine; "HUF - ", HUF; vbNewLine; "PLN - ", PLN; vbNewLine; _
"RON - ", RON; vbNewLine; "CZK - ", CZK

End Sub

'============================================================================

Sub WebConnection(HTMLDoc As MSHTML.HTMLDocument, ConnTest As Boolean, URL As String)

Dim XMLPage As New MSXML2.XMLHTTP60
Dim errorMsg As VbMsgBoxResult

On Error GoTo CONNECTION_ERROR

XMLPage.Open "GET", URL, False
XMLPage.send

On Error GoTo 0

If XMLPage.Status <> 200 Then
errorMsg = MsgBox("There is something wrong with webpage. Do you want to try to continue?", vbYesNo + vbCritical, "ERROR")
If errorMsg = vbNo Then
ConnTest = False
Exit Sub
End If
End If

HTMLDoc.body.innerHTML = XMLPage.responseText
Exit Sub

CONNECTION_ERROR:
MsgBox "There is something wrong with the connection.", vbCritical, "ERROR"
ConnTest = False
Exit Sub

End Sub

我尝试使用 id (index:srednjiKursList:tbody_element) 或类名(tableCell) 但它不起作用。该网站以不同的方式构建

最佳答案

您的原始链接(我们称之为登录页面)是动态加载的。您的 GET 请求太快,无法检索所需的信息。

您可以使用另一个 URL。

当您转到登陆页面时,您会发现它实际上发出了 XMLHTTP GET请求到以下页面:

get request

以上内容来自使用 fiddler但您可以使用 Chrome 开发工具 (F12) 等工具检查网络流量。

您可以将该 URL 直接输入到代码中,并且效果完美。

<小时/>

整个表格:

您还可以按如下方式获取整个表格:

Option Explicit
Public Sub GetInfo()
Dim html As New HTMLDocument, hTable As HTMLTable, clipboard As Object
Const URL = "https://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=eng"

With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
End With

Set hTable = html.getElementById("index:srednjiKursLista")
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
End Sub
<小时/>

结果示例:

results

<小时/><小时/>

仅列出的货币:

您还可以根据表结构使用一些数学知识来获取您列出的那些元素。

Option Explicit
Public Sub GetInfo()
Dim html As New HTMLDocument, hTable As HTMLTable, clipboard As Object
Const URL = "https://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=eng"

With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
End With

Set hTable = html.getElementById("index:srednjiKursLista")

Dim list As Object, i As Long
Dim EUR As Double, CZK As Double, HRK As Double, HUF As Double, PLN As Double, RON As Double, RSD As Double
Set list = hTable.querySelectorAll("td")
For i = 2 To list.Length - 1 Step 5
Select Case list.item(i).innerText
Case "EUR"
EUR = list.item(i + 2).innerText
Case "HRK"
HRK = list.item(i + 2).innerText
Case "HUF"
HUF = list.item(i + 2).innerText
Case "PLN"
PLN = list.item(i + 2).innerText
Case "RON"
RON = list.item(i + 2).innerText
Case "CZK"
CZK = list.item(i + 2).innerText
End Select
Next

Debug.Print "EUR - ", EUR; vbNewLine; "HRK - ", HRK; vbNewLine; "HUF - ", HUF; vbNewLine; "PLN - ", PLN; vbNewLine; _
"RON - ", RON; vbNewLine; "CZK - ", CZK
End Sub
<小时/>

使用剪贴板:

以下行:

GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")

添加对 Microsoft Forms 对象库的后期绑定(bind)引用,以便您可以访问剪贴板。

您还可以将用户窗体添加到您的项目中,或者转至 VBE > 工具 > 引用 > Microsoft Forms 对象库来访问:

Forms

关于excel - VBA - 网页抓取无法获取 HTMLElement insideText,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52028019/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com