html - Web scraping through Span tags

Reposted. Author: 行者123. Updated: 2023-12-04 22:28:12

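I am trying to copy data from the website mentioned below. I need all of the sizes and price ranges listed on the page. I have the code framework below, but it only copies three elements. Could someone take a look?

URL: https://www.leetstorage.com/sizes-and-pricing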

Sub TagClassName()

    Dim ie As New InternetExplorer, ws As Worksheet

    Set ws = ThisWorkbook.Worksheets("Unit Data")
    With ie
        .Visible = True
        .Navigate2 "https://www.leetstorage.com/sizes-and-pricing"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim listings As Object, listing As Object, headers(), results(), r As Long, c As Long, item As Object
        headers = Array("size")
        Set listings = .document.getElementById("site_content").getElementsByTagName("ul")

        ReDim results(1 To listings.Length, 1 To UBound(headers) + 1)
        For Each listing In listings

            r = r + 1
            On Error Resume Next
            results(r, 1) = listing.getElementsByClassName("font-size-NaN m-font-size-NaN")(0).innerText

            On Error GoTo 0
        Next
        ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
        .Quit

    End With

End Sub

Best answer

You can use the following. You want the child li elements of the parent ul (unordered list) element that has the class innerList.

Internet Explorer:

Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub RetrieveInfo()
    Dim IE As InternetExplorer, i As Long, items As Object
    Set IE = New InternetExplorer

    With IE
        .Visible = True
        .Navigate2 "https://www.leetstorage.com/sizes-and-pricing"

        'Wait until the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend

        'Every li that sits inside an element with class innerList
        Set items = .document.querySelectorAll(".innerList li")

        For i = 0 To items.Length - 1
            With ThisWorkbook.Worksheets("Sheet1")
                .Cells(i + 1, 1) = Trim$(items.item(i).innerText)
            End With
        Next
        .Quit 'Close the browser once the data has been written
    End With
End Sub

XHR:

As long as you supply a User-Agent header, you can do this faster with XHR:
Option Explicit
'VBE > Tools > References > Microsoft HTML Object Library
Public Sub GetInfo()
    Dim html As HTMLDocument, items As Object, i As Long
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.leetstorage.com/sizes-and-pricing", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        'Load the response into an in-memory document so it can be queried
        html.body.innerHTML = .responseText
    End With
    Set items = html.querySelectorAll(".innerList li")

    For i = 0 To items.Length - 1
        With ThisWorkbook.Worksheets("Sheet1")
            .Cells(i + 1, 1) = Trim$(items.item(i).innerText)
        End With
    Next
End Sub
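
A possible variation on the XHR routine above, offered as a sketch rather than as part of the original answer: it checks the HTTP status before parsing and writes all values to the sheet in a single assignment instead of cell by cell. The procedure name GetInfoBatched and the status check are assumptions added for illustration; the URL, selector and sheet name are taken from the answer.

```vba
Option Explicit
'VBE > Tools > References > Microsoft HTML Object Library

'Hypothetical variation of GetInfo: same request and selector,
'but with a status check and one batched write to the worksheet.
Public Sub GetInfoBatched()
    Dim html As HTMLDocument, items As Object
    Dim results() As Variant, i As Long

    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.leetstorage.com/sizes-and-pricing", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        'Stop early if the server did not return the page
        If .Status <> 200 Then
            Debug.Print "Request failed with status " & .Status
            Exit Sub
        End If
        html.body.innerHTML = .responseText
    End With

    Set items = html.querySelectorAll(".innerList li")
    If items.Length = 0 Then
        Debug.Print "No matching list items found"
        Exit Sub
    End If

    'Collect the text of every li into a 2D array (rows x 1 column)
    ReDim results(1 To items.Length, 1 To 1)
    For i = 0 To items.Length - 1
        results(i + 1, 1) = Trim$(items.item(i).innerText)
    Next

    'Write the whole array to the sheet in one operation
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(items.Length, 1).Value = results
End Sub
```

Writing the array in one go is the same Resize pattern the question's code already uses for its results array, so it should slot into either routine.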

ul blocks:

If you look at what the ul class name on its own returns, you get the 3 blocks on the page that contain the lists.

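ul with li:

Taking just one of those blocks as an example, adding li after a descendant combinator (the space in ".innerList li") selects each child li element inside the block individually, instead of the block as a whole.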
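
To see the difference between the two selectors without depending on the live page, here is a small sketch that runs both against an inline HTML fragment. The fragment, the sizes in it and the procedure name CompareSelectors are made up for illustration and are not taken from the site; only the class name innerList and the two selectors come from the answer.

```vba
Option Explicit
'VBE > Tools > References > Microsoft HTML Object Library

'Hypothetical demo: compares ".innerList" (whole ul blocks)
'with ".innerList li" (individual list items).
Public Sub CompareSelectors()
    Dim html As HTMLDocument, blocks As Object, listItems As Object, i As Long

    Set html = New HTMLDocument

    'Made-up markup standing in for the real page
    html.body.innerHTML = _
        "<ul class=""innerList""><li>5 x 5</li><li>5 x 10</li></ul>" & _
        "<ul class=""innerList""><li>10 x 10</li></ul>"

    Set blocks = html.querySelectorAll(".innerList")
    Set listItems = html.querySelectorAll(".innerList li")

    'One match per ul block versus one match per li element
    Debug.Print "ul blocks matched: " & blocks.Length
    Debug.Print "li items matched: " & listItems.Length

    'Each li carries a single entry; each ul carries a whole list
    For i = 0 To listItems.Length - 1
        Debug.Print i + 1, Trim$(listItems.item(i).innerText)
    Next
End Sub
```

Run it from the VBE and check the Immediate window (Ctrl+G). Reading innerText from a ul match returns the text of the whole list at once, which is one way to end up with only as many values as there are blocks.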

Regarding html - Web scraping through Span tags, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55728149/
