vba - Web scraping with VBA

Reposted. Author: 行者123. Updated: 2023-12-04 20:54:11

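I'm trying to automate a web scraper with VBA to collect price data for certain items. I'm very new to VBA and have been building my code from answers to similar questions, but I'm stuck on a "Type mismatch" error. I have this, which opens IE and works fine: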

    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .Navigate "https://grocery.walmart.com/"
        .Visible = True
    End With

    Do While appIE.Busy
        DoEvents
    Loop

However, I now want to find the prices in the HTML below, i.e. $1.67 for the Colgate and $2.78 for the Nature Valley:

<span data-automation-id="items">
<div class="CartItem__itemContainer___3vA-E" tabindex="-1" data-automation-id="cartItem">
<div class="CartItem__itemInfo___3rgQd">
<span class="TileImage__tileImage___35CNo">
<div class="TileImage__imageContainer___tlQZb">
<img alt="1 of C, o" src="https://i5.walmartimages.com/asr/36829cef-43f2-4d21-9d5e-10aa9def01dd_7.04089903cc0038b3dac3c204ef7e417e.png?odnHeight=150&amp;odnWidth=150&amp;odnBg=ffffff" class="TileImage__image___3MrIo" data-automation-id="image" aria-hidden="true">
</div><span data-automation-id="quantity" class="TileImage__quantity___1rgG4 hidden__audiblyHidden___RoAkK" role="button" aria-label="1 of C, select to change quantities">
1</span></span><div class="CartItem__name___2RJs5">
<div data-automation-id="name" tabindex="0" role="button" aria-label="C button, Select to change quantities">
Colgate Cavity Protection Fluoride Toothpaste - 6 oz</div><span data-automation-id="list-price" class="ListPrice__listPrice___1x8TM" aria-label="1 dollar and 67 cents each">
$1.67 each</span><a class="CartItem__detailsLink___2ts9b" aria-label="Colgate Cavity Protection Fluoride Toothpaste - 6 oz" tabindex="0" href="/ip/Colgate-Cavity-Protection-Fluoride-Toothpaste---6-oz/49714957">
View details</a></div><span class="Price__groceryPriceContainer___19Jim CartItem__price___2ADX6" data-automation-id="price" aria-label="1 dollar and 67 cents ">
<sup class="Price__currencySymbol___3Ye7d">
$</sup><span class="Price__wholeUnits___lFhG5" data-automation-id="wholeUnits">
1</span><sup class="Price__partialUnits___1VX5w" data-automation-id="partialUnits">
67</sup></span></div><div></div></div><div class="CartItem__itemContainer___3vA-E" tabindex="-1" data-automation-id="cartItem">
<div class="CartItem__itemInfo___3rgQd">
<span class="TileImage__tileImage___35CNo">
<div class="TileImage__imageContainer___tlQZb">
<img alt="1 of N, a" src="https://i5.walmartimages.com/asr/775482d5-a136-4ca3-9353-28646ec999c3_1.d861ce7abd9797cbafec2cd2a4b24874.jpeg?odnHeight=150&amp;odnWidth=150&amp;odnBg=ffffff" class="TileImage__image___3MrIo" data-automation-id="image" aria-hidden="true">
</div><span data-automation-id="quantity" class="TileImage__quantity___1rgG4 hidden__audiblyHidden___RoAkK" role="button" aria-label="1 of N, select to change quantities">
1</span></span><div class="CartItem__name___2RJs5">
<div data-automation-id="name" tabindex="0" role="button" aria-label="N button, Select to change quantities">
Nature Valley Granola Bars Sweet and Salty Nut Cashew 6 Bars - 1.2 oz</div><span data-automation-id="list-price" class="ListPrice__listPrice___1x8TM" aria-label="2 dollars and 78 cents each">
$2.78 each</span><a class="CartItem__detailsLink___2ts9b" aria-label="Nature Valley Granola Bars Sweet and Salty Nut Cashew 6 Bars - 1.2 oz" tabindex="0" href="/ip/Nature-Valley-Granola-Bars-Sweet-and-Salty-Nut-Cashew-6-Bars---1.2-oz/10311347">
View details</a></div><span class="Price__groceryPriceContainer___19Jim CartItem__price___2ADX6" data-automation-id="price" aria-label="2 dollars and 78 cents ">
<sup class="Price__currencySymbol___3Ye7d">
$</sup><span class="Price__wholeUnits___lFhG5" data-automation-id="wholeUnits">
2</span><sup class="Price__partialUnits___1VX5w" data-automation-id="partialUnits">
78</sup></span></div><div></div></div>


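My instinct (as a real beginner) is to find the div class section above, then search for the aria-label and copy the text that follows it, but I feel that would be very long-winded and could produce lots of errors if that div class name is repeated elsewhere on the page.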

Any help on how I should proceed (and whether this is a good idea) would be much appreciated. Thanks!

Best answer

All the prices can be selected with a CSS selector that targets the class attribute:

[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']

You can apply the CSS selector with the document's querySelectorAll method, which returns a nodeList.

You can also get a collection with:
.document.getElementsByClassName("Price__groceryPriceContainer___19Jim CartItem__price___2ADX6")
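The two approaches are not strictly equivalent. The [class='...'] attribute selector compares the whole attribute string, so it only matches when the classes appear in exactly that order and spacing, whereas getElementsByClassName with two space-separated names matches any element that carries both classes, in any order and alongside other classes. The Python sketch below (Python rather than VBA so it runs outside Excel; both function names are hypothetical) models the two matching rules on a plain class string. The extra class Price__onSale is invented purely to show the difference.

```python
def matches_exact_class_attr(class_attr: str, selector_value: str) -> bool:
    """Model of the CSS selector [class='...']: whole-string comparison."""
    return class_attr == selector_value


def has_all_classes(class_attr: str, class_names: str) -> bool:
    """Model of getElementsByClassName("a b"): every listed class must be present,
    in any order, among the element's whitespace-separated classes."""
    element_classes = set(class_attr.split())
    wanted = set(class_names.split())
    return wanted <= element_classes


PRICE_CLASSES = "Price__groceryPriceContainer___19Jim CartItem__price___2ADX6"

# Same classes in the same order: both rules match.
print(matches_exact_class_attr(PRICE_CLASSES, PRICE_CLASSES))  # True
print(has_all_classes(PRICE_CLASSES, PRICE_CLASSES))            # True

# Hypothetical element where the site has added one more class.
with_extra = PRICE_CLASSES + " Price__onSale"
print(matches_exact_class_attr(with_extra, PRICE_CLASSES))      # False
print(has_all_classes(with_extra, PRICE_CLASSES))               # True
```

If the exact-string match ever proves too brittle, the compound CSS selector .Price__groceryPriceContainer___19Jim.CartItem__price___2ADX6 follows the second rule (both classes required, extras allowed) and can be passed to querySelectorAll in the same way.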

Code outline:
    Option Explicit

    Public Sub TEST()
        Dim appIE As Object
        Set appIE = CreateObject("internetexplorer.application")

        With appIE
            .navigate "https://grocery.walmart.com/"   '< Travel to homepage
            .Visible = True                            '< Show browser window

            'Wait until the page has loaded (readyState 4 = READYSTATE_COMPLETE)
            Do While .Busy = True Or .readyState <> 4: DoEvents: Loop

            Dim priceList As Object, nameList As Object, i As Long, ws As Worksheet, lastRow As Long
            Set ws = ActiveSheet
            'Code to get your basket ready
            lastRow = GetLastRow(ws, 1)

            'Select elements by their class attribute (matches the basket item prices)
            Set priceList = .document.querySelectorAll("[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']")
            'Select the item names by their data-automation-id attribute
            Set nameList = .document.querySelectorAll("[data-automation-id='name']")

            For i = 0 To priceList.Length - 1   '< Loop over the nodeList of matched elements (0-based)
                With ws
                    .Cells(lastRow + i + 1, 1) = nameList.item(i).innerText    '< Name of each matched item
                    .Cells(lastRow + i + 1, 2) = Now                           '< Timestamp
                    .Cells(lastRow + i + 1, 3) = priceList.item(i).innerText   '< Price of each matched item
                End With
            Next i
        End With
    End Sub

    Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
        With ws
            GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
        End With
    End Function
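The outline above relies on the i-th name lining up with the i-th price. As an offline illustration, here is a minimal Python sketch of a slightly more defensive variant that looks up the name and price inside each cartItem container, so a stray data-automation-id='name' elsewhere on the page cannot shift the pairing. Assumptions: it runs on a trimmed, well-formed copy of the question's HTML (images and links removed so xml.etree.ElementTree can parse it; a live page would need a real HTML parser), it reads the list-price span ("$1.67 each") instead of the price container because that text already contains the decimal point, and BASKET_HTML and basket_rows are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed, well-formed excerpt of the basket HTML from the question.
BASKET_HTML = """<span data-automation-id="items">
<div class="CartItem__itemContainer___3vA-E" data-automation-id="cartItem">
<div class="CartItem__name___2RJs5">
<div data-automation-id="name">Colgate Cavity Protection Fluoride Toothpaste - 6 oz</div>
<span data-automation-id="list-price" class="ListPrice__listPrice___1x8TM">$1.67 each</span>
</div>
</div>
<div class="CartItem__itemContainer___3vA-E" data-automation-id="cartItem">
<div class="CartItem__name___2RJs5">
<div data-automation-id="name">Nature Valley Granola Bars Sweet and Salty Nut Cashew 6 Bars - 1.2 oz</div>
<span data-automation-id="list-price" class="ListPrice__listPrice___1x8TM">$2.78 each</span>
</div>
</div>
</span>"""


def basket_rows(html: str) -> list[tuple[str, datetime, str]]:
    """Return (name, timestamp, price) rows, one per cart item, like columns A-C."""
    root = ET.fromstring(html)
    stamp = datetime.now()
    rows = []
    # Search within each cart item so name and price always come from the same item.
    for item in root.findall(".//div[@data-automation-id='cartItem']"):
        name = item.find(".//*[@data-automation-id='name']")
        price = item.find(".//*[@data-automation-id='list-price']")
        if name is None or price is None:
            continue  # skip items missing either field
        rows.append((name.text.strip(), stamp, price.text.strip()))
    return rows


for row in basket_rows(BASKET_HTML):
    print(row[0], "|", row[2])
```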

Fixed basket elements:

Toothpaste:

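If the items in the cart stay fixed and the basket prices update over time, you could track changes in, for example, the toothpaste price by using the following CSS selector: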
.CartItem__name___2RJs5 + span

So:
Debug.Print .document.querySelector(".CartItem__name___2RJs5 + span").innerText

Or:
Debug.Print .document.querySelectorAll("[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']").item(0).innerText

The last one uses the class attribute to return a nodeList of all matching elements (your basket items) and accesses the first item (the toothpaste) at index 0.

Or you can use the .querySelector method, which returns the first match, i.e. index 0:
Debug.Print .document.querySelector("[class='Price__groceryPriceContainer___19Jim CartItem__price___2ADX6']").innerText
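All three lines return the first matching element in document order: A + B is the CSS adjacent-sibling combinator (a B element that immediately follows an A element under the same parent), querySelector stops at the first match, and .item(0) takes the first entry of the full nodeList. The Python sketch below checks that the three agree on a trimmed, well-formed copy of the basket markup. It is an offline model, not VBA: xml.etree.ElementTree has no sibling axis, so first_adjacent_span (a hypothetical helper) walks a child-to-parent map, while find and findall(...)[0] stand in for querySelector and querySelectorAll(...).item(0).

```python
import xml.etree.ElementTree as ET

PRICE_CLASS = "Price__groceryPriceContainer___19Jim CartItem__price___2ADX6"

# Trimmed, well-formed copy of the two cart items (images, links and list prices removed).
BASKET_HTML = f"""<span data-automation-id="items">
<div class="CartItem__itemInfo___3rgQd">
<div class="CartItem__name___2RJs5"><div data-automation-id="name">Colgate Cavity Protection Fluoride Toothpaste - 6 oz</div></div>
<span class="{PRICE_CLASS}" data-automation-id="price" aria-label="1 dollar and 67 cents "><sup>$</sup><span data-automation-id="wholeUnits">1</span><sup>67</sup></span>
</div>
<div class="CartItem__itemInfo___3rgQd">
<div class="CartItem__name___2RJs5"><div data-automation-id="name">Nature Valley Granola Bars Sweet and Salty Nut Cashew 6 Bars - 1.2 oz</div></div>
<span class="{PRICE_CLASS}" data-automation-id="price" aria-label="2 dollars and 78 cents "><sup>$</sup><span data-automation-id="wholeUnits">2</span><sup>78</sup></span>
</div>
</span>"""


def first_adjacent_span(root: ET.Element, class_name: str):
    """Model of querySelector('.<class_name> + span'): the first <span>, in document
    order, whose immediately preceding element sibling has class_name."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in root.iter("span"):
        parent = parent_of.get(element)
        if parent is None:
            continue  # the root element has no siblings
        siblings = list(parent)
        position = siblings.index(element)
        if position > 0 and class_name in siblings[position - 1].get("class", "").split():
            return element
    return None


root = ET.fromstring(BASKET_HTML)
price_path = f".//span[@class='{PRICE_CLASS}']"

via_sibling = first_adjacent_span(root, "CartItem__name___2RJs5")  # ".CartItem__name___2RJs5 + span"
via_list = root.findall(price_path)[0]  # querySelectorAll(...).item(0)
via_first = root.find(price_path)       # querySelector(...)

assert via_sibling is via_list is via_first
print(via_first.get("aria-label"))  # label of the toothpaste price container
```

One Python-specific difference: find returns None when nothing matches, whereas findall(...)[0] raises IndexError on the empty list.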

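My code locates elements with a CSS selector (the same kind of selector used to style the page) that matches on their class attribute. All your basket item prices have the class attribute Price__groceryPriceContainer___19Jim CartItem__price___2ADX6, so the code pulls back a nodeList (somewhat like an array) of the elements with that class attribute. It then loops over the nodeList, accessing each element by index (starting from 0). The .innerText property returns the element's text, i.e. the price. Note that the dollars and cents sit in separate child elements of the price container, so that text contains no decimal point.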
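If you need a numeric price rather than display text, one option is to rebuild it from the wholeUnits and partialUnits children that each price container in the question's HTML has. Below is a minimal Python sketch on a trimmed copy of the Colgate container. price_from_container is a hypothetical helper, and it assumes partialUnits always holds two digits.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the Colgate price container from the question.
PRICE_HTML = (
    '<span class="Price__groceryPriceContainer___19Jim CartItem__price___2ADX6" '
    'data-automation-id="price" aria-label="1 dollar and 67 cents ">'
    '<sup class="Price__currencySymbol___3Ye7d">$</sup>'
    '<span class="Price__wholeUnits___lFhG5" data-automation-id="wholeUnits">1</span>'
    '<sup class="Price__partialUnits___1VX5w" data-automation-id="partialUnits">67</sup>'
    "</span>"
)


def price_from_container(container: ET.Element) -> Decimal:
    """Rebuild the amount from the wholeUnits and partialUnits children
    (assumes partialUnits is always two digits, e.g. '67' or '05')."""
    whole = container.find(".//*[@data-automation-id='wholeUnits']")
    cents = container.find(".//*[@data-automation-id='partialUnits']")
    if whole is None or cents is None:
        raise ValueError("price container lacks wholeUnits/partialUnits")
    return Decimal(f"{whole.text.strip()}.{cents.text.strip()}")


container = ET.fromstring(PRICE_HTML)
print("".join(container.itertext()))    # $167 (bare text of the children, no decimal point)
print(price_from_container(container))  # 1.67
```

In the VBA loop, the same idea would presumably mean calling querySelector("[data-automation-id='wholeUnits']") (and likewise for partialUnits) on each priceList.item(i); treat that as an untested suggestion.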

A similar question about vba - Web scraping with VBA can be found on Stack Overflow: https://stackoverflow.com/questions/51819608/
