
json - Inner loop design for web scraping

Reposted. Author: 行者123. Updated: 2023-12-04 21:19:39

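I want to import restaurant data (name, phone number, website, and address) into Excel. Unfortunately, my results include sponsored listings, and I cannot get the website or the full address, because those only appear on the inner page that opens when you click the restaurant name. I wrote some code with help from this platform, but it does not work. Please correct the problems in my code. URL: https://www.yelp.com/search?cflt=restaurants&find_loc=San%20Francisco%2C%20CA&start=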

Here is my code:

Sub GetInfo()
    Const URL$ = "https://www.yelp.com/search?cflt=restaurants&find_loc=San%20Francisco%2C%20CA&start="
    Dim Http As New XMLHTTP60, Html As New HTMLDocument, Htmldoc As New HTMLDocument, page&, I&

    For page = 0 To 1 ' this is where you change the last number for the pages to traverse
        With Http
            .Open "GET", URL & page * 30, False
            .send
            Html.body.innerHTML = .responseText
        End With

        With Html.querySelectorAll("[class*='searchResult']")
            For I = 0 To .Length - 1
                Htmldoc.body.innerHTML = .Item(I).outerHTML
                On Error Resume Next
                r = r + 1: Cells(r, 1) = Htmldoc.querySelector("[class*='heading--h3'] > a").innerText
                Cells(r, 2) = Htmldoc.querySelector("[class*='container'] > [class*='display--inline-block']").innerText
                ' Cells(r, 3) = Htmldoc.querySelector("[class*='container'] > address").innerText
                'Cells(r, 4) = Htmldoc.querySelector("[class*='container'] > address").NextSibling.innerText
                'Inner loop creation
                Cells(r, 5) = Htmldoc.querySelector("[class*='container'] > website").href ' Extract from window after clicking on hotel name
                Cells(r, 6) = Htmldoc.querySelector("[class*='container'] > fulladdress").innerText ' Extract from window after clicking on hotel name
                On Error GoTo 0
            Next I
        End With
    Next page
End Sub

Best answer

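You can use the free API to get the top 50 results from the business_search endpoint. Pass the sort parameter in the query string to get the highest rated.
Use a JSON parser such as jsonconverter.bas to process the response. After installing the code from that link in a standard module named JsonConverter, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
The API instructions are here. You need to set up a test app, which requires some basic user information and a verified email address. You will then receive an API key for authentication, which is passed in the Authorization header as shown below.
You can parse other information from the response if you need it.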

Option Explicit

Public Sub GetTopRestuarants()
    Dim json As Object, headers(), r As Long
    With CreateObject("MSXML2.XMLHTTP")
        ' Request up to 50 businesses from the search endpoint, sorted by rating
        .Open "GET", "https://api.yelp.com/v3/businesses/search?term=restaurant&location=san-francisco&limit=50&sort_by=rating", False
        .setRequestHeader "Authorization", "Bearer yourAPIkey" ' replace yourAPIkey with your own key
        .send
        ' ParseJson returns a Dictionary; "businesses" is a Collection of Dictionaries
        Set json = JsonConverter.ParseJson(.responseText)("businesses")
        headers = Array("Restaurant name", "phone", "website", "address")
        Dim results(), item As Object
        ' One row per business, one column per header (headers is 0-based)
        ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
        For Each item In json
            r = r + 1
            results(r, 1) = item("name")
            results(r, 2) = item("phone")
            results(r, 3) = item("url") ' the Yelp page URL for the business
            Dim subItem As Variant, address As String
            address = vbNullString
            ' display_address is a list of address lines; join them with spaces
            For Each subItem In item("location")("display_address")
                address = address & Chr$(32) & subItem
            Next
            results(r, 4) = Trim$(address)
        Next
    End With
    ' Write the header row, then the whole results array in one operation
    With ActiveSheet
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub
Sample of the first 20 of the 50 results returned (screenshot not reproduced here).
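For readers who want to see the response shape the macro walks without calling the API, here is a minimal sketch. It is written in Python only so that it can be run outside Excel. The payload is made up and contains just the fields the macro reads (name, phone, url, location.display_address), and the helper name to_rows is hypothetical. A real response carries many more fields.

```python
import json

# Made-up sample payload with only the fields the VBA macro reads.
# The business, phone number, and URL below are placeholders, not real data.
SAMPLE_RESPONSE = """
{
  "businesses": [
    {
      "name": "Example Diner",
      "phone": "+14155550100",
      "url": "https://www.yelp.com/biz/example-diner",
      "location": {
        "display_address": ["1 Example St", "San Francisco, CA 94103"]
      }
    }
  ]
}
"""


def to_rows(response_text):
    """Turn a search response into [name, phone, url, address] rows."""
    rows = []
    for item in json.loads(response_text)["businesses"]:
        # display_address is a list of address lines; join them with spaces,
        # which mirrors the For Each loop in the macro.
        address = " ".join(item["location"]["display_address"])
        rows.append([item["name"], item.get("phone", ""), item["url"], address])
    return rows


rows = to_rows(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```

Each inner list corresponds to one worksheet row, in the same column order as the headers array in the macro.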

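Caveat emptor
Note that specifying sort_by is a suggestion to Yelp's search, not a strictly enforced rule. The search considers multiple input parameters in order to return the most relevant results. For example, the rating sort does not order strictly by rating value. It orders by an adjusted rating value that takes the number of ratings into account, similar to a Bayesian average. This prevents results from being skewed toward businesses that have a single review.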
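To illustrate why an adjusted rating changes the order, here is a minimal sketch of a generic Bayesian average, written in Python so that it can be run outside Excel. Yelp does not publish its actual formula in the text quoted here, so the function, the prior values (PRIOR_MEAN, PRIOR_WEIGHT), and the two businesses are all assumptions made up for illustration.

```python
def bayesian_average(rating_sum, rating_count, prior_mean, prior_weight):
    """Pull a mean rating toward prior_mean.

    prior_weight behaves like a number of phantom reviews at prior_mean,
    so businesses with few reviews stay close to the prior.
    """
    return (prior_weight * prior_mean + rating_sum) / (prior_weight + rating_count)


# Assumed prior: 10 phantom reviews at 3.5 stars (illustrative values only).
PRIOR_MEAN = 3.5
PRIOR_WEIGHT = 10

# Hypothetical businesses: (name, list of star ratings).
businesses = [
    ("One Review Cafe", [5]),                 # raw mean 5.0 from a single review
    ("Busy Bistro", [5] * 180 + [4] * 20),    # raw mean 4.9 from 200 reviews
]


def adjusted(ratings):
    """Adjusted rating for a list of star ratings under the assumed prior."""
    return bayesian_average(sum(ratings), len(ratings), PRIOR_MEAN, PRIOR_WEIGHT)


# Sort by adjusted rating, highest first.
ranked = sorted(businesses, key=lambda b: adjusted(b[1]), reverse=True)
for name, ratings in ranked:
    print(name, round(adjusted(ratings), 3))
```

Under these assumed values, the business with 200 reviews ranks above the one with a single 5-star review, even though its raw mean is lower. That is the kind of effect the caveat describes.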

Regarding "json - Inner loop design for web scraping", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56717313/
