gpt4 book ai didi

vba - 无法摆脱脚本中的硬编码延迟

转载 作者:行者123 更新时间:2023-12-04 14:02:43 26 4
gpt4 key购买 nike

我在 vba 中结合 selenium 编写了一个脚本来解析网页中所有可用的公司名称。该网页已启用延迟加载方法,因此每个滚动中只有 20 个链接可见。如果我滚动 2 次,则可见链接数为 40,依此类推。该网页中有 1000 个链接可用。我下面的脚本可以到达该页面的底部,处理所有滚动并获取该网页中可用的所有名称。

但是,每次滚动后,该网页需要等待一段时间才能更新内容。这是我用过的地方hardcoded delay但是硬编码事物的过程非常不一致,有时它会使浏览器在整个操作完成之前退出。

如何修改这部分 .Wait 6000让它 Explicit Wait而不是 Hardcoded Wait .

这是我到目前为止所写的:

Sub Getlinks()
Dim driver As New ChromeDriver, prevlen&, curlen&
Dim posts As Object, post As Object

With driver
.get "http://fortune.com/fortune500/list/"
prevlen = .FindElementsByClass("company-title").Count

Do
prevlen = curlen
.ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")

.Wait 6000 ''I like to kick out this hardcoded delay and use explicit wait in place

Set posts = .FindElementsByClass("company-title")
curlen = posts.Count
If prevlen = curlen Then Exit Do
Loop

For Each post In posts
R = R + 1: Cells(R, 1) = post.Text
Next post
End With
End Sub

最佳答案

这是一种完全不同的方法,它不需要使用浏览器,而是提交一系列 Web 请求。使用这种方法,无需担心等待页面加载。

通常,对于延迟加载页面,它会在您滚动时提交一个新请求来加载页面的数据。如果您监控网络流量,您可以发现发出的请求并模拟这些请求,我在下面做了。

结果应该是公司名称列表,无论 Excel 的第一张是什么,都按升序排列。

你需要的东西:

添加对以下内容的引用:

  • Microsoft 脚本运行时
  • 微软 XML v6.0
  • 将 VBA-JSON 代码添加到您的项目中。您可以找到 here

  • 编辑

    更改了代码以继续从站点中提取数据,直到列表中没有更多项目。感谢@Qharr 指出这一点。

    代码
    Public Sub SubmitRequest()
    Const baseURL As String = "http://fortune.com/api/v2/list/2358051/expand/item/ranking/asc/"

    Dim Url As String
    Dim startingNumber As Long
    Dim j As Long
    Dim getRequest As MSXML2.XMLHTTP60
    Dim Json As Object
    Dim Companies As Object
    Dim Company As Variant
    Dim CompanyArray As Variant

    'Create an array to hold each company
    ReDim CompanyArray(0 To 50000)
    'Create a new XMLHTTP object so we can place a get request
    Set getRequest = New MSXML2.XMLHTTP60

    'The api seems to only support returning 100 records at a time
    'So do in batches of 100
    Do
    'Build the url, the format is something like
    '0/100, where 0 is the starting position, and 100 is the ending position
    Url = baseURL & startingNumber & "/" & startingNumber + 100

    With getRequest
    .Open "GET", Url
    .send

    'The response is a JSON object, for this code to work -
    'You'll need this code https://github.com/VBA-tools/VBA-JSON
    'What is returned is a dictionary
    Set Json = JsonConverter.ParseJson(.responseText)
    Set Companies = Json("list-items")

    'Keep checking in batches of 100 until there are no more
    If Companies.Count = 0 Then Exit Do

    'Iterate the dictionary and return the title (which is the name)
    For Each Company In Companies
    CompanyArray(j) = Company("title")
    j = j + 1
    Next

    End With
    startingNumber = startingNumber + 100
    Loop

    ReDim Preserve CompanyArray(j - 1)

    'Dump the data to the first sheet
    ThisWorkbook.Sheets(1).Range("A1:A" & j) = WorksheetFunction.Transpose(CompanyArray)

    End Sub

    关于vba - 无法摆脱脚本中的硬编码延迟,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50419287/

    26 4 0
    文章推荐: php - 为什么 setcookie 在服务器上不起作用?
    文章推荐: python - 为什么 Dart 和 Python 之间的 base85 不匹配?
    文章推荐: javascript -