excel - Scraper throws an error instead of quitting the browser when everything is done

Reposted. Author: 行者123. Last updated: 2023-12-01 18:38:05

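I've written a scraper to parse movie information from a torrent site. I used IE and querySelector.

My code does parse everything. Once it has finished, though, it throws an error instead of quitting the browser. If I dismiss the error box, I can see the results.

Here is the full code: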

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.querySelectorAll(".browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub

I've uploaded two images to show the errors.

First error

Second error

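Both errors appear at the same time.

I'm using Internet Explorer 11.

If I try it as shown below instead, with getElementsByClassName in place of querySelectorAll in the For Each loop, it returns the results without any problem.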

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.getElementsByClassName("browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub

References added to the library:

  1. Microsoft Internet Controls
  2. Microsoft HTML Object Library

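Is there any reference I can add to the library to get rid of the errors?

Best answer

OK, so there is something very unfriendly about that web page. It kept crashing for me. So I resorted to running a JavaScript program inside the script engine (Script Control), and that works.

I hope you can follow it. The logic is in the JavaScript added to the ScriptEngine. I get two node lists, one of films and one of years; then I step through both arrays in sync and add each film and year as a key-value pair to a Microsoft Scripting Dictionary.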

Option Explicit

'*Tools->References
'* Microsoft Scripting Runtime
'* Microsoft Scripting Control
'* Microsoft Internet Controls
'* Microsoft HTML Object Library

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument

    With IE
        .Visible = True
        .navigate "https://yts.am/browse-movies"
        '* Yield to Excel while the page loads
        Do While .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set html = .document
    End With

    '* The JScript function fills the dictionary with title -> year pairs
    Dim dicFilms As Scripting.Dictionary
    Set dicFilms = New Scripting.Dictionary

    Call GetScriptEngine.Run("getMovies", html, dicFilms)

    Dim vFilms As Variant
    vFilms = dicFilms.Keys

    Dim vYears As Variant
    vYears = dicFilms.Items

    '* Write titles to column A and years to column B
    Dim lRowLoop As Long
    For lRowLoop = 0 To dicFilms.Count - 1
        Cells(lRowLoop + 1, 1) = vFilms(lRowLoop)
        Cells(lRowLoop + 1, 2) = vYears(lRowLoop)
    Next lRowLoop

    IE.Quit
End Sub

Private Function GetScriptEngine() As ScriptControl
    '* See the code from this SO Q&A:
    '  https://stackoverflow.com/questions/37711073/in-excel-vba-on-windows-how-to-get-stringified-json-respresentation-instead-of
    Static soScriptEngine As ScriptControl
    If soScriptEngine Is Nothing Then
        Set soScriptEngine = New ScriptControl
        soScriptEngine.Language = "JScript"

        '* Pair each title with its year, but only when both lists have the same length
        soScriptEngine.AddCode "function getMovies(htmlDocument, microsoftDict) { " & _
            "var titles = htmlDocument.querySelectorAll('a.browse-movie-title'), i;" & _
            "var years = htmlDocument.querySelectorAll('div.browse-movie-year');" & _
            "if (titles.length === years.length) {" & _
            "for (i = 0; i < years.length; ++i) {" & _
            " var film = titles[i].innerText;" & _
            " var year = years[i].innerText;" & _
            " microsoftDict.Add(film, year);" & _
            "}}}"
    End If
    Set GetScriptEngine = soScriptEngine
End Function
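
The JScript passed to AddCode is hard to read as a concatenated string. As a rough sketch, the same pairing logic can be written out and tried outside Excel. Everything other than getMovies and the two selectors is an assumption made for illustration: makeFakeDocument and makeFakeDict are hypothetical stand-ins for the IE document and the Scripting.Dictionary, and the returned count and the Exists guard against duplicate titles are additions that the answer's code does not have.

```javascript
// Written in ES3 style (var, no arrow functions) so the body of getMovies
// would also be accepted by the JScript engine.
function getMovies(htmlDocument, microsoftDict) {
    var titles = htmlDocument.querySelectorAll('a.browse-movie-title');
    var years = htmlDocument.querySelectorAll('div.browse-movie-year');
    var added = 0, i, film;

    // Only pair the two lists when they line up one-to-one.
    if (titles.length !== years.length) {
        return 0;
    }
    for (i = 0; i < titles.length; ++i) {
        film = titles[i].innerText;
        // Assumption: skip repeated titles, since Dictionary.Add
        // raises an error when the key already exists.
        if (!microsoftDict.Exists(film)) {
            microsoftDict.Add(film, years[i].innerText);
            added += 1;
        }
    }
    return added;
}

// Hypothetical stand-in for the IE document: answers only the two selectors used above.
function makeFakeDocument(titles, years) {
    function wrap(list) {
        var out = [], k;
        for (k = 0; k < list.length; ++k) {
            out.push({ innerText: list[k] });
        }
        return out;
    }
    return {
        querySelectorAll: function (selector) {
            if (selector === 'a.browse-movie-title') { return wrap(titles); }
            if (selector === 'div.browse-movie-year') { return wrap(years); }
            return [];
        }
    };
}

// Hypothetical stand-in for Scripting.Dictionary (Add, Exists, Keys, Item only).
function makeFakeDict() {
    var store = {}, keys = [];
    return {
        Exists: function (key) {
            return Object.prototype.hasOwnProperty.call(store, key);
        },
        Add: function (key, value) {
            if (this.Exists(key)) { throw new Error('Key already exists: ' + key); }
            store[key] = value;
            keys.push(key);
        },
        Keys: function () { return keys.slice(); },
        Item: function (key) { return store[key]; }
    };
}

// Usage with made-up data (console.log is for Node, not for the Script Control).
var demoDict = makeFakeDict();
var demoCount = getMovies(makeFakeDocument(['Film A', 'Film B'], ['2017', '2018']), demoDict);
console.log(demoCount, demoDict.Keys());
```

To use a version like this from VBA, the body of getMovies would still have to be passed to AddCode as a string, as in the function above.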

Regarding "excel - Scraper throws an error instead of quitting the browser when everything is done", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47993064/