javascript - PUPPETEER - Unable to extract elements on certain websites using page.evaluate(() => document.querySelectorAll())

Reposted by: 太空宇宙 · Updated: 2023-11-04 02:54:29


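I'm trying to select a NodeList of all the links on a website and console.log() it in the terminal. However, this fails on some websites: google.com, facebook.com, instagram.com.

I know the elements are there, because I can log them with document.querySelectorAll('a') in the actual Chromium console, which loads separately. But when I try to extract and log the links in the Node terminal, using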

const links = await page.evaluate(() => document.querySelectorAll('a'))
console.log(links)

I get undefined.

However, this is not the case for most websites, e.g. yahoo.com and linkedin.com, where my code works. Here it is:

const puppeteer = require('puppeteer');

const URL = 'https://instagram.com/';
const scrape = async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.setViewport({
        width: 1240,
        height: 680
    });
    await page.goto(URL, { waitUntil: 'domcontentloaded' });
    // Fixed 6-second pause to let the page render
    await page.waitFor(6000);
    // This is the call that logs undefined on the sites listed above
    const links = await page.evaluate(() => document.querySelectorAll('a'));
    console.log(links);
    await page.screenshot({
        path: 'ig.png'
    });
    await browser.close();
};

scrape();

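I tried adding a bypassBotDetectionSystem() function, as suggested in this article, but it didn't work. I don't think that is the problem, because as I said, I can easily navigate the content in Chromium.

Thanks for your help!

Best answer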

You are trying to return DOM elements from page.evaluate, which is not possible: if the function passed to page.evaluate returns a non-serializable value, page.evaluate resolves to undefined, as it does in your case.
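To illustrate the point about serializable values, here is a minimal sketch that keeps page.evaluate but converts the NodeList into an array of plain strings inside the page before returning it. The names collectHrefs and logLinks are hypothetical, and page is assumed to be a Puppeteer Page that has already navigated to the target site.

```javascript
// Runs inside the page: the NodeList never leaves the browser.
// Only an array of strings (which is serializable) is returned to Node.
const collectHrefs = () =>
  Array.from(document.querySelectorAll('a'), (a) => a.href);

// Hypothetical helper: `page` is assumed to be an already-navigated Puppeteer Page.
async function logLinks(page) {
  const hrefs = await page.evaluate(collectHrefs);
  console.log(hrefs);
  return hrefs;
}
```

Calling `await logLinks(page)` in place of the original page.evaluate line should print the href strings rather than undefined, since strings survive the trip from the browser to Node.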

If you want an array of ElementHandles, use the page.$$ method instead.

Example:

const links = await page.$$('a'); // page.$$ returns Promise<Array<ElementHandle>>; links is the resolved array
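An ElementHandle is a reference to an element that still lives in the page, so logging it in Node does not show the link itself. As a rough sketch, one way to read a property from each handle is to pass the handle back into page.evaluate. The function name hrefsFromHandles is an assumption made for illustration.

```javascript
// Hypothetical helper: reads the href of each ElementHandle returned by page.$$.
// Each handle is passed as an argument to page.evaluate, where it is
// received as the underlying DOM element.
async function hrefsFromHandles(page, handles) {
  const hrefs = [];
  for (const handle of handles) {
    hrefs.push(await page.evaluate((el) => el.href, handle));
  }
  return hrefs;
}
```

With the links array from the example above, this could be used as `console.log(await hrefsFromHandles(page, links));`.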

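But if you only want all the values of an attribute (e.g. href), you can use the page.$$eval method, which runs Array.from(document.querySelectorAll(selector)) within the page and passes the result as the first argument to pageFunction.

Example: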

const hrefs = await page.$$eval('a', links => links.map(link => link.href));
console.log(hrefs);
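Putting this together with the script from the question, a complete version might look like the sketch below. It is not the asker's final code: the scrape signature, the launcher parameter (which defaults to the puppeteer module and exists only so the function can be tried with a stand-in), and the use of page.waitForSelector in place of the fixed 6-second pause are all assumptions.

```javascript
// Sketch only: `scrape` and its `launcher` parameter are illustrative names.
// `launcher` defaults to the puppeteer module, loaded when scrape is first called.
async function scrape(targetUrl, launcher = require('puppeteer')) {
  const browser = await launcher.launch({ headless: false });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1240, height: 680 });
    await page.goto(targetUrl, { waitUntil: 'domcontentloaded' });
    // Wait until at least one link exists instead of sleeping for a fixed time.
    await page.waitForSelector('a');
    // The callback runs in the page; only the array of strings is sent back to Node.
    return await page.$$eval('a', (links) => links.map((link) => link.href));
  } finally {
    // Close the browser even if one of the steps above throws.
    await browser.close();
  }
}
```

A call such as `scrape('https://instagram.com/').then(console.log);` would then log the collected href values.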

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57504201/
