dom - How do I dump more than <body> in Chrome / headless Chrome?

Reposted by: 行者123. Last updated: 2023-12-04 12:17:07

Chrome's documentation states:

The --dump-dom flag prints document.body.innerHTML to stdout:

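As the title says, how can I dump more of the DOM (preferably all of it) with headless Chromium? I can save the whole DOM manually through the developer tools, but I want a programmatic solution.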

Best answer

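Update 2019-04-23: Google is very active on the headless front, and there have been many updates since.

The answer below applies to v62. The current version is v73, and it is updated continuously.
https://www.chromestatus.com/features/schedule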

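I strongly recommend looking at puppeteer for any future work with headless Chrome. It is maintained by Google, and installing the npm package also installs the Chrome version it needs. You just use the puppeteer API from the docs and do not have to worry about Chrome versions or about setting up the connection between headless Chrome and the DevTools API, which does 99% of the magic.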

Repo: https://github.com/GoogleChrome/puppeteer

Docs: https://pptr.dev/
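
As a rough illustration of the puppeteer route recommended above, the sketch below prints the full serialized HTML of a page (including <head> and the doctype) using page.content(). It assumes puppeteer has been installed with npm install puppeteer. The helper name dumpFullHtml, the file name dump-html.js, and the networkidle0 wait condition are illustrative choices, not part of the original answer.

```javascript
// Sketch: dump the full HTML of a page with puppeteer.
// Assumption: `npm install puppeteer` has been run.
// dumpFullHtml is a hypothetical helper name.
async function dumpFullHtml(url) {
  // Required lazily so this file still loads where puppeteer is absent.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // Wait until the network is idle so JS-rendered content is present.
    await page.goto(url, {waitUntil: 'networkidle0'});
    // page.content() returns the whole document, doctype included.
    return await page.content();
  } finally {
    // Always shut the browser down, even if navigation failed.
    await browser.close();
  }
}

// Usage: node dump-html.js https://www.chromestatus.com/
const targetUrl = process.argv[2];
if (targetUrl) {
  dumpFullHtml(targetUrl)
    .then((html) => process.stdout.write(html))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Because the browser is closed in a finally block, a failed navigation should not leave a stray Chrome process behind, which is the kind of cleanup problem the older answer further down ran into.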

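Update 2017-10-29: Chrome already has a --dump-html flag, which returns the full HTML, not only the body.

v62 has it, and it is already on the stable channel.

The issue that fixed this: https://bugs.chromium.org/p/chromium/issues/detail?id=752747

Current Chrome status (version per channel): https://www.chromestatus.com/features/schedule

Old answer, kept for legacy purposes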

You can do it with the Google Chrome remote interface. I tried it and wasted a couple of hours trying to launch Chrome and get the full HTML, including the title, and I would say it is just not ready yet.

It works sometimes, but when I ran it in a production environment I got errors from time to time: all kinds of random errors, such as "connection reset" and "no chrome found to kill". These errors came up only occasionally, which makes them hard to debug.

I personally use --dump-dom to get the HTML when I need the body, and when I need the title I just use curl for now. Of course, Chrome can give you the title of an SPA, which curl alone cannot do if the title is set from JS. I will switch to Chrome once there is a stable solution.
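
The --dump-dom approach described above can be scripted from Node.js. This is a minimal sketch, not the answer author's setup: it assumes the Chrome binary path is supplied in a CHROME_PATH environment variable, falling back to a google-chrome command on the PATH (both are assumptions about the system), and buildDumpArgs and dumpDom are hypothetical helper names.

```javascript
const {execFile} = require('child_process');

// Build the command-line arguments for a headless DOM dump.
function buildDumpArgs(url) {
  return ['--headless', '--disable-gpu', '--dump-dom', url];
}

// Run Chrome once and resolve with whatever it prints to stdout.
function dumpDom(chromePath, url) {
  return new Promise((resolve, reject) => {
    execFile(
      chromePath,
      buildDumpArgs(url),
      {maxBuffer: 64 * 1024 * 1024}, // allow large pages
      (err, stdout) => (err ? reject(err) : resolve(stdout))
    );
  });
}

// Usage: CHROME_PATH=/path/to/chrome node dump-dom.js https://www.chromestatus.com/
const pageUrl = process.argv[2];
if (pageUrl) {
  // Assumed default binary name; adjust for the local installation.
  const chromePath = process.env.CHROME_PATH || 'google-chrome';
  dumpDom(chromePath, pageUrl)
    .then((html) => process.stdout.write(html))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Using execFile rather than a shell string keeps the URL from being interpreted by the shell, and each call starts and exits its own Chrome process, so there is no long-lived connection to manage.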

I would love to have a --dump-html flag in Chrome that simply returns all the HTML. If a Google engineer is reading this, please add such a flag to Chrome.

I've created an issue on the Chrome issue tracker. Please click the "star" to favorite it so that it gets noticed by Google's developers:

https://bugs.chromium.org/p/chromium/issues/detail?id=752747

Here is a long list of all kinds of Chrome flags (I am not sure it is complete): https://peter.sh/experiments/chromium-command-line-switches/ It contains nothing for dumping the title tag.

This code is from Google's blog post, you can try your luck with this:
const CDP = require('chrome-remote-interface');

...

(async function() {
  // launchChrome() is not shown here; it belongs to the part elided above.
  const chrome = await launchChrome();
  const protocol = await CDP({port: chrome.port});

  // Extract the DevTools protocol domains we need and enable them.
  // See API docs: https://chromedevtools.github.io/devtools-protocol/
  const {Page, Runtime} = protocol;
  await Promise.all([Page.enable(), Runtime.enable()]);

  Page.navigate({url: 'https://www.chromestatus.com/'});

  // Wait for window.onload before doing stuff.
  Page.loadEventFired(async () => {
    const js = "document.querySelector('title').textContent";
    // Evaluate the JS expression in the page.
    const result = await Runtime.evaluate({expression: js});

    console.log('Title of page: ' + result.result.value);

    protocol.close();
    chrome.kill(); // Kill Chrome.
  });
})();
Source: https://developers.google.com/web/updates/2017/04/headless-chrome
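
Building on the snippet above, the same Runtime.evaluate call can return the whole document instead of just the title by evaluating document.documentElement.outerHTML. This is a hedged sketch rather than code from the blog post: it assumes chrome-remote-interface is installed and that Chrome is already running with --remote-debugging-port=9222 (an assumed port). fetchOuterHtml and FULL_HTML_EXPRESSION are hypothetical names. Note that outerHTML does not include the doctype.

```javascript
// Expression evaluated inside the page: the <html> element and everything in it.
const FULL_HTML_EXPRESSION = 'document.documentElement.outerHTML';

// Assumption: Chrome is already listening on the given remote debugging port.
async function fetchOuterHtml(url, port = 9222) {
  // Required lazily so this file still loads where the package is absent.
  const CDP = require('chrome-remote-interface');
  const client = await CDP({port});
  try {
    const {Page, Runtime} = client;
    await Promise.all([Page.enable(), Runtime.enable()]);

    // Register for the load event before navigating so it cannot be missed.
    const loaded = Page.loadEventFired();
    await Page.navigate({url});
    await loaded;

    // returnByValue asks for the string itself rather than a remote object handle.
    const {result} = await Runtime.evaluate({
      expression: FULL_HTML_EXPRESSION,
      returnByValue: true,
    });
    return result.value;
  } finally {
    await client.close();
  }
}

// Usage: node fetch-outer-html.js https://www.chromestatus.com/
const startUrl = process.argv[2];
if (startUrl) {
  fetchOuterHtml(startUrl)
    .then((html) => process.stdout.write(html))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```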

Regarding "dom - How do I dump more than <body> in Chrome / headless Chrome?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44851729/
