
javascript - Puppeteer doesn't actually download the ZIP despite clicking the link

Reposted. Author: 行者123. Updated: 2023-12-05 05:31:08

I've been making incremental progress, but at this point I'm stuck.

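The page I'm trying to download from is https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp and the link is "Download Raw Data". I'm using Puppeteer because I couldn't find a supported API for this data (if there is one, I'm happy to try it).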

My script runs to completion but doesn't seem to actually download any file. I tried installing puppeteer-extra and setting the download path:

const puppeteer = require("puppeteer-extra");
const { executablePath } = require('puppeteer')

...

var dir = "/home/ubuntu/AirlineStatsFetcher/downloads";
console.log('dir to set for downloads', dir);
puppeteer.use(require('puppeteer-extra-plugin-user-preferences')({
  userPrefs: {
    download: {
      prompt_for_download: false,
      open_pdf_in_system_reader: true,
      default_directory: dir,
    },
    plugins: {
      always_open_pdf_externally: true
    },
  }
}));

const browser = await puppeteer.launch({
  headless: true, slowMo: 100, executablePath: executablePath()
});

...
// Doesn't seem to work
await page.waitForSelector('table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)');
console.log('Clicking on link to download CSV');
await page.click('table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)');
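Even when downloads are allowed, a script that calls `browser.close()` right after `page.click(...)` can end before Chrome has finished writing the file. A minimal sketch of a polling helper, assuming Chrome's usual behavior of writing in-progress downloads as `*.crdownload` files; the name `waitForDownload` and its parameters are hypothetical, not part of Puppeteer:

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: poll `dir` until a file ending in `ext` exists and no
// Chrome ".crdownload" (in-progress) file remains, or reject after `timeoutMs`.
function waitForDownload(dir, ext = ".zip", timeoutMs = 60000, intervalMs = 500) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      const files = fs.existsSync(dir) ? fs.readdirSync(dir) : [];
      const inProgress = files.some((f) => f.endsWith(".crdownload"));
      const done = files.find((f) => f.endsWith(ext));
      if (done && !inProgress) {
        resolve(path.join(dir, done)); // full path of the finished file
      } else if (Date.now() > deadline) {
        reject(new Error(`No ${ext} file appeared in ${dir} within ${timeoutMs} ms`));
      } else {
        setTimeout(check, intervalMs); // try again shortly
      }
    };
    check();
  });
}
```

It would be called between `page.click(...)` and `browser.close()`, e.g. `await waitForDownload(dir)`. It works best with an empty download directory, since any `.zip` already there resolves immediately.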

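After a while I thought: why not build the full URL and do a plain GET request instead? But then I ran into a different problem (UNABLE_TO_VERIFY_LEAF_SIGNATURE). Before going down that path (it feels a bit hacky), I wanted to ask for advice here.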
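A sketch of the "build the full URL" idea that resolves the link's `href` against the page URL with the WHATWG `URL` class instead of concatenating strings. `buildDownloadUrl` is a hypothetical helper, and any example hrefs are made up, since the real link's `href` is not shown here:

```javascript
// Hypothetical helper: resolve an href read from the page against the page's
// own URL. Handles "../", absolute hrefs, and percent-encodes spaces in paths.
function buildDownloadUrl(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Example with a made-up relative href:
// buildDownloadUrl("files/some file.zip",
//   "https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp")
```

`new URL()` leaves existing `%XX` escapes alone, whereas `encodeURI` would turn `%20` into `%2520`. Inside `page.evaluate`, reading `anchor.href` (the property rather than `getAttribute("href")`) also returns an already-resolved absolute URL.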

Am I missing something in the download configuration?
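One commonly cited alternative to the user-preferences plugin is to tell Chrome where to save downloads with the DevTools Protocol command `Page.setDownloadBehavior`, sent over a CDP session. A minimal sketch, assuming Puppeteer 19's `page.target().createCDPSession()`; `enableDownloads` is a hypothetical helper name, and the CDP documentation marks `Page.setDownloadBehavior` as deprecated in favor of `Browser.setDownloadBehavior`, so behavior may vary across Chrome versions:

```javascript
const fs = require("fs");

// Hypothetical helper: ask Chrome, via a CDP session on this Puppeteer `page`,
// to save downloads into `downloadPath` instead of prompting or discarding them.
async function enableDownloads(page, downloadPath) {
  fs.mkdirSync(downloadPath, { recursive: true }); // make sure the folder exists
  const client = await page.target().createCDPSession();
  await client.send("Page.setDownloadBehavior", {
    behavior: "allow",
    downloadPath,
  });
  return client;
}
```

It would be called after `browser.newPage()` and before `page.click(...)`, e.g. `await enableDownloads(page, dir)` with the question's `dir`.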

Best answer

Downloading files with Puppeteer seems to be a moving target and is not well supported today. For now (Puppeteer 19.2.2), I would go with https.get instead.
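The `https.get` route can be wrapped in a promise so the caller can `await` the finished file and see HTTP or network errors instead of failing silently. A minimal sketch; `downloadToFile` is a hypothetical helper, it does not follow redirects, and it picks `http` or `https` from the URL scheme:

```javascript
const fs = require("fs");
const http = require("http");
const https = require("https");

// Hypothetical helper: download `url` into `dest` and resolve with `dest` once
// the file is written and closed. `options` is passed through to get()
// (e.g. TLS settings such as `ca`).
function downloadToFile(url, dest, options = {}) {
  const client = new URL(url).protocol === "http:" ? http : https;
  return new Promise((resolve, reject) => {
    const req = client.get(url, options, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // drain the body so the socket is released
        reject(new Error(`HTTP ${res.statusCode} for ${url}`));
        return;
      }
      const file = fs.createWriteStream(dest);
      res.pipe(file);
      file.on("close", () => resolve(dest));
      file.on("error", reject);
    });
    req.on("error", reject); // DNS, connection and TLS errors
  });
}
```

With the answer's URL it could be called as `await downloadToFile(encodedUrl, "download.zip", { rejectUnauthorized: false })`, which carries the same production caveat as the answer's own TLS setting.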

"use strict";

const fs = require("fs");
const https = require("https");
// Not sure why puppeteer-extra is used... maybe https://stackoverflow.com/a/73869616/1258111 solves the need in future.
const puppeteer = require("puppeteer-extra");
const { executablePath } = require("puppeteer");

const LINK_SELECTOR =
  "table > tbody > tr > .finePrint:nth-child(3) > a:nth-child(2)";

(async () => {
  puppeteer.use(
    require("puppeteer-extra-plugin-user-preferences")({
      userPrefs: {
        download: {
          prompt_for_download: false,
          open_pdf_in_system_reader: false,
        },
        plugins: {
          always_open_pdf_externally: false,
        },
      },
    })
  );

  const browser = await puppeteer.launch({
    headless: true,
    slowMo: 100,
    executablePath: executablePath(),
  });

  // The browser is only used to read the link's href; close it even on failure.
  let relativeZipUrl;
  try {
    const page = await browser.newPage();
    await page.goto("https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp", {
      waitUntil: "networkidle2",
    });

    const handle = await page.$(LINK_SELECTOR);
    if (!handle) {
      throw new Error(`Download link not found: ${LINK_SELECTOR}`);
    }

    relativeZipUrl = await page.evaluate(
      (anchor) => anchor.getAttribute("href"),
      handle
    );
  } finally {
    await browser.close();
  }

  const url = "https://www.transtats.bts.gov/OT_Delay/".concat(relativeZipUrl);
  const encodedUrl = encodeURI(url);

  // Don't use in production: rejectUnauthorized: false disables TLS
  // certificate verification (works around UNABLE_TO_VERIFY_LEAF_SIGNATURE).
  https
    .get(encodedUrl, { rejectUnauthorized: false }, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body
        console.error(`Download failed: HTTP ${res.statusCode}`);
        return;
      }
      const path = `${__dirname}/download.zip`;
      const file = fs.createWriteStream(path);
      res.pipe(file);
      file.on("finish", () => {
        file.close();
        console.log("Download Completed");
      });
    })
    .on("error", (err) => console.error("Download error:", err));
})().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
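Disabling `rejectUnauthorized` turns off certificate checking entirely. `UNABLE_TO_VERIFY_LEAF_SIGNATURE` often means the server does not send its intermediate certificate, in which case trusting that one extra certificate is a narrower fix. A sketch, assuming you have saved the missing certificate as a PEM file (the path and the helper name `tlsOptionsWithExtraCa` are hypothetical); because Node's `ca` option replaces the default trust store, the bundled `tls.rootCertificates` are included explicitly:

```javascript
const fs = require("fs");
const tls = require("tls");

// Hypothetical helper: build https.get options that trust Node's bundled root
// CAs plus one extra PEM certificate (e.g. a missing intermediate). Passing
// `ca` replaces the default list, so the bundled roots are added back.
function tlsOptionsWithExtraCa(pemPath) {
  const extra = fs.readFileSync(pemPath, "utf8");
  return { ca: [...tls.rootCertificates, extra] };
}
```

The result goes in as the options argument, e.g. `https.get(encodedUrl, tlsOptionsWithExtraCa("/path/to/intermediate.pem"), callback)`. Setting the environment variable `NODE_EXTRA_CA_CERTS` to the same PEM file before starting Node has a similar effect for every request in the process.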

A similar question about "javascript - Puppeteer doesn't actually download the ZIP despite clicking the link" can be found on Stack Overflow: https://stackoverflow.com/questions/74424735/
