
javascript - Pagination problem when scraping with NodeJS

Reposted. Author: 行者123. Updated: 2023-11-30 19:02:38

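I am writing a small script that scrapes some information from a public directory. Saving the results to CSV already works, but I am having trouble with automatic pagination.

My source is: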

const rp = require('request-promise');
const request = ('request');
const otcsv = require('objects-to-csv');
const cheerio = require('cheerio');

// URL To scrape
const baseURL = 'xx';
const searchURL = 'xxx';

// scrape info
const getCompanies = async () => {
    // Pagination test

    for(let index = 0; index <= 2; index = index + 1) {
        const html = await request.get("xxx" + index);
        const $ = await cheerio.load(html);
        console.log("Loading Pages....");
        // console.log("At page number" + index);
        // end pagination test
        const htmls = await rp(baseURL + searchURL);
        const businessMap = cheerio('a.business-name', htmls).map(async (i, e) => {
            const link = baseURL + e.attribs.href;
            const innerHtml = await rp(link);
            const emailAddress = cheerio('a.email-business', innerHtml).prop('href');
            const name = e.children[0].data || cheerio('h1', innerHtml).text();
            const phone = cheerio('p.phone', innerHtml).text();

            return {
                emailAddress: emailAddress ? emailAddress.replace('mailto:', '') : '',
                // link,
                name,
                phone,
            }

        }).get();
        return Promise.all(businessMap);
    }
};

// save to CSV
getCompanies()
    .then(result => {
        const transformed = new otcsv(result);
        return transformed.toDisk('./output.csv');
    })
    .then(() => console.log('SUCCESSFULLY COMPLETED THE WEB SCRAPING SAMPLE'));

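The error I get is that request.get is not a function.

Edit

The second part of this question is here: Nodejs Scraper isn't moving to next page(s)

Best answer

request.get should be rp.get, because the request module does not return a Promise.

You would get the error in any case, because you are not requiring request; you are only assigning the string 'request' to the request variable: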

const request = ('request');

Change it to:

const request = require('request');
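
The difference between the two lines can be checked directly in Node. The following is a minimal sketch: the names notAModule, caught and viaRequire are made up for illustration, and the built-in path module stands in for request only so that the snippet runs without installing anything.

```javascript
// Parentheses around a string literal only group the expression.
// Nothing is loaded, so the variable holds the plain string 'request'.
const notAModule = ('request');
console.log(typeof notAModule);     // string
console.log(typeof notAModule.get); // undefined

// Calling the missing property throws a TypeError, which is the
// "is not a function" error reported in the question.
let caught = null;
try {
  notAModule.get('xxx' + 0);
} catch (err) {
  caught = err;
}
console.log(caught instanceof TypeError); // true

// require() returns the module's exports instead of a string.
// 'path' is a built-in stand-in here (assumption: 'request' may not be installed).
const viaRequire = require('path');
console.log(typeof viaRequire.join); // function
```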

Since you are using Promises, I suggest requiring only request-promise:

const request = require('request-promise');
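
With a single promise-returning client, the page loop could look roughly like the sketch below. It is not the asker's final code. It also assumes something the answer does not state: in the question's script the return sits inside the for loop, so the function returns after the first iteration, and the sketch moves the return after the loop. The names scrapeAllPages, fetchPage, parsePage, fakePages, fakeFetch and splitNames are hypothetical; fetchPage is injected so the loop can be tried without network access, and in the real script it would be something like url => rp(url).

```javascript
// Sketch: gather rows from pages 0..lastPage using one promise-returning fetcher.
// fetchPage and parsePage are hypothetical parameters, not part of any library.
async function scrapeAllPages(fetchPage, pageUrlPrefix, lastPage, parsePage) {
  const results = [];
  for (let index = 0; index <= lastPage; index += 1) {
    // Await each page in turn and collect its parsed rows.
    const html = await fetchPage(pageUrlPrefix + index);
    results.push(...parsePage(html));
  }
  // Return only after the loop, so every page contributes rows.
  return results;
}

// Usage with a stubbed fetcher (made-up data, no real site involved).
const fakePages = {
  'xxx0': 'Acme;Bolt',
  'xxx1': 'Cogs',
  'xxx2': 'Dyno;Echo',
};
const fakeFetch = async url => fakePages[url];
const splitNames = html => html.split(';').map(name => ({ name }));

const demo = scrapeAllPages(fakeFetch, 'xxx', 2, splitNames);
demo.then(rows => console.log(rows.map(r => r.name).join(',')));
```

The resolved array can be passed to objects-to-csv in the same way the question already does in its .then(result => ...) step.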

Regarding "javascript - Pagination problem when scraping with NodeJS", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59347996/
