node.js - Using request(), the returned page does not yet contain the required data; an incomplete page is returned instead. How do I 'wait'?


Reposted by: 行者123. Last updated: 2023-12-04 12:00:10


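I am trying to extract the year, make, model, colour and plate number from carjam.co.nz. An example of the URL I am scraping is https://www.carjam.co.nz/car/?plate=JKY242.
If the plate has been requested recently, the response is an HTML document containing the vehicle details.

Result for a plate whose details were recently requested.
If the plate details have not been requested recently (which is the case for most plates), the response is an HTML document that says "Trying to get some vehicle data". My guess is that this page is shown while the information is fetched from a database, and that the page is then reloaded to show the vehicle details. This appears to be rendered server-side; I cannot see any AJAX requests.
The URL is the same for both results.

Result for a vehicle that was not recently requested.
How do I "wait" for the correct information?
I am using request (deprecated, I know, but it is what I am most used to) on Node.js with an Express server.
My (heavily reduced) code: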

// Heavily reduced: express (app), request, cheerio and toTitleCase() are set up elsewhere.
app.get("/:numberPlate", (req, res) => {
  request("https://www.carjam.co.nz/car/?plate=" + req.params.numberPlate, function(error, response, body) {
    // Parse the returned HTML and read the value next to each data-key cell.
    const $ = cheerio.load(body);
    res.status(200).send(JSON.stringify({
      year: $("[data-key=year_of_manufacture]").next().html(),
      make: toTitleCase($("[data-key=make]").next().html()),
      model: toTitleCase($("[data-key=model]").next().html()),
      colour: toTitleCase($("[data-key=main_colour]").next().html()),
    }));
  });
});

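I have considered:

Making a request and discarding it, sleeping for 2 - 3 seconds, then making a second request. The advantage of this approach is that every request works. The disadvantage is that every request takes 2 - 3 seconds (too slow).

Making a request and checking whether the body contains "Trying to get some vehicle data". If it does, sleeping for a few seconds, making another request and acting on the result of the second request (but how?).

I am sure this is a common problem with a simple answer, but I do not have enough experience to work it out myself, or to know exactly what to Google!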

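Testing: New Zealand number plates use the format "ABC123": three letters followed by three digits. They are issued in alphabetical order, and at the moment nothing beyond NLU999 has been issued (excluding personalised plates, plates issued out of sequence, etc.).
To reproduce the "Trying to get some vehicle data" page you need a new plate each time; most plates earlier in the sequence than NLU999 should work.
This snippet should generate a valid plate.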

console.log(Math.random().toString(36).replace(/[^a-n]+/g, '').substr(0, 1).toUpperCase() + Math.random().toString(36).replace(/[^a-z]+/g, '').substr(0, 2).toUpperCase() + Math.floor(Math.random() * 10).toString() + Math.floor(Math.random() * 10).toString() + Math.floor(Math.random() * 10).toString());
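
The one-liner above picks the first letter from A to N and the other two from A to Z, so it can occasionally produce plates past NLU999 (for example NZZ123). A hedged alternative is to draw a random position in the sequence and convert it back to a plate. This is only a sketch: it assumes plates run alphabetically from AAA000 to NLU999 with no gaps, which is a simplification, and `plateToIndex`, `indexToPlate` and `randomPlate` are hypothetical helper names.

```javascript
const LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Map a plate such as "ABC123" to its position in the alphabetical sequence.
function plateToIndex(plate) {
  let prefix = 0;
  for (const ch of plate.slice(0, 3)) {
    prefix = prefix * 26 + LETTERS.indexOf(ch);
  }
  return prefix * 1000 + Number(plate.slice(3));
}

// Inverse of plateToIndex: turn a sequence position back into a plate.
function indexToPlate(index) {
  let prefix = Math.floor(index / 1000);
  const digits = String(index % 1000).padStart(3, "0");
  let letters = "";
  for (let i = 0; i < 3; i++) {
    letters = LETTERS[prefix % 26] + letters;
    prefix = Math.floor(prefix / 26);
  }
  return letters + digits;
}

// Pick a uniformly random plate between AAA000 and maxPlate (inclusive).
// The default upper bound NLU999 is the one quoted in the question.
function randomPlate(maxPlate = "NLU999") {
  const max = plateToIndex(maxPlate);
  return indexToPlate(Math.floor(Math.random() * (max + 1)));
}

console.log(randomPlate());
```

Whether a generated plate actually exists on the site is not guaranteed; the sketch only keeps the result inside the stated range.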

Update, 5 May 2021
After thinking about it further, this pseudocode may be what I am after, but I am not sure how to actually implement it.

request(url) {
  if (url body contains "Trying to get some vehicle data") {
    wait(2 seconds)
    request(url again) {
      return second_result
    }
  } else {
    return first_result
  }
}
then
  process(first_result or second_result)

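My difficulty: I am used to the request().then() format, acting directly on the result of the request.
Assuming this approach is correct, how would I do the following?

Send a request, then

evaluate the response, then

either pass that response on, or send another request and pass that reply on, to

the code that processes the response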

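Best answer

According to this javascript file, the website reloads the page every X seconds while the data has not been found, with the maximum number of retries set to 10. The refresh interval, in seconds, is read from the Refresh HTTP header.
You can reproduce this flow so that you get exactly the same behaviour as the front-end code.
In the example below, I use axios.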

const axios = require("axios");
const cheerio = require("cheerio");

const rootUrl = "https://www.carjam.co.nz/car/";
const plate = "NLU975";
const maxRetry = 10;
const waitingString = "Waiting for a few more things";

async function getResult() {
  return axios.get(rootUrl, {
    params: {
      plate: plate,
    },
  });
}

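// Poll until the waiting page is replaced by the real data, or maxRetry is reached.
async function processRetry(result) {
  // The server announces the reload interval (in seconds) in the Refresh header.
  const refreshSeconds = parseInt(result.headers["refresh"], 10);
  let retryCount = 0;
  while (retryCount < maxRetry) {
    console.log(
      `retry: ${retryCount} time, waiting for ${refreshSeconds} second(s)`
    );
    retryCount++;
    await timeout(refreshSeconds * 1000);
    result = await getResult();
    if (!result.data.includes(waitingString)) {
      break;
    }
  }
  // This may still be the waiting page if every retry was used up.
  return result;
}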

(async () => {
  var result = await getResult();
  if (result.data.includes(waitingString)) {
    result = await processRetry(result);
  }
  const $ = cheerio.load(result.data);
  console.log({
    year: $("[data-key=year_of_manufacture]").next().html(),
    make: $("[data-key=make]").next().html(),
    model: $("[data-key=model]").next().html(),
    colour: $("[data-key=main_colour]").next().html(),
  });
})();

function timeout(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

Repl link: https://replit.com/@bertrandmartel/ScrapeCarJam
Sample output:

retry: 0 time, waiting for 1 second(s)
retry: 1 time, waiting for 1 second(s)
retry: 2 time, waiting for 1 second(s)
{ year: 'XXXX', make: 'XXXXXX', model: 'XX', colour: 'XXXX' }
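
One fragile spot in the script is `parseInt(result.headers["refresh"])`: if the Refresh header is ever missing, the result is NaN and the delay passed to `setTimeout` collapses to almost nothing, so the retries fire back to back. A minimal sketch of a more defensive parser follows. `parseRefreshSeconds` is a hypothetical helper, and the 2 second fallback is an assumption borrowed from the delay suggested in the question, not a value the site documents.

```javascript
// Hypothetical helper: read the delay (in seconds) from a Refresh header value.
// Falls back to fallbackSeconds when the header is missing, malformed or not positive.
function parseRefreshSeconds(headerValue, fallbackSeconds = 2) {
  // parseInt stops at the first non-digit, so "5; url=..." yields 5.
  const seconds = Number.parseInt(headerValue, 10);
  return Number.isFinite(seconds) && seconds > 0 ? seconds : fallbackSeconds;
}

console.log(parseRefreshSeconds("1"));                          // 1
console.log(parseRefreshSeconds("5; url=/car/?plate=ABC123"));  // 5
console.log(parseRefreshSeconds(undefined));                    // 2
```

In the script this would replace the bare `parseInt` call, for example `const refreshSeconds = parseRefreshSeconds(result.headers["refresh"]);`.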

The answer's script uses async/await rather than explicit promise chains (.then()).
Note that request is deprecated.
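
For readers more used to the request().then() style mentioned in the question, the same retry flow can be sketched as a plain promise chain. This is a minimal sketch under stated assumptions, not the answerer's code: `fetchUntilReady` and `fetchPage` are hypothetical names, `fetchPage` stands for any function that returns a promise of the page HTML (for example a wrapper around `axios.get`), and a fake fetcher is used so the snippet runs without network access.

```javascript
// Resolve after ms milliseconds.
function timeout(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Fetch the page; if it is still the placeholder, wait and try again.
// Resolves with the first HTML that does not contain waitingString,
// and rejects once retriesLeft further attempts have been used up.
function fetchUntilReady(fetchPage, waitingString, delayMs, retriesLeft) {
  return fetchPage().then((html) => {
    if (!html.includes(waitingString)) {
      return html; // data is present: pass this response along
    }
    if (retriesLeft <= 0) {
      throw new Error("Page still not ready after all retries");
    }
    // Returning a promise from .then() makes the chain wait for the next attempt.
    return timeout(delayMs).then(() =>
      fetchUntilReady(fetchPage, waitingString, delayMs, retriesLeft - 1)
    );
  });
}

// Fake fetcher (assumption for the demo): placeholder twice, then the real page.
const waiting = "Waiting for a few more things";
const pages = [waiting, waiting, "<span>vehicle details</span>"];
let calls = 0;
const fakeFetch = () =>
  Promise.resolve(pages[Math.min(calls++, pages.length - 1)]);

fetchUntilReady(fakeFetch, waiting, 10, 10)
  .then((html) => console.log("ready:", html))
  .catch((err) => console.error(err.message));
```

Because the function always returns a promise, the caller can keep a single `.then()` for processing, whether the first response or a later one ends up being used.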

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/67326029/
