javascript - Node.js: extract the HTML elements between tags

Reposted by: 行者123. Updated: 2023-12-03 01:10:34

Suppose I have a website whose HTML source is structured as follows:

<html>
<head>
....

<table id="xxx">
 <tr>

..
</table>

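I have already applied a library that strips all the HTML tags. Could you tell me which library or regular expression I can use, in Node.js, to extract everything in the HTML source that starts with <table> and ends with </table>?

Below is my code: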

console.log('todo list RESTful API server started on: ' + port);


var request = require('request');
var cheerio = require('cheerio');

request('https://fpp.mpfa.org.hk/tc_chi/mpp_list.jsp', function (error, response, body) {
  console.log('error:', error); // Print the error if one occurred
  console.log('statusCode:', response && response.statusCode); // Print the response status code if a response was received
   var sanitizeHtml = require('sanitize-html');
   var dirty = body.match(/\[(.*)\]/).pop();

var clean = sanitizeHtml(dirty, {
  allowedTags: [  ],
  allowedAttributes: {

  },
  allowedIframeHostnames: ['www.youtube.com']
});

  console.log('body:', clean); // Print the sanitized text (all tags stripped)
});

Best answer

You only need cheerio's API to select the <table> and then print its text nodes.

Given a page with the following HTML:

<!DOCTYPE html>

<html lang="en">

<head>
    <title>Contacts</title>
</head>

<body>
    <main>
        <h1>Hello</h1>
        <section>
            <h2>World</h2>
            <table>
                <tr>
                    <td>foo</td>
                    <td>bar</td>
                    <td>fizz</td>
                </tr>
                <tr>
                    <td>buzz</td>
                    <td>hello</td>
                    <td>world</td>
                </tr>
            </table>
        </section>
    </main>
</body>

</html>

running the following code:

const request = require("request");
const cheerio = require("cheerio");
const URL_TO_PARSE = "http://localhost/my-page.html";

// Make a request to get the HTML of the page
request(URL_TO_PARSE, (err, response, body) => {
    if (err) throw new Error("Something went wrong");
    // Load the HTML into cheerio's DOM
    const $ = cheerio.load(body);
    // Print the text nodes of the <table> in the HTML
    console.log($("table").text());
});

produces the following output:

            foo
            bar
            fizz


            buzz
            hello
            world
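
The text printed above keeps the line breaks and indentation of the source markup. As a minimal sketch of one way to tidy it, the helper below (textToCells is a hypothetical name, not part of cheerio) splits the string into lines, trims each one, and drops the empty ones. The sample string is an assumption that only mimics the shape of the output above.

```javascript
// Hypothetical helper: turn the whitespace-heavy string returned by
// $("table").text() into a flat list of cell values.
function textToCells(tableText) {
    return tableText
        .split(/\r?\n/)                        // one entry per source line
        .map((line) => line.trim())            // drop the indentation
        .filter((line) => line.length > 0);    // drop blank lines
}

// Sample input shaped like the output shown above (assumed for illustration)
const sampleText = "\n    foo\n    bar\n    fizz\n\n\n    buzz\n    hello\n    world\n";
const cells = textToCells(sampleText);
console.log(cells);
```

For the sample string this yields the six cell values in document order. It assumes no cell contains a line break of its own; if row boundaries matter, iterating over the tr elements with cheerio is likely the safer route.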

You can then manipulate the result however you like. Cheerio's API is very similar to jQuery's.
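
The question also asks about a regular expression that keeps everything from <table> to </table>. The following is only a sketch of that route using plain JavaScript: extractTables and sampleHtml are hypothetical names, and the pattern assumes tables are not nested. An HTML parser such as cheerio is generally more robust than a regular expression for real pages.

```javascript
// Hypothetical helper: return every <table>...</table> block found in an HTML string.
// Assumption: tables are not nested; a nested table would be cut at the first </table>.
function extractTables(html) {
    // [\s\S]*? matches across line breaks, lazily, up to the nearest closing tag
    const tablePattern = /<table\b[^>]*>[\s\S]*?<\/table>/gi;
    return html.match(tablePattern) || [];
}

// Sample markup loosely modelled on the question's snippet (assumed for illustration)
const sampleHtml = [
    "<html><body>",
    '<table id="xxx">',
    " <tr><td>foo</td></tr>",
    "</table>",
    "<p>ignored</p>",
    "</body></html>",
].join("\n");

const tables = extractTables(sampleHtml);
console.log(tables.length);
console.log(tables[0]);
```

Each array entry is the full markup of one table, tags included, so it can be passed on to a sanitizer or to cheerio.load for further processing.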

Regarding "javascript - Node.js: extract the HTML elements between tags", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52226103/
