
html - How do I read and parse HTML in Node.js?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 22:58:13

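I have a simple project and need some help with it. I need to read an HTML file and convert it to JSON. I want to get the matches separately as code and as text. How can I achieve this?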

So I have these two HTML blocks:

<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />
If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />
For example:</p>

<pre><code class="{python} language-{python}">a_var = 2

def a_func(some_var):
    return 2**3

a_var = a_func(a_var)
print(a_var)
</code></pre>

My code:

const fs = require('fs')
const showdown = require('showdown')

var read = fs.readFileSync('./test.md', 'utf8')

function importer(mdFile) {

  var result = []
  let json = {}

  var converter = new showdown.Converter()
  var text = mdFile
  var html = converter.makeHtml(text);

  for (var i = 0; i < html.length; i++) {
    htmlRead = html[i]
    if (html == html.match(/<p>(.*?)<\/p>/g))
      json.text = html.match(/<p>(.*?)<\/p>/g)

    if (html == html.match(/<pre>(.*?)<\/pre>/g))
      json.code = html.match(/<pre>(.*?)<\/pre>/g)

  }

  return html
}
console.log(importer(read))

How can I get these matches in code?

New code: I am writing all the p tags into the same JSON object. How do I write each p tag into a separate JSON block?

$('html').each(function(){
  if ($('p').text != undefined) {
    json.code = $('p').text()
    json.language = "Text"
  }
})

Best answer

I suggest using Cheerio. It brings a jQuery-style API to Node.js.

const cheerio = require('cheerio')

var html = "<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />For example:</p>"

const $ = cheerio.load(html)
var paragraph = $('p').html(); //Contents of paragraph. You can manipulate this in any other way you like

//...You would do the same for any other element you require

You should check out Cheerio and read its documentation. I find it really neat!
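As for why the loop in the question never produces a match: `html[i]` walks the string one character at a time, `html == html.match(...)` compares the whole string against an array (or `null`), and `.` does not match line breaks, so a multi-line `<p>` or `<pre>` is skipped. The snippet below is only a sketch of a corrected regex version, using the built-in `String.prototype.matchAll`; the helper name `matchBlocks` and the `tag`/`content` field names are made up for illustration. It assumes simple, non-nested tags without attributes, which is why a real parser such as Cheerio remains the more robust choice.

```javascript
// Hypothetical helper: pull the bodies of <p> and <pre> out of simple HTML.
function matchBlocks(html) {
  const blocks = []
  // [\s\S] also matches line breaks; a plain "." would stop at the first one.
  // \1 refers back to the tag name captured by (p|pre).
  const pattern = /<(p|pre)>([\s\S]*?)<\/\1>/g
  for (const match of html.matchAll(pattern)) {
    blocks.push({ tag: match[1], content: match[2] })
  }
  return blocks
}

const sample = '<p>line one\nline two</p>\n<pre><code>a_var = 2</code></pre>'
console.log(JSON.stringify(matchBlocks(sample), null, 2))
```

Note that the content of the `<pre>` block still contains the inner `<code>` tag, which is one more reason to prefer a parser.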

Edit: for the new part of your question

You can iterate over each element and push it into an array of JSON objects, like this:

var jsonObject = []; // An array of JSON objects that will hold everything
$('p').each(function() { // Loop over each paragraph
  // Now let's take the content of the paragraph and put it into a JSON object
  jsonObject.push({"paragraph": $(this).html()}); // Add data to the main jsonObject
});

So the resulting array of JSON objects should look like this:

[
  {
    "paragraph": "text"
  },
  {
    "paragraph": "text 2"
  },
  {
    "paragraph": "text 3"
  }
]

I believe you should also read up on JSON and how it works.

Regarding "html - How do I read and parse HTML in Node.js?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54136046/
