
javascript - "Sanitize" RSS to make it human-readable

Reposted. Author: 行者123. Updated: 2023-12-03 03:57:46

I have an RSS (XML) file and want to convert it to a JSON file containing human-readable text (no formatting). (Maybe "sanitize" isn't the right search term?)

A sample of the XML looks like this:

<description>&lt;p&gt;&lt;strong&gt;&lt;img alt=&quot;&quot;
src=&quot;/site/sites/default/files/ReligionUN.png&quot;
style=&quot;width: 43px; height: 34px; float: left;&quot;
/&gt;June 20&lt;/strong&gt;&lt;br /&gt;
&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The UN World Refugee Day was agreed upon in 2001 in
connection with the celebration of the Refugee Convention&amp;
#39;s fiftieth anniversary. The date was chosen because the
Organization of African Unity already celebrated Africa Refugee
Day on June 20.&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The Holiday Calendar is sponsered by:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;&quot; p=&quot;&quot;
src=&quot;/site/sites/default/files/alle_logoer_800x600.png&quot;
style=&quot;width: 800px; height: 600px;&quot; /&gt;&lt;/p&gt;
</description>

What I want to end up with is something like this:

"description": "December 18\nThe UN International Migrants' Day
marks the adoption of the International Migrant Workers Convention
on December 18, 1990.\nThe UN wished to emphasize that
transnational migration is a growing phenomenon, which can
contribute to growth and development across the world provided that
the international community assure migrants' rights.\n\nThe Holiday
Calendar is sponsered by:\n"

I need to clean up the text in either the XML or the JSON (preferably the former). With the following code:

const fs = require('fs')
const convert = require('xml-js')
const _ = require('lodash')
const striptags = require('striptags')

const xmlstr = fs.readFileSync('./english.xml', 'utf8')

const json_html = convert.xml2json(xmlstr, { compact: true, spaces: 4 })

const json_stripped = striptags(
_.replace(json_html, new RegExp('&nbsp;', 'g'), '')
)

fs.writeFileSync('./english.json', json_stripped)

I have gotten this far:

"description": "December 18\n\nThe UN International Migrants&#39; 
Day marks the adoption of the International Migrant Workers
Convention on December 18, 1990.\nThe UN wished to emphasize that
transnational migration is a growing phenomenon, which can
contribute to growth and development across the world provided that
the international community assure migrants&#39; rights.\nThe
Holiday Calendar is sponsered by:\n\n\n\n\n\n\n\n"

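It's almost there, but as you can see, I'm still struggling to work out how to replace things like &nbsp; and &#39;, and how to collapse multiple newlines into a single line break.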

Best answer

You want to unescape/decode the HTML. There are plenty of packages for that.

Like this one:

console.log(entities.decode('&lt;&gt;&quot;&apos;&amp;&copy;&reg;&#8710;')); // <>"'&&copy;&reg;∆ 
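
To tie the answer back to the question, here is a minimal, dependency-free sketch of the whole cleanup: turn paragraph and line-break tags into newlines, strip the remaining tags, decode entities, and collapse runs of blank lines. The names decodeEntities, toPlainText and mapStrings are hypothetical helpers, not part of any package mentioned above, and the entity table is deliberately tiny (it maps &nbsp; to a plain space, which is an assumption about what "human-readable" should mean here). For real feeds, a full decoder package like the one suggested above is the safer choice.

```javascript
// Small, illustrative entity table (assumption: only these names matter here).
// &nbsp; is mapped to a regular space so it can be collapsed like other whitespace.
const NAMED_ENTITIES = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'],
  ['quot', '"'], ['apos', "'"], ['nbsp', ' '],
]);

// Decode numeric (&#39; / &#x27;) and the few named entities listed above.
// Unknown or invalid entities are left untouched.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
  });
}

// Convert an HTML fragment to plain text with single line breaks.
function toPlainText(html) {
  const withBreaks = html
    .replace(/<br\s*\/?>/gi, '\n') // <br>, <br/>, <br /> become newlines
    .replace(/<\/p>/gi, '\n');     // end of paragraph becomes a newline
  // Strip tags BEFORE decoding, so a decoded "&lt;" is never mistaken for a tag.
  const noTags = withBreaks.replace(/<[^>]*>/g, '');
  return decodeEntities(noTags)
    .replace(/[ \t]+/g, ' ')   // collapse runs of spaces and tabs
    .replace(/ *\n\s*/g, '\n') // collapse blank lines into one line break
    .trim();
}

// Apply fn to every string inside a parsed object, leaving other values alone.
function mapStrings(value, fn) {
  if (typeof value === 'string') return fn(value);
  if (Array.isArray(value)) return value.map((item) => mapStrings(item, fn));
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, mapStrings(item, fn)])
    );
  }
  return value;
}

const sample =
  '<p><strong>June 20</strong><br />\n&nbsp;</p>\n' +
  '<p>The Convention&#39;s anniversary.&nbsp;</p><p>&nbsp;</p>';
console.log(JSON.stringify(toPlainText(sample)));
```

One design note: the code in the question runs striptags over the serialized JSON string. Cleaning each text value of the parsed object instead (for instance by passing the result of xml-js's convert.xml2js to mapStrings and serializing with JSON.stringify afterwards) avoids touching the JSON syntax itself; whether that fits depends on how the rest of the feed is structured.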

Regarding javascript - "Sanitize" RSS to make it human-readable, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44860350/
