gpt4 book ai didi

python - json 值是一个 html 字符串 - 如何在 python 中解析它?

转载 作者:行者123 更新时间:2023-12-01 04:55:20 26 4
gpt4 key购买 nike

我有一个这样的 JSON 文件:

{
"entryLabel": "cat",
"entryContent": "<div class=\"entry_container\"><div class=\"entry lang_en-gb\" id=\"cat_1\"><span class=\"inline\"><h1 class=\"hwd\">cat<\/h1><span> [<\/span><span class=\"pron\" type=\"\">ˈkæt<a href=\"#\" class=\"playback\"><img src=\"https://api.collinsdictionary.com/external/images/redspeaker.gif?version=2013-10-30-1535\" alt=\"Pronunciation for cat\" class=\"sound\" title=\"Pronunciation for cat\" style=\"cursor: pointer\"/><\/a><audio type=\"pronunciation\" title=\"cat\"><source type=\"audio/mpeg\" src=\"https://api.collinsdictionary.com/media/sounds/sounds/0/081/08189/08189.mp3\"/>Your browser does not support HTML5 audio.<\/audio><\/span><span>]<\/span><\/span><div class=\"hom\" id=\"cat_1.1\"><span> <\/span><span class=\"gramGrp\"><span class=\"pos\">noun<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"bold\">1 <\/span><span class=\"lbl\"><span>(<\/span>domestic<span>)<\/span><\/span><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">chat <em class=\"hi\">m<\/em><\/span><\/span><span class=\"cit\" id=\"cat_1.2\"><span>; <\/span><span class=\"quote\">Have you got a cat?<\/span><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">Est-ce que tu as un chat?<\/span><\/span><\/span><span class=\"re\" id=\"cat_1.3\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to let the cat out of the bag<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">vendre la mèche<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.4\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">curiosity killed the cat<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">la curiosité est toujours punie<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.5\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to look like sth the cat dragged in<\/span><\/span><span class=\"inline\"><span>, <\/span><span class=\"orth\">to look like sth the cat brought in<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">être dans un état lamentable<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.6\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to play cat and mouse with sb<\/span><\/span><span class=\"inline\"><span>, <\/span><span class=\"orth\">to play a game of cat and mouse with sb<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">jouer au chat et à la souris avec qn<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.7\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">to put the cat among the pigeons<\/span><\/span><span class=\"inline\"><span>, <\/span><span class=\"orth\">to set the cat among the pigeons<\/span><\/span><span class=\"lbl\"><span> (<\/span>British<span>)<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">jeter un pavé dans la mare<\/span><\/span><\/div><!-- End of DIV sense--><\/span><span class=\"re\" id=\"cat_1.8\"><span>; <\/span><span class=\"inline\"><span class=\"orth\">there's no room to swing a cat<\/span><\/span><div class=\"sense\"><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">on ne peut pas se tourner<\/span><\/span><\/div><!-- End of DIV sense--><\/span><\/div><!-- End of DIV sense--><div class=\"sense\"><span> <br/><\/span><span class=\"bold\">2 <\/span><span class=\"lbl\"><span>(= <\/span>big cat<span>)<\/span><\/span><span> <\/span><span class=\"cit lang_fr\"><span class=\"quote\">félin <em class=\"hi\">m<\/em><\/span><\/span><span class=\"cit\" id=\"cat_1.9\"><span>; <\/span><\/span><\/div><!-- End of DIV sense--><\/div><!-- End of DIV hom--><\/div><!-- End of DIV entry lang_en-gb--><\/div><!-- End of DIV entry_container-->\n"
}

我需要解析这个 JSON 文件,但数据 "entryContent"该值是一个 HTML 字符串。我可以转换初始 JSON 文件的结构或直接解析 HTML 字符串吗?我需要一些建议。

现在我只有这段代码:

import json
from pprint import pprint
json_data=open('cat.json')

data = json.load(json_data)
#pprint(data)

print data["dictionaryCode"]
print data["entryLabel"]
print data["entryContent"]

json_data.close()

最后,我需要从 HTML 中获取此跨度的值 <span class="pron" type="">ˈkæt</span> ;源元素的 src 值 <source type="audio/mpeg" src="https://api.collinsdictionary.com/media/sounds/sounds/0/081/08189/08189.mp3"/> ;span 类 pos 的值 <span class="gramGrp"><span class="pos">noun</span></span> ;以及 div 元素提供的所有感官

<div class="sense">
<span> <br/></span>
<span class="bold">2 </span><span class="lbl"><span>(= </span>big cat<span>)</span></span><span> </span><span class="cit lang_fr"><span class="quote">félin <em class="hi">m</em></span></span><span class="cit" id="cat_1.9"><span>; </span></span>
</div>

最佳答案

尝试使用BeautifulSoup :

import json
from bs4 import BeautifulSoup

# json_data=open('cat.json')
# data = json.load(json_data)
# using json.load and the 'with' context (to close file when not needed...)
with open('cat.json') as f:
json_data = json.load(f)

print data["dictionaryCode"]
print data["entryLabel"]
entryContentHTML = BeautifulSoup(data["entryContent"])
print entryContentHTML.prettify()

# json_data.close()

关于python - json 值是一个 html 字符串 - 如何在 python 中解析它?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/27567519/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com