
python - How to convert the data in a specific HTML tag into a dictionary

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:49:38

I am trying to scrape a Google Images results page with the following piece of code.

# -*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup

site= "https://www.google.co.in/search?q=batman+wallpaper+hd&source=lnms&tbm=isch"

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive'}


req = urllib2.Request(site,headers=hdr)

page = urllib2.urlopen(req)

soup = BeautifulSoup(page, 'html.parser')

for child in soup.find("div", {"data-ri":"16"}).children:
    print child

It produces this output:

<a class="rg_l" href="#" jsaction="fire.ivg_o;mouseover:str.hmov;mouseout:str.hmou" jsname="hSRGPd" rel="noopener" style="background:rgb(11,18,24)"><img alt="Image result for batman wallpaper hd" class="rg_ic rg_i" jsaction="load:str.tbn" name="NCsi46a6Dm2_HM:" onload="typeof google==='object'&amp;&amp;google.aft&amp;&amp;google.aft(this)"/><div class="_aOd rg_ilm"><div class="rg_ilmbg"><span class="rg_ilmn"> 2880 × 1800 - wallpapertag.com </span></div></div></a>
<div class="rg_meta notranslate" jsname="ik8THc">{"id":"NCsi46a6Dm2_HM:","isu":"wallpapertag.com","itg":0,"ity":"jpg","oh":1800,"ou":"https://wallpapertag.com/wallpaper/full/b/8/5/84668-vertical-batman-wallpaper-hd-2880x1800-full-hd.jpg","ow":2880,"pt":"Batman wallpaper HD ·① Download free High Resolution wallpapers ...","rid":"vLHnAF3_eWR-KM","rmt":0,"rt":0,"ru":"https://wallpapertag.com/batman-wallpaper-hd","s":"2880x1800 Batman Wallpapers - HD Wallpapers Inn","st":"Wallpapertag.com","th":177,"tu":"https://encrypted-tbn0.gstatic.com/images?q\u003dtbn:ANd9GcSAIP3lqGZ0a2wkgqIecGZtCEMKAx8Qk5lp89FaV6ovmygejjf1YA","tw":284}</div>

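I want to read the value of the "ou" key, which holds the link to the wallpaper. Can someone help me parse that link into a variable? I am a Python beginner. Thanks in advance.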

Best answer

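You can use a JSON parser. The text inside the rg_meta div is a JSON object, so json.loads turns it into a dictionary. This code prints only the value of the ou key: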

from bs4 import BeautifulSoup
import json

html = '<div><div class="rg_meta notranslate" jsname="ik8THc">{"id":"NCsi46a6Dm2_HM:","isu":"wallpapertag.com","itg":0,"ity":"jpg","oh":1800,"ou":"https://wallpapertag.com/wallpaper/full/b/8/5/84668-vertical-batman-wallpaper-hd-2880x1800-full-hd.jpg","ow":2880,"pt":"Batman wallpaper HD ·① Download free High Resolution wallpapers ...","rid":"vLHnAF3_eWR-KM","rmt":0,"rt":0,"ru":"https://wallpapertag.com/batman-wallpaper-hd","s":"2880x1800 Batman Wallpapers - HD Wallpapers Inn","st":"Wallpapertag.com","th":177,"tu":"https://encrypted-tbn0.gstatic.com/images?q\u003dtbn:ANd9GcSAIP3lqGZ0a2wkgqIecGZtCEMKAx8Qk5lp89FaV6ovmygejjf1YA","tw":284}</div></div>'

soup = BeautifulSoup(html, "html.parser")

for child in soup.find("div").children:
    # Only the nested <div> holds the JSON metadata
    if child.name == 'div':
        # Parse the div's text into a dictionary
        data_content = json.loads(child.text)
        print(data_content["ou"])
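
A results page usually holds many such metadata blocks, so the same idea can be wrapped in a small helper that collects every "ou" value. The following is only a sketch: the function name extract_original_urls and the sample markup are hypothetical, and it assumes the metadata still lives in div elements with the class rg_meta, as in the output shown in the question. Google's markup may differ today.

```python
import json
from bs4 import BeautifulSoup


def extract_original_urls(html):
    """Return the "ou" value of every rg_meta div found in html (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # class_ matches any element whose class list contains "rg_meta"
    for div in soup.find_all("div", class_="rg_meta"):
        try:
            meta = json.loads(div.get_text())
        except ValueError:
            # Skip divs whose text is not valid JSON
            continue
        # Keep only JSON objects that actually carry an "ou" key
        if isinstance(meta, dict) and "ou" in meta:
            urls.append(meta["ou"])
    return urls


# Hypothetical sample markup, shortened from the output in the question
sample = (
    '<div data-ri="16">'
    '<a class="rg_l" href="#"></a>'
    '<div class="rg_meta notranslate">{"id":"a","ou":"https://example.com/one.jpg"}</div>'
    '</div>'
    '<div data-ri="17">'
    '<div class="rg_meta notranslate">not json</div>'
    '</div>'
)

print(extract_original_urls(sample))
```

Selecting by class instead of iterating over children avoids depending on the position of the div, and the try/except keeps one malformed block from stopping the whole loop.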

Regarding "python - How to convert the data in a specific HTML tag into a dictionary", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48665960/
