Python: Replace html img src after extracting with xpath

Reposted. Author: 行者123. Updated: 2023-11-28 01:25:20

I extracted some html code from this site, and now I can see all the code I scraped except the images, because their src attributes are incorrect.

#!C:/Python27/python
from lxml import etree
import requests

q = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"
page = requests.get(q)
tree = etree.HTML(page.text)
element = tree.xpath('./body/form/table[3]/tr/td/table[5]')
content = etree.tostring(element[0])
print "Content-type: text\n\n"
print content.strip()

Now I read the correct img src values (the ones I want) and put them in an array:

pic = []
link = q.rsplit("/", 1)
images = tree.xpath("//img/@src")
for i in images:
    if i.find('.gif') == -1:
        pic.append(link[0] + "/" + i)

How can I replace the scraped src values with the ones in the array?

Best answer

I'm pretty sure this is what you're looking for.

link = q.rsplit("/", 1)
images = tree.xpath("//img")

for idx, image in enumerate(images):
    if '.gif' not in image.attrib['src']:
        images[idx].attrib['src'] = link[0] + '/' + image.attrib['src']

for image in images:
    print image.attrib['src']

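It loops over each selected image and, if '.gif' is not in the image's src attribute, updates that src attribute to the PNG/JPG path you specified (the page's directory URL followed by the original file name). Because the img elements are modified in place inside the tree, serializing the extracted element with etree.tostring after this loop yields html that contains the new src values.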
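To make the in-place update concrete, here is a minimal sketch that runs without network access. The SAMPLE_HTML string and the `//div[@id="content"]` selector are assumptions standing in for the real page and the question's table xpath; only `q` and the '.gif' filter come from the original code. It uses `get`/`set` on the elements and the `print()` function, so it should also work on Python 3.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <div id="content">
    <img src="../../../img2/space.gif">
    <img src="giann-fig1-sm.png">
  </div>
</body></html>
"""

q = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"
base = q.rsplit("/", 1)[0]  # directory part of the page URL

tree = etree.HTML(SAMPLE_HTML)
# Assumed selector; the question uses './body/form/table[3]/tr/td/table[5]'.
element = tree.xpath('//div[@id="content"]')[0]

# Rewrite the src of every non-gif image inside the extracted element.
for image in element.xpath(".//img"):
    src = image.get("src")
    if src and ".gif" not in src:
        image.set("src", base + "/" + src)

# Serialize only after the loop, so the output carries the new src values.
content = etree.tostring(element, encoding="unicode", method="html")
print(content.strip())
```

The key point is the ordering: calling etree.tostring before the loop, as in the question's script, would serialize the old src values.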

Output

../../../img2/space.gif
../../../img2/search2.gif
../../../img2/space.gif
../../../img2/D-Lib-blocks.gif
../../../img2/transparent.gif
../../../img2/magazine.gif
../../../img2/transparent.gif
../../../img2/transparent.gif
../../../img2/space.gif
../../../img2/space.gif
http://www.dlib.org/dlib/november14/giannakopoulos/giann-formula1.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig1-sm.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig2.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig3.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig4.png
http://www.dlib.org/dlib/november14/giannakopoulos/giannakopoulos.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/foufoulas.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/stamatogiannakis.png
http://www.dlib.org/dlib/november14/giannakopoulos/dimitropoulos.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/manola.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/ioannidis.png
../../../img2/transparent.gif
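Note that the .gif entries in the output are still relative, because the loop deliberately skips them. If those should resolve too, one possible alternative (not part of the original answer) is the standard library's urljoin, which resolves any relative src against the page URL. The helper name `absolutize` below is hypothetical.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

q = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"


def absolutize(src, page_url=q):
    """Resolve a possibly relative src against the URL of the page (hypothetical helper)."""
    return urljoin(page_url, src)


print(absolutize("giann-fig1-sm.png"))
# http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig1-sm.png
print(absolutize("../../../img2/space.gif"))
# http://www.dlib.org/img2/space.gif
```

With this approach the '.gif' check becomes unnecessary, since every src is resolved the same way; whether that is desirable depends on whether the site's decorative gifs should appear in the scraped page.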

This question, "Python: Replace html img src after extracting with xpath", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/32380902/
