
python - XPath only matches the first image

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:32:56

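I am scraping this site: http://www.propertyfinder.ae/en/buy/villa-for-sale-dubai-jumeirah-park-1849328.html?img/0

I want to get the src of every image inside this element: div[@id='propertyPhoto']

I tried this XPath:

.//div[@id='propertyPhoto']//img/@src

I loop over the results to extract each src, but I only get the first image's src.

Please help.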

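Best answer

div#propertyPhoto contains only the main image. The others are in li#propertyPhotoMini0, li#propertyPhotoMini1, ...

So the XPath needs a small change to match both. Their id attributes all start with propertyPhoto, so you can use the following XPath: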

.//*[starts-with(@id, 'propertyPhoto')]//img/@src

Example:

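from urllib.request import urlopen  # Python 3; the original used Python 2's urllib.urlopen
from scrapy.selector import Selector

url = 'http://www.propertyfinder.ae/en/buy/villa-for-sale-dubai-jumeirah-park-1849328.html?img/0'
# Selector(text=...) expects str, so decode the response bytes (UTF-8 is assumed here)
html = urlopen(url).read().decode('utf-8')
root = Selector(text=html, type='html')
# Match the main image and every thumbnail whose container id starts with 'propertyPhoto'
for src in root.xpath(".//*[starts-with(@id, 'propertyPhoto')]//img/@src").extract():
    print(src)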

Output:

http://c1369023.r23.cf3.rackcdn.com/1849328-1-wide.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-1-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-2-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-3-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-4-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-5-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-6-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-7-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-8-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-9-mini.jpg
http://c1369023.r23.cf3.rackcdn.com/1849328-10-mini.jpg
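
To see offline why the exact-id XPath returns one image while the prefix match returns all of them, here is a minimal sketch using only the standard library. The markup and file names below are hypothetical, modeled loosely on the ids mentioned in the answer; the real page is more complex. Note that xml.etree.ElementTree does not support starts-with(), so the helper srcs_with_id_prefix (a name made up for this illustration) imitates that predicate in plain Python rather than evaluating the XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on the ids named in the answer.
SAMPLE = """
<html><body>
  <div id="propertyPhoto"><img src="1-wide.jpg"/></div>
  <ul>
    <li id="propertyPhotoMini0"><img src="1-mini.jpg"/></li>
    <li id="propertyPhotoMini1"><img src="2-mini.jpg"/></li>
  </ul>
  <div id="footer"><img src="logo.png"/></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Counterpart of the question's XPath: only images under div#propertyPhoto.
exact = [img.get("src")
         for img in root.findall(".//div[@id='propertyPhoto']//img")]


def srcs_with_id_prefix(tree, prefix):
    """Plain-Python stand-in for .//*[starts-with(@id, prefix)]//img/@src.

    Assumes matching elements are not nested inside each other.
    """
    found = []
    for el in tree.iter():  # walks every element in document order
        if el.get("id", "").startswith(prefix):
            found.extend(img.get("src") for img in el.iter("img"))
    return found


prefixed = srcs_with_id_prefix(root, "propertyPhoto")

print(exact)     # main image only
print(prefixed)  # main image plus thumbnails, footer logo excluded
```

With Scrapy's Selector (or lxml), the starts-with() XPath from the answer can be evaluated directly, so this helper is only needed when sticking to the standard library.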

Regarding "python - XPath only matches the first image", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22111723/
