
python - Scrapy extracts the wrong IMG SRC

Reposted. Author: 行者123. Updated: 2023-11-30 22:49:22

I am trying to use Scrapy to get the URL of the image with ID HERO_PHOTO on a page. The target element has the following HTML:

<img alt="Photo of Gray Line" style="position: relative; left: -50px; top: 0px;" id="HERO_PHOTO" class="flexibleImage" src="https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg" width="352" height="260">

In the Chrome browser, running

$('#HERO_PHOTO').attr('src')

returns the correct URL:

"https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg"

Problem: in Scrapy, however, using the following selectors,

response.css('#HERO_PHOTO::attr(src)').extract_first()

response.css('#HERO_PHOTO').xpath('@src').extract_first()

response.css('#HERO_PHOTO[src]').extract_first()

gives us

https://static.tacdn.com/img2/x.gif

Using .extract() also returns the same wrong URL.

Why does Scrapy get a different SRC value?

Best answer

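The image link is in the page, but not directly in the <img> tag. Scrapy does not execute JavaScript, so it only sees the placeholder src (x.gif) from the raw HTML; the real URL is filled in later by JavaScript in the browser. The HTML contains a JavaScript snippet that holds the image link you want (slightly reformatted):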

...
}(window,ta));
</script>
<script type="text/javascript">
var lazyImgs = [{
"data": "//maps.google.com/maps/api/staticmap?&channel=ta.desktop&zoom=15&size=340x225&client=gme-tripadvisorinc&sensor=falselanguageParam&center=45.503395,-73.573174&maptype=roadmap&&markers=icon:http%3A%2F%2Fc1.tacdn.com%2Fimg2%2Fmaps%2Ficons%2Fpin_v2_CurrentCenter.png|45.503395,-73.57317&signature=FqI7Z1egbpsVrlEE0yjw9HmsMJ8=",
"scroll": false,
"tagType": "img",
"id": "lazyload_1098682971_0",
"priority": 500,
"logerror": false
}, {
"data": "//ad.atdmt.com/i/img;p=11007200799198;cache=?ord=1475487471489",
"scroll": false,
"tagType": "img",
"id": "lazyload_1098682971_1",
"priority": 1000,
"logerror": false
}, {
"data": "//ad.doubleclick.net/ad/N4764.TripAdvisor/B7050081;sz=1x1?ord=1475487471489",
"scroll": false,
"tagType": "img",
"id": "lazyload_1098682971_2",
"priority": 1000,
"logerror": false
}, {
"data": "https://static.tacdn.com/img2/maps/icons/spinner24.gif",
"scroll": false,
"tagType": "img",
"id": "lazyload_1098682971_3",
"priority": 100,
"logerror": false
}, {
"data": "https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg",
"scroll": false,
"tagType": "img",
"id": "HERO_PHOTO",
"priority": 100,
"logerror": false
}, {
"data": "https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/98/montreal-night-tour.jpg",
"scroll": false,
"tagType": "img",
"id": "THUMB_PHOTO1",
"priority": 100,
"logerror": false
}, {
"data": "https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/8f/montreal-night-tour.jpg",
"scroll": false,
"tagType": "img",
"id": "THUMB_PHOTO2",
"priority": 100,
"logerror": false
}, {
"data": "https://static.tacdn.com/img2/generic/site/no_user_photo-v1.gif",
"scroll": false,
"tagType": "img",
"id": "lazyload_1098682971_4",
"priority": 100,
"logerror": false
}...

One way to parse this is to use js2xml:

import js2xml
from pprint import pprint

# get the content of all `<script>` elements
for js in response.xpath('.//script[@type="text/javascript"]/text()').extract():
    try:
        jstree = js2xml.parse(js)

        # look for assignment of `var lazyImgs`
        for imgs in jstree.xpath('//var[@name="lazyImgs"]/*'):

            # use js2xml.make_dict() -- poor name I know
            # to build a useful Python object
            data = js2xml.make_dict(imgs)

            pprint(data)

            break

    except Exception:
        # skip scripts that js2xml cannot parse
        pass

This is what you get:

[{'data': '//maps.google.com/maps/api/staticmap?&channel=ta.desktop&zoom=15&size=340x225&client=gme-tripadvisorinc&sensor=falselanguageParam&center=45.503395,-73.573174&maptype=roadmap&&markers=icon:http%3A%2F%2Fc1.tacdn.com%2Fimg2%2Fmaps%2Ficons%2Fpin_v2_CurrentCenter.png|45.503395,-73.57317&signature=FqI7Z1egbpsVrlEE0yjw9HmsMJ8=',
'id': 'lazyload_-1977833463_0',
'logerror': False,
'priority': 500,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/maps/icons/spinner24.gif',
'id': 'lazyload_-1977833463_1',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg',
'id': 'HERO_PHOTO',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/98/montreal-night-tour.jpg',
'id': 'THUMB_PHOTO1',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-s/0c/f5/19/8f/montreal-night-tour.jpg',
'id': 'THUMB_PHOTO2',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/generic/site/no_user_photo-v1.gif',
'id': 'lazyload_-1977833463_2',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/08/38/19/cb/gayle-h.jpg',
'id': 'lazyload_-1977833463_3',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_01.png',
'id': 'lazyload_-1977833463_4',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_02.png',
'id': 'lazyload_-1977833463_5',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_6',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_7',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/b1/32/93/holidays1958.jpg',
'id': 'lazyload_-1977833463_8',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_04.png',
'id': 'lazyload_-1977833463_9',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_04.png',
'id': 'lazyload_-1977833463_10',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
'id': 'lazyload_-1977833463_11',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_12',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_13',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-o/06/4d/bc/f6/disneybus.jpg',
'id': 'lazyload_-1977833463_14',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_06.png',
'id': 'lazyload_-1977833463_15',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_06.png',
'id': 'lazyload_-1977833463_16',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
'id': 'lazyload_-1977833463_17',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_18',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_19',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/a7/avatar078.jpg',
'id': 'lazyload_-1977833463_20',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_01.png',
'id': 'lazyload_-1977833463_21',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_22',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_23',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/9f/avatar070.jpg',
'id': 'lazyload_-1977833463_24',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_02.png',
'id': 'lazyload_-1977833463_25',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_03.png',
'id': 'lazyload_-1977833463_26',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_27',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_28',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/03/9f/a6/94/facebook-avatar.jpg',
'id': 'lazyload_-1977833463_29',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_04.png',
'id': 'lazyload_-1977833463_30',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_05.png',
'id': 'lazyload_-1977833463_31',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
'id': 'lazyload_-1977833463_32',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_33',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_34',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/06/f3/32/86/complsv.jpg',
'id': 'lazyload_-1977833463_35',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_04.png',
'id': 'lazyload_-1977833463_36',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_05.png',
'id': 'lazyload_-1977833463_37',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
'id': 'lazyload_-1977833463_38',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_39',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_40',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/05/f2/4d/68/christine-n.jpg',
'id': 'lazyload_-1977833463_41',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_03.png',
'id': 'lazyload_-1977833463_42',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_04.png',
'id': 'lazyload_-1977833463_43',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
'id': 'lazyload_-1977833463_44',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_45',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_46',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/80/avatar001.jpg',
'id': 'lazyload_-1977833463_47',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_03.png',
'id': 'lazyload_-1977833463_48',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_04.png',
'id': 'lazyload_-1977833463_49',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
'id': 'lazyload_-1977833463_50',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_51',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_52',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/0a/45/46/e2/tracey-g.jpg',
'id': 'lazyload_-1977833463_53',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/lvl_06.png',
'id': 'lazyload_-1977833463_54',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/rev_06.png',
'id': 'lazyload_-1977833463_55',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/FunLover.png',
'id': 'lazyload_-1977833463_56',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/badges/20px/Appreciated.png',
'id': 'lazyload_-1977833463_57',
'logerror': False,
'priority': 100,
'scroll': False,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/gray_flag.png',
'id': 'lazyload_-1977833463_58',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-f/02/6d/40/b2/montreal-amphi-bus-tour.jpg',
'id': 'lazyload_-1977833463_59',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/39/2d/43/old-montreal-walking.jpg',
'id': 'lazyload_-1977833463_60',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/06/df/96/c7/excursions-montreal-private.jpg',
'id': 'lazyload_-1977833463_61',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/02/ad/57/0a/filename-p1010076-jpg.jpg',
'id': 'lazyload_-1977833463_62',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-o/04/b5/6a/8d/ali-l.jpg',
'id': 'lazyload_-1977833463_63',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/87/avatar008.jpg',
'id': 'lazyload_-1977833463_64',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-o/06/8a/c5/7d/leonard-d.jpg',
'id': 'lazyload_-1977833463_65',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-o/05/6d/32/ca/rpm13111.jpg',
'id': 'lazyload_-1977833463_66',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/87/avatar008.jpg',
'id': 'lazyload_-1977833463_67',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/neighborhood/icon_hood_white.png',
'id': 'lazyload_-1977833463_68',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/oyster/500/08/5b/34/b0/sherbrooke-street-west-shopping--.jpg',
'id': 'lazyload_-1977833463_69',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/maps/icons/icon_mapControl_expand_idle_30x30.png',
'id': 'lazyload_-1977833463_70',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/maps/icons/icon_mapControl_expand_hover_30x30.png',
'id': 'lazyload_-1977833463_71',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/a1/f2/6b/marche-atwater.jpg',
'id': 'lazyload_-1977833463_72',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/01/41/78/a3/mcgill-university-lower.jpg',
'id': 'lazyload_-1977833463_73',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/04/06/16/08/musee-grevin.jpg',
'id': 'lazyload_-1977833463_74',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/03/4a/9a/85/laurie-raphael.jpg',
'id': 'lazyload_-1977833463_75',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/09/45/53/16/cafe-humble-lion.jpg',
'id': 'lazyload_-1977833463_76',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://media-cdn.tripadvisor.com/media/photo-l/03/2f/37/03/essence.jpg',
'id': 'lazyload_-1977833463_77',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/branding/logo_with_tagline.png',
'id': 'LOGOTAGLINE',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'},
{'data': 'https://static.tacdn.com/img2/icons/bell.png',
'id': 'lazyload_-1977833463_78',
'logerror': False,
'priority': 100,
'scroll': True,
'tagType': 'img'}]
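
Once the list above is available as a Python object, the HERO_PHOTO entry can be picked out by its 'id' key. The following is only a minimal sketch: the helper name find_lazy_img and the base_url default are hypothetical and not part of the original answer, and the sample list is a shortened copy of the output above. In a spider you would likely pass response.url as the base so that protocol-relative entries (those starting with "//") resolve to full URLs.

```python
from urllib.parse import urljoin


def find_lazy_img(lazy_imgs, img_id, base_url="https://www.tripadvisor.com/"):
    """Return the absolute URL of the lazy-loaded image with the given id, or None.

    find_lazy_img and base_url are illustrative assumptions, not part of js2xml or Scrapy.
    """
    for entry in lazy_imgs:
        if entry.get("id") == img_id:
            # urljoin leaves absolute URLs untouched and resolves
            # protocol-relative ones ("//host/path") against base_url
            return urljoin(base_url, entry["data"])
    return None


# shortened copy of the parsed lazyImgs list shown above
sample = [
    {"data": "https://static.tacdn.com/img2/maps/icons/spinner24.gif",
     "id": "lazyload_-1977833463_1"},
    {"data": "https://media-cdn.tripadvisor.com/media/photo-s/04/71/70/7c/gray-line-tours-montreal.jpg",
     "id": "HERO_PHOTO"},
]

hero_url = find_lazy_img(sample, "HERO_PHOTO")
print(hero_url)
```

Matching on the 'id' key mirrors the id attribute of the <img> element in the question, so the same selector target (HERO_PHOTO) identifies the image in both places.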

Regarding python - Scrapy extracts the wrong IMG SRC, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39817067/
