python - Getting the src attribute using XPath

Reposted. Author: 行者123. Updated: 2023-12-01 04:45:33

I am using Python with the "requests" and "lxml" modules to build a parsed HTML object. My task is to find all links containing the string "googleadservices" in the following page:

http://www.euronews.com/2015/03/20/uber-taxis-overtake-new-york-yellow-cabs/

My XPath query is

//script[contains(@src,'google')]/@src

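I expected it to return the value of the src attribute of every matching script node, but it falls short: the query returns only the following results: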

/js/google.js
https://apis.google.com/js/plusone.js
http://pagead2.googlesyndication.com/pagead/show_ads.js

Note that:

http://partner.googleadservices.com/gpt/pubads_impl_58.js

is missing!

I suspect I am missing a subtle point of syntax, and I would be glad to be enlightened.

Best answer

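The response to the request made by requests contains no script with src="http://partner.googleadservices.com/gpt/pubads_impl_58.js". That script is loaded asynchronously, so it never appears in the HTML that lxml parses.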
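To make this concrete, here is a minimal sketch that runs the same kind of XPath query over a small inline document with lxml. STATIC_HTML and script_srcs are hypothetical names, and the markup is an invented stand-in for the server response, not the real euronews page. The query itself is fine; the googleadservices script is simply not an element in the static markup, because it would only be created by JavaScript running in a browser.

```python
from lxml import html

# Hypothetical stand-in for the HTML that requests would receive.
# The third script has no src attribute; it would inject another
# script element only when executed by a real browser.
STATIC_HTML = """
<html><head>
<script src="/js/google.js"></script>
<script src="https://apis.google.com/js/plusone.js"></script>
<script>
  var s = document.createElement('script');
  s.src = 'http://partner.googleadservices.com/gpt/pubads_impl_58.js';
  document.head.appendChild(s);
</script>
</head><body></body></html>
"""


def script_srcs(page_source, needle="google"):
    """Return the src values of script elements whose src contains needle."""
    tree = html.fromstring(page_source)
    # $needle is an XPath variable that lxml fills from the keyword argument.
    return tree.xpath("//script[contains(@src, $needle)]/@src", needle=needle)


# The two static scripts match; the injected one does not exist yet.
print(script_srcs(STATIC_HTML))
# Searching for the ad script directly finds nothing in the static markup.
print(script_srcs(STATIC_HTML, "googleadservices"))
```

lxml only parses the markup it is given and never executes the inline JavaScript, which is why the third URL is invisible to the query.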

As a workaround, you can automate a real browser with the help of the selenium package.

Example (using the PhantomJS headless browser):

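>>> from selenium import webdriver
>>>
>>> # webdriver.PhantomJS and find_elements_by_xpath exist only in older
>>> # Selenium releases; both were removed in Selenium 4.
>>> driver = webdriver.PhantomJS()
>>> url = "http://www.euronews.com/2015/03/20/uber-taxis-overtake-new-york-yellow-cabs/"
>>> driver.get(url)
>>> # Query the live DOM, which includes scripts injected by JavaScript.
>>> for script in driver.find_elements_by_xpath("//script[contains(@src, 'google')]"):
...     print(script.get_attribute('src'))
...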
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=gapi_iframes_style_bubble/exm=auth,plusone,ytsubscribe/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_3
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=auth/exm=plusone,ytsubscribe/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_2
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=ytsubscribe/exm=plusone/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_1
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=plusone/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_0
http://www.googletagservices.com/tag/js/gpt.js
http://www.euronews.com/js/google.js
https://apis.google.com/js/plusone.js
http://partner.googleadservices.com/gpt/pubads_impl_58.js
http://pagead2.googlesyndication.com/pagead/osd.js
http://pagead2.googlesyndication.com/pagead/show_ads.js
http://pagead2.googlesyndication.com/pagead/js/r20150331/r20150224/show_ads_impl.js
http://www.googletagservices.com/tag/js/check_359604.js
http://googleads.g.doubleclick.net/pagead/ads?client=ca-pub-3977141546397241&output=js&adk=2828788313&image_size=607x90&lmt=1428369754&num_ads=4&skip=0&ad_type=text&ea=0&oe=utf8&flash=0&hl=en&url=http%3A%2F%2Fwww.euronews.com%2F2015%2F03%2F20%2Fuber-taxis-overtake-new-york-yellow-cabs%2F&dt=1428355354776&shv=r20150331&cbv=r20150224&saldr=sb&correlator=6304440702977&frm=20&ga_vid=21319259.1428355355&ga_sid=1428355355&ga_hid=935959392&ga_fc=0&u_tz=-240&u_his=1&u_java=0&u_h=900&u_w=1440&u_ah=873&u_aw=1440&u_cd=32&u_nplug=0&u_nmime=0&dff=arial&dfs=12&biw=400&bih=300&eid=317150304&oid=3&rx=0&eae=2&fc=24&brdim=0%2C0%2C0%2C0%2C1440%2C23%2C0%2C0%2C400%2C300&vis=0&rsz=0%7C0%7C%7C&abl=CS&ppjl=u&fu=1024&bc=1&ifi=1&dtd=155
>>>
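
PhantomJS is no longer maintained, and Selenium 4 dropped both webdriver.PhantomJS and find_elements_by_xpath, so the session above needs an older Selenium release. Below is a minimal sketch of the same idea for Selenium 4 with headless Chrome. It assumes Chrome and a matching driver are installed; filter_srcs and rendered_script_srcs are hypothetical helper names, not part of any library.

```python
def filter_srcs(srcs, needle="googleadservices"):
    """Keep only the non-empty src values that contain needle."""
    return [src for src in srcs if src and needle in src]


def rendered_script_srcs(url):
    """Load url in headless Chrome and return the src of every script element.

    Assumes Selenium 4 plus a local Chrome installation. Imports are kept
    inside the function so filter_srcs stays usable without Selenium.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Query the live DOM, so scripts injected by JavaScript are included.
        scripts = driver.find_elements(By.XPATH, "//script[@src]")
        return [script.get_attribute("src") for script in scripts]
    finally:
        # Always shut the browser down, even if the page fails to load.
        driver.quit()
```

Calling filter_srcs(rendered_script_srcs(url)) with the page URL would then give the links asked about in the question. Scripts that are injected some time after the page's load event may still be absent, in which case an explicit wait before querying the DOM may be needed.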

For "python - Getting the src attribute using XPath", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29479616/
