
python - Beautiful Soup: extracting an href from a Google search

Reposted. Author: 行者123. Updated: 2023-12-05 05:28:27

A Google search gives me the following first result in the HTML:

<h3 class="r"><a href="https://rads.stackoverflow.com/amzn/click/com/0470284889" rel="nofollow noreferrer" class="l vst" onmousedown="return rwt(this,'','','','1','AFQjCNEv1W9YC2jcSKYdEo2kNqBMJ-Utmg','k89K9hF4cVNpxQYHtEKiUQ','0CCoQFjAA',null,event)"><em>Quantitative Trading</em>: <em>How to Build Your Own Algorithmic</em> <b>...</b> - Amazon</a></h3>

I would like to extract the link http://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889 from this, but when I use BeautifulSoup to pull it out with

soup.find("h3").find("a").get("href")

I get the following string instead:

/url?q=http://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889&sa=U&ei=P2ycT6OoNuasiAL2ncV5&ved=0CBIQFjAA&usg=AFQjCNEo_ujANAKnjheWDRlBKnJ1BGeA7A

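I know the link is in there, and I could recover it by stripping /url?q= and everything from the first & onward, but I would like to know whether there is a cleaner solution.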

Thanks!

Best answer

You can use a combination of urlparse.urlparse and urlparse.parse_qs (in Python 3, both functions live in urllib.parse), e.g.

>>> import urlparse
>>> url = '/url?q=http://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889&sa=U&ei=P2ycT6OoNuasiAL2ncV5&ved=0CBIQFjAA&usg=AFQjCNEo_ujANAKnjheWDRlBKnJ1BGe'
>>> data = urlparse.parse_qs(
... urlparse.urlparse(url).query
... )
>>> data
{'ei': ['P2ycT6OoNuasiAL2ncV5'],
 'q': ['http://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889'],
 'sa': ['U'],
 'usg': ['AFQjCNEo_ujANAKnjheWDRlBKnJ1BGe'],
 'ved': ['0CBIQFjAA']}
>>> data['q'][0]
'http://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889'
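The session above uses the Python 2 urlparse module. For Python 3, the same idea can be sketched with urllib.parse. The helper name extract_target_url and the fallback of returning the href unchanged when it has no q parameter are assumptions made for illustration; they are not part of the original answer.

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(href):
    """Return the first `q` query parameter of href, or href itself if there is none.

    Hypothetical helper: the name and the fallback behaviour are illustrative choices.
    """
    # urlparse splits the redirect-style href into components; .query holds
    # everything after the "?" (here: "q=...&sa=U&ei=...&ved=...").
    query = urlparse(href).query
    # parse_qs maps each parameter name to a list of its values.
    params = parse_qs(query)
    return params.get("q", [href])[0]


# Redirect-style href in the shape shown in the question.
redirect = (
    "/url?q=http://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889"
    "&sa=U&ei=P2ycT6OoNuasiAL2ncV5&ved=0CBIQFjAA"
)
print(extract_target_url(redirect))
```

In the question's setting, the argument would be the value returned by soup.find("h3").find("a").get("href"). Falling back to the original href keeps the helper usable on links that are not wrapped in a /url?q= redirect.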

Regarding python - Beautiful Soup: extracting an href from a Google search, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10368153/
