Python Google search

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:59:01

I'm trying to search Google for a term with Python. Then I try to extract the results into a list and print that list. But now I've run into this problem:

class search:
    def __init__(self, search):
        page = requests.get("http://www.google.de/search?q="+search)
        soup = BeautifulSoup(page.content)
        links = soup.findAll("a")
        for link in soup.find_all("a",href=re.compile("(?<=/url\?q=)(htt.*://.*)")):
            print re.split(":(?=http)",link["href"].replace("/url?q=",""))

search("lol")

This works. But look at the output:

['http://euw.leagueoflegends.com/de&sa=U&ved=0ahUKEwie3sWOkbHRAhVGGCwKHSChAWQQFggVMAA&usg=AFQjCNEkd1xB6jaSnzWz-VpYcnHvSNYMJA']

['http://webcache.googleusercontent.com/search%3Fq%3Dcache:as12jwqcnbAJ', 'http://euw.leagueoflegends.com/de%252Blol%26hl%3Dde%26ct%3Dclnk&sa=U&ved=0ahUKEwie3sWOkbHRAhVGGCwKHSCqewsfdvfgh1A&usg=AFQjCNEm132qewdasDq2hCb9SRjnbmbMb3rkw']

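(and so on)

How do I get this into a single list? And how can I remove these webcache entries?

I know it's utf8-encoded, but I can simply decode it with urllib2.

Thanks in advance!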
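
For the decoding step the question mentions, here is a minimal sketch. It assumes Python 3, where the unquoting helper lives in urllib.parse rather than urllib2, and it treats the strings as percent-encoded (URL-encoded) text. The inputs are copied from the output above; the variable names are only illustrative.

```python
from urllib.parse import unquote

# A percent-encoded fragment taken from the output shown in the question.
encoded = "http://webcache.googleusercontent.com/search%3Fq%3Dcache:as12jwqcnbAJ"
decoded = unquote(encoded)
print(decoded)

# Some parts are encoded twice (%252B is an encoded "%2B"),
# so one pass of unquote() only removes the outer layer.
double_encoded = "http://euw.leagueoflegends.com/de%252Blol%26hl%3Dde%26ct%3Dclnk"
once = unquote(double_encoded)
print(once)
```

Whether a second unquote() pass is wanted depends on what the decoded URL is going to be used for.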

Best answer

This will get you closer. links was not being used. The method now returns a list that excludes any URL containing the string webcache:

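from bs4 import BeautifulSoup
import requests
import re

class Google:
    @classmethod
    def search(cls, search):
        page = requests.get("http://www.google.de/search?q="+search)
        # Naming the parser avoids bs4's "no parser was explicitly specified" warning.
        soup = BeautifulSoup(page.content, "html.parser")
        # Keep only anchors whose href has the form /url?q=http...
        links = soup.find_all("a",href=re.compile(r"(?<=/url\?q=)(htt.*://.*)"))
        # Strip the /url?q= prefix and keep the first URL of each href.
        urls = [re.split(":(?=http)",link["href"].replace("/url?q=",""))[0] for link in links]
        # Drop Google's cached copies.
        return [url for url in urls if 'webcache' not in url]

print(Google.search("lol"))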

Output

[u'http://euw.leagueoflegends.com/de&sa=U&ved=0ahUKEwixjpPMmrHRAhUHlSwKHUIuCIIQFggVMAA&usg=AFQjCNEkd1xB6jaSnzWz-VpYcnHvSNYMJA', u'http://euw.leagueoflegends.com/de/news/&sa=U&ved=0ahUKEwixjpPMmrHRAhUHlSwKHUIuCIIQjBAIHDAB&usg=AFQjCNGY7DvS4oNNQktCTf3FGtStOG9xvA', u'http://gameinfo.euw.leagueoflegends.com/de/game-info/&sa=U&ved=0ahUKEwixjpPMmrHRAhUHlSwKHUIuCIIQjBAIHjAD&usg=AFQjCNGrvfhy3JIOHWUYB-YtyFV2A...
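
The URLs in this output still carry Google's trailing parameters (&sa=U&ved=...&usg=...). One possible way to drop them, sketched here under the assumption of Python 3 and of hrefs shaped like /url?q=<target>&sa=..., is to parse the href as a URL and read its q parameter instead of splitting with a regular expression. The helper names extract_target and clean_results and the sample hrefs are hypothetical, not part of the original answer.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the destination URL from a '/url?q=...' href, or None."""
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    # parse_qs percent-decodes the values and separates q from sa, ved, usg.
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else None


def clean_results(hrefs):
    """Collect target URLs into one list, skipping webcache entries."""
    urls = (extract_target(h) for h in hrefs)
    return [u for u in urls if u and u.startswith("http") and "webcache" not in u]


# Hypothetical hrefs in the shape seen above (tracking tokens shortened).
sample = [
    "/url?q=http://euw.leagueoflegends.com/de&sa=U&ved=abc&usg=xyz",
    "/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:abc&sa=U",
    "/search?q=lol&start=10",
]
print(clean_results(sample))
```

With the answer's code, the same idea would apply to link["href"] for each anchor in links. The result page markup may differ from what is assumed here, so the shape of the hrefs should be checked against the real response.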

Regarding Python Google search, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41527601/
