python - Scraping Google search with BeautifulSoup

Repost. Author: 行者123. Updated: 2023-12-01 01:26:08

I want to scrape multiple pages of Google search results. So far I can only scrape the first page. How can I scrape several pages?

from bs4 import BeautifulSoup
import requests
import urllib.request
import re
from collections import Counter

def search(query):
    url = "http://www.google.com/search?q="+query

    text = []
    final_text = []

    # Download the first results page and parse it
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text,"html.parser")

    # Collect result descriptions
    for desc in soup.find_all("span",{"class":"st"}):
        text.append(desc.text)

    # Collect result titles
    for title in soup.find_all("h3",attrs={"class":"r"}):
        text.append(title.text)

    # Keep only letters and spaces
    for string in text:
        string = re.sub("[^A-Za-z ]","",string)
        final_text.append(string)

    # Count the words across all collected text
    count_text = ' '.join(final_text)
    res = Counter(count_text.split())

    # Sort by count (descending), then alphabetically
    keyword_Count = dict(sorted(res.items(), key=lambda x: (-x[1], x[0])))

    for x,y in keyword_Count.items():
        print(x ," : ",y)


search("girl")

Best answer

url = "http://www.google.com/search?q=" + query + "&start=" + str((page - 1) * 10)
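To make the one-line answer concrete, here is a minimal sketch of how the `start` offset could be generated for several pages. The helper name `page_urls` and the `per_page=10` default are illustrative assumptions, not part of the original answer. The sketch only builds URLs and does not contact Google, which may change its markup or block automated requests.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.google.com/search"  # same endpoint as in the question


def page_urls(query, pages, per_page=10):
    """Yield one search URL per result page, starting from page 1.

    Hypothetical helper: per_page=10 is an assumed page size.
    """
    for page in range(1, pages + 1):
        # Page 1 starts at offset 0, page 2 at 10, and so on.
        params = {"q": query, "start": (page - 1) * per_page}
        # urlencode also escapes spaces and special characters in the query.
        yield BASE_URL + "?" + urlencode(params)


for url in page_urls("girl", 3):
    print(url)
```

Each generated URL could then be passed to `requests.get` in place of the single `url` used in `search`, with the collected text accumulated across pages before counting.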

Regarding python - scraping Google search with BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53324849/
