
python - BeautifulSoup - scraping multiple pages

Reposted. Author: 行者123. Updated: 2023-11-28 16:56:25

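I want to scrape the members' names from each page, then move on to the next page and do the same thing. My code only works for one page. I'm very new to this, so any advice would be appreciated. Thanks.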

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.bodia.com/spa-members/page/1")
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

I tried this, and it only gave me the members on page 3.

    for i in range (1,4): #to scrape names of page 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i))
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

Then I tried this:

    i = 1
    while i<5:
        r = requests.get("https://www.bodia.com/spa-members/page/"+str(i))
        i+=1

        soup = BeautifulSoup(r.text,"html.parser")
        lights = soup.findAll("span",{"class":"light"})

        lights_list = []
        for l in lights[0:]:
            result = l.text.strip()
            lights_list.append(result)

        print (lights_list)

It gave me 4 members' names, but I don't know which page each one came from.

['Seng Putheary (Nana)']
['Marco Julia']
['Simon']
['Ms Anne Guerineau']

Best answer

Only two changes are needed to make it scrape every page.

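  1. Change r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i)) to r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i)). The built-in format(i) returns the same string as str(i), so the original concatenation still builds a valid URL; str.format is simply the more conventional way to write it.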

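  2. You are not looping over all of the code: the parsing and printing sit outside the for loop, so they run only once and print a single set of names. Indenting everything under the for loop fixes this.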
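Both points can be checked in isolation, without touching the network. The following is only a minimal sketch: the helper names page_url, last_only, and every_page are made up for illustration and do not appear in the original question or answer.

```python
BASE = "https://www.bodia.com/spa-members/page/"  # URL prefix from the question


def page_url(i):
    # The str.format spelling suggested in point 1.
    return "https://www.bodia.com/spa-members/page/{}".format(i)


# Point 1: concatenating with the built-in format(i) builds the same URL.
assert BASE + format(2) == page_url(2)


def last_only(pages):
    # Point 2, the buggy shape: the work happens AFTER the loop,
    # so it only ever sees the final value of `url`.
    for i in pages:
        url = page_url(i)
    return [url]


def every_page(pages):
    # Point 2, the fixed shape: the work happens INSIDE the loop,
    # so it runs once per page.
    seen = []
    for i in pages:
        url = page_url(i)
        seen.append(url)
    return seen


print(last_only(range(1, 4)))   # a single URL, for page 3
print(every_page(range(1, 4)))  # three URLs, for pages 1 to 3
```

The same reasoning applies when the loop body fetches and parses a page instead of just building a URL.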

    import requests
    from bs4 import BeautifulSoup

    for i in range(1, 4):  # scrape names from pages 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i))
        soup = BeautifulSoup(r.text, "html.parser")
        lights = soup.find_all("span", {"class": "light"})
        lights_list = []
        for l in lights:
            result = l.text.strip()
            lights_list.append(result)

        print(lights_list)  # one list per page

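The code above prints one list of names for each page it scrapes, roughly one every 3 seconds. It contains no explicit delay, so that interval presumably reflects how long each request takes.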
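The question also mentions not knowing which page each name came from. One possible way to keep track, sketched here under assumptions, is to store the names in a dictionary keyed by page number. The function names extract_names and scrape_pages, the 10-second timeout, and the raise_for_status check are illustrative choices and not part of the original answer; the URL pattern and the span class come from the question.

```python
import requests
from bs4 import BeautifulSoup

# URL pattern taken from the question; {} is filled with the page number.
BASE_URL = "https://www.bodia.com/spa-members/page/{}"


def extract_names(html):
    """Return the stripped text of every <span class="light"> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.text.strip() for span in soup.find_all("span", class_="light")]


def scrape_pages(first, last):
    """Map each page number in [first, last] to the names found on that page."""
    names_by_page = {}
    for page in range(first, last + 1):
        # The timeout value is an arbitrary assumption for this sketch.
        response = requests.get(BASE_URL.format(page), timeout=10)
        # Fail on HTTP errors rather than parsing an error page.
        response.raise_for_status()
        names_by_page[page] = extract_names(response.text)
    return names_by_page
```

Calling scrape_pages(1, 3) would return a dictionary such as {1: [...], 2: [...], 3: [...]}, so each name stays attached to its page. Keeping the parsing in extract_names also lets it be tried on a saved HTML string without making any requests.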

Regarding python - BeautifulSoup - scraping multiple pages, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57932635/
