
python - Error when trying to loop over web pages for data scraping

Reposted · Author: 行者123 · Updated: 2023-12-01 08:19:26

I have written code that extracts the data from the first page, but I run into a problem when I try to extract data from all the pages.

Here is my code for extracting data from the "a" page:

from bs4 import BeautifulSoup
import urllib.request
import os


def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, 'html.parser')
    return soupdata


playerdatasaved = ""

soup = make_soup('https://www.basketball-reference.com/players/a/')

# walk every table row and join its cells with commas
for record in soup.findAll("tr"):
    playerdata = ""
    for data in record.findAll(["th", "td"]):
        playerdata = playerdata + "," + data.text
    playerdatasaved = playerdatasaved + "\n" + playerdata[1:]

print(playerdatasaved)

header = "player, from, to, position, height, weight, dob, year, colleges" + "\n"
file = open(os.path.expanduser("basketballstats.csv"), "wb")
file.write(bytes(header, encoding="ascii", errors="ignore"))
file.write(bytes(playerdatasaved[1:], encoding="ascii", errors="ignore"))
file.close()  # flush the buffer so the file is written completely
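As a side note, joining cells with bare commas produces a malformed CSV whenever a cell itself contains a comma (player birth dates and multi-college entries on this site do). A sketch of the same row extraction using the standard csv module instead, which quotes such cells automatically (`extract_rows` and `write_csv` are illustrative helper names, not from the original code):

```python
import csv


def extract_rows(soup):
    """Yield one list of cell texts per non-empty table row."""
    for record in soup.findAll("tr"):
        cells = [data.text for data in record.findAll(["th", "td"])]
        if cells:
            yield cells


def write_csv(rows, path):
    """Write the header plus the extracted rows, quoting cells as needed."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["player", "from", "to", "position", "height",
                         "weight", "dob", "year", "colleges"])
        writer.writerows(rows)
```

With this, a cell such as "April 16, 1947" stays in one column instead of being split in two when the file is read back.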

Now, to loop over the pages, my logic is this code:

from bs4 import BeautifulSoup
import urllib.request
import os
from string import ascii_lowercase


def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, 'html.parser')
    return soupdata


playerdatasaved = ""
# one index page per letter of the alphabet
for letter in ascii_lowercase:
    soup = make_soup("https://www.basketball-reference.com/players/" + letter + "/")
    for record in soup.findAll("tr"):
        playerdata = ""
        for data in record.findAll(["th", "td"]):
            playerdata = playerdata + "," + data.text
        playerdatasaved = playerdatasaved + "\n" + playerdata[1:]

header = "player, from, to, position, height, weight, dob, year, colleges" + "\n"
file = open(os.path.expanduser("basketball.csv"), "wb")
file.write(bytes(header, encoding="ascii", errors="ignore"))
file.write(bytes(playerdatasaved[1:], encoding="ascii", errors="ignore"))
file.close()  # flush the buffer so the file is written completely

However, this runs into an error pointing at the line: soup = make_soup("https://www.basketball-reference.com/players/" + letter + "/")

Best Answer

I tried running your code and hit an SSL certificate error, CERTIFICATE_VERIFY_FAILED, which appears to be an issue with the site you are trying to scrape rather than with your code.

Perhaps this Stack Overflow thread can help clear it up: "SSL: certificate_verify_failed" error when scraping https://www.thenewboston.com/
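If the certificate error persists, one common workaround (a sketch, not from the thread above) is to pass an SSL context that skips verification to urlopen. Note that this is insecure and only acceptable for a one-off scrape; the better fix is to repair the local certificate store (e.g. running Python's "Install Certificates" script on macOS, or installing certifi):

```python
import ssl
import urllib.request


def unverified_context():
    """Build an SSL context that skips certificate verification.

    Insecure: hostname checking must be disabled before verify_mode
    can be set to CERT_NONE.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_page(url):
    # pass the context to urlopen; feed the returned bytes to
    # BeautifulSoup exactly as make_soup() does in the question
    return urllib.request.urlopen(url, context=unverified_context()).read()
```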

Regarding python - Error when trying to loop over web pages for data scraping, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54761480/
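Separately from any SSL issue, the letter loop has another failure mode worth guarding against (an assumption on my part, not confirmed by the thread): if any per-letter index page is missing, urlopen raises urllib.error.HTTPError (e.g. a 404) and the whole loop dies on that one URL. A sketch that skips failing pages and continues (`fetch_letter_pages` is an illustrative helper; the `opener` parameter exists so the network call can be swapped out):

```python
import urllib.error
import urllib.request
from string import ascii_lowercase


def fetch_letter_pages(base_url, opener=urllib.request.urlopen):
    """Yield (letter, response) for each letter whose page loads,
    skipping letters whose URL raises an HTTPError."""
    for letter in ascii_lowercase:
        url = base_url + letter + "/"
        try:
            yield letter, opener(url)
        except urllib.error.HTTPError as err:
            # e.g. 404 for a letter with no index page: report and move on
            print("skipping", letter, "->", err.code)
```

Each yielded response can then be fed to make_soup's BeautifulSoup call as in the original code.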
