
python - BeautifulSoup: getting the links and the information inside them


I want to scrape a website. Each page of the site shows previews of 10 complaints. I wrote this script to collect the links to the 10 complaints and some information from inside each link. When I run the script, I get the error message "RecursionError: maximum recursion depth exceeded". Can someone tell me what is wrong? Thank you in advance!!

from requests import get
from bs4 import BeautifulSoup
import pandas as pd

# Create list objects for each information section
C_date = []
C_title = []
C_text = []
U_name = []
U_id = []
C_count = []
R_name = []
R_date = []
R_text = []

# Get 10 links for preview of complaints
def getLinks(url):
    response = get(url)
    html_soup = BeautifulSoup(response.text, 'html.parser')
    c_containers = html_soup.find_all('div', class_='media')
    # Store wanted links in a list
    allLinks = []

    for link in c_containers:
        find_tag = link.find('a')
        find_links = find_tag.get('href')
        full_link = "".join((url, find_links))
        allLinks.append(full_link)
    # Get total number of links
    print(len(allLinks))
    return allLinks

def GetData(Each_Link):
    each_complaint_page = get(Each_Link)
    html_soup = BeautifulSoup(each_complaint_page.text, 'html.parser')
    # Get date of complaint
    dt = html_soup.main.find('span')
    date = dt['title']
    C_date.append(date)
    # Get Title of complaint
    TL = html_soup.main.find('h1', {'class': 'title'})
    Title = TL.text
    C_title.append(Title)
    # Get main text of complaint
    Tx = html_soup.main.find('div', {'class': 'description'})
    Text = Tx.text
    C_text.append(Text)
    # Get user name and id
    Uname = html_soup.main.find('span', {'class': 'user'})
    User_name = Uname.span.text
    User_id = Uname.attrs['data-memberid']
    U_name.append(User_name)
    U_id.append(User_id)
    # Get view count of complaint
    Vcount = html_soup.main.find('span', {'view-count-detail'})
    View_count = Vcount.text
    C_count.append(View_count)
    # Get reply for complaint
    Rpnm = html_soup.main.find('h4', {'name'})
    Reply_name = Rpnm.next
    R_name.append(Reply_name)
    # Get reply date
    Rpdt = html_soup.main.find('span', {'date-tips'})
    Reply_date = Rpdt.attrs['title']
    R_date.append(Reply_date)
    # Get reply text
    Rptx = html_soup.main.find('p', {'comment-content-msg company-comment-msg'})
    Reply_text = Rptx.text
    R_text.append(Reply_text)


link_list = getLinks('https://www.sikayetvar.com/arcelik')

for i in link_list:
    z = GetData(i)
    print(z)

PS: My next step is to put all the information into a dataframe.
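
For reference, once the lists are filled, that last step could look like this (a minimal sketch assuming every complaint page yields exactly one entry per list, so all nine lists have the same length; the column names are only illustrative):

import pandas as pd

# Combine the parallel lists into one DataFrame, one row per complaint.
complaints = pd.DataFrame({
    'date': C_date,
    'title': C_title,
    'text': C_text,
    'user_name': U_name,
    'user_id': U_id,
    'view_count': C_count,
    'reply_name': R_name,
    'reply_date': R_date,
    'reply_text': R_text,
})
print(complaints.head())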

Best Answer

Your GetData() method calls itself without a base case, which causes infinite recursion:

def GetData(data):
    for i in GetData(data):
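
The failure can be reproduced in isolation: any function that unconditionally calls itself will exhaust Python's recursion limit. A hypothetical minimal example:

def recurse_forever(x):
    return recurse_forever(x)  # no base case, so this call never returns

recurse_forever(1)  # raises RecursionError: maximum recursion depth exceeded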

You also call response = get(i) but then ignore the result... perhaps what you meant was

def GetData(link):
    i = get(link)
    ...
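
For illustration only, a recursion-free GetData along those lines might look like the sketch below. It reuses the imports and getLinks from the question, extracts just the date and title (the remaining fields would follow the same pattern), and returns a dict instead of appending to global lists, so printing the result shows something useful:

def GetData(link):
    # Fetch the complaint page exactly once; no recursive call anywhere.
    page = get(link)
    html_soup = BeautifulSoup(page.text, 'html.parser')
    date = html_soup.main.find('span')['title']
    title = html_soup.main.find('h1', {'class': 'title'}).text
    return {'date': date, 'title': title}

for link in getLinks('https://www.sikayetvar.com/arcelik'):
    print(GetData(link))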

Regarding python - BeautifulSoup: getting the links and the information inside them, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51017998/
