
python - How to handle empty lists - multi-page web scraping

Reposted. Author: 行者123. Updated: 2023-12-01 01:18:54

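I am trying to scrape the questions-and-answers section from Lazada, but I run into a problem when some pages have no questions or answers. When I run my code over multiple pages it returns nothing; it only works for a single page that does have questions and answers.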

How can I make the code keep going through the remaining pages even when the first page has no questions?

I tried adding an if/else statement to the code, as shown below.

import bleach
import csv
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    nameList = soup.findAll("div", {"class": "qna-content"})

    for name in nameList:
        if nameList == None:
            print('None')
        else:
            print(name.get_text())
        continue

My expected output looks like this:

None --> output from url1
None --> output from url2
can choose huzelnut?Hi Dear Customer , for the latest expiry date its on 2019 , and we will make sure the expiry date is still more than 6 months. --> output from url3

Thanks in advance for your help!

Best answer

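Your logic is wrong: the check on nameList belongs outside the for name in nameList loop, because the loop body never runs when the list is empty. The indentation also needs fixing. Note that findAll returns an empty list, not None, when nothing matches, so the check should test for emptiness (if not nameList:) rather than compare with None.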

import datetime

import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    # findAll returns an empty list (never None) when nothing matches,
    # so test for emptiness instead of comparing with None.
    nameList = soup.findAll("div", {"class": "qna-content"})
    if not nameList:
        print(url, 'None')
        continue  # skip this URL

    for name in nameList:
        print(name.get_text())
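
The same idea can be sketched with the parsing and the reporting split into two small functions, which makes the empty-page case easy to check without hitting the network. This is only an illustration under assumptions: the names extract_qna and summarize are hypothetical (they are not from the original post), and the only detail taken from the question is the "qna-content" class.

```python
def extract_qna(html):
    """Return the text of every div.qna-content in the page (possibly [])."""
    # Imported inside the function so that summarize() below has no
    # third-party dependency of its own.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # find_all returns an empty list, never None, when nothing matches.
    divs = soup.find_all("div", class_="qna-content")
    return [div.get_text(strip=True) for div in divs]


def summarize(results):
    """Turn {url: [texts]} into printable lines, using 'None' for empty pages."""
    lines = []
    for url, texts in results.items():
        if not texts:
            # An empty list is falsy: report the page and move on.
            lines.append(f"{url}: None")
            continue
        for text in texts:
            lines.append(f"{url}: {text}")
    return lines
```

In a scraper, one would presumably build the dictionary with something like {url: extract_qna(requests.get(url).text) for url in urls} and then print each line returned by summarize, so that a page without questions produces a single None line and the remaining pages are still processed.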

Regarding "python - How to handle empty lists - multi-page web scraping", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54031891/
