python - BeautifulSoup filling missing information with "N/A" doesn't work

I'm practicing my web-scraping skills on the following site: http://web.californiacraftbeer.com/Brewery-Member

My current code is below. I seem to get the correct company count, but I'm getting duplicate rows in the CSV file, which I believe happens whenever a company is missing information. In several parts of my code I try to detect missing information and replace it with the text "N/A", but it isn't working. I suspect the problem may be related to the zip() function (a short sketch of its behaviour follows the code below), but I'm not sure how to fix it.

Any help is greatly appreciated!

"""
Grabs brewery name, contact person, phone number, website address, and email address
for each brewery listed on the website.
"""

import requests, csv
from bs4 import BeautifulSoup

url = "http://web.californiacraftbeer.com/Brewery-Member"
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
each_company = soup.find_all("div", {"class": "ListingResults_All_CONTAINER ListingResults_Level3_CONTAINER"})
error_msg = "N/A"

def scraper():
"""Grabs information and writes to CSV"""
print("Running...")
results = []
count = 0

for info in each_company:
try:
company_name = info.find_all("span", itemprop="name")
except Exception as e:
company_name = "N/A"
try:
contact_name = info.find_all("div", {"class": "ListingResults_Level3_MAINCONTACT"})
except Exception as e:
contact_name = "N/A"
try:
phone_number = info.find_all("div", {"class": "ListingResults_Level3_PHONE1"})
except Exception as e:
phone_number = "N/A"
try:
website = info.find_all("span", {"class": "ListingResults_Level3_VISITSITE"})
except Exception as e:
website = "N/A"

for company, contact, phone, site in zip(company_name, contact_name, phone_number, website):
count += 1
print("Grabbing {0} ({1})...".format(company.text, count))
newrow = []
try:
newrow.append(company.text)
except Exception as e:
newrow.append(error_msg)
try:
newrow.append(contact.text)
except Exception as e:
newrow.append(error_msg)
try:
newrow.append(phone.text)
except Exception as e:
newrow.append(error_msg)
try:
newrow.append(site.find('a')['href'])
except Exception as e:
newrow.append(error_msg)
try:
newrow.append("info@" + company.text.replace(" ", "").lower() + ".com")
except Exception as e:
newrow.append(error_msg)
results.append(newrow)

print("Done")
outFile = open("brewery.csv", "w")
out = csv.writer(outFile, delimiter=',',quoting=csv.QUOTE_ALL, lineterminator='\n')
out.writerows(results)
outFile.close()

def main():
"""Runs web scraper"""
scraper()

if __name__ == '__main__':
main()
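The zip() suspicion mentioned above is worth spelling out: zip() pairs items index by index and stops at the shortest iterable, so when one of the find_all() lists comes back shorter (for example, a listing with no contact), pairings are silently dropped or shifted. A minimal sketch of this built-in behaviour, with made-up sample data:

# zip() pairs items index-by-index and stops at the shortest iterable.
names = ["Brewery A", "Brewery B", "Brewery C"]
contacts = ["Alice", "Bob"]  # one contact is missing

for name, contact in zip(names, contacts):
    print(name, "->", contact)
# Brewery A -> Alice
# Brewery B -> Bob
# "Brewery C" never appears in the output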

Best Answer

From the bs4 docs:

"If find_all() can’t find anything, it returns an empty list. If find() can’t find anything, it returns None"

So, for example, when company_name = info.find_all("span", itemprop="name") runs but matches nothing, it does not throw an exception, and "N/A" is never assigned.
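A minimal sketch of that behaviour (assuming bs4 and lxml are installed), showing that an unmatched find_all() returns an empty list rather than raising:

from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>no spans here</div>", "lxml")
matches = soup.find_all("span", itemprop="name")

print(matches)        # [] -- an empty list, no exception is raised
print(bool(matches))  # False, so the except branch above never runs and "N/A" is never set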

In that case, you need to check whether company_name is an empty list:

if not company_name:
    company_name = "N/A"

Regarding python - BeautifulSoup filling missing information with "N/A" doesn't work, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42267706/
