python - Printing class names to a dictionary based on a condition - BS4

Reposted. Author: 行者123. Updated: 2023-12-01 06:42:01

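I am trying to scrape data from a website, store it in a dictionary, and write the results to a CSV file in a structured format. So far my code looks like this, and it almost works the way I want: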

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://database.globalreporting.org/reports/49283/"
r = requests.get(URL, verify=False)

soup = BeautifulSoup(r.content, 'html5lib')
# print(soup.prettify())
table = soup.findAll('li', attrs={'class': 'list-group-item'})
print(table)

quotes = []

for row in table:
    quote = {}
    quote['Label'] = " ".join(row.getText().split())
    quotes.append(quote)
    for line in row.select('span[class]'):
        if line['class'][0] == 'glyphicon glyphicon-ok text-success':
            quote['Tickmark'] = "Yes"
            quotes.append(quote)
        if line['class'][0] == 'glyphicon glyphicon-remove text-light':
            quote['Cross'] = "No"
            quotes.append(quote)

for quote in quotes:
    print(quote)

filename = 'CSR_Info.csv'
with open(filename, 'w') as f:
    w = csv.DictWriter(f, ['Label','Tickmark','Cross'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)

The problem is that my two if statements never produce a value...

The output looks like this (nothing between the commas, even though I expect Yes/No):

Integrated:,,

The part of the HTML I am scraping looks like this:

[Image: screenshot of the HTML, not preserved in this copy]

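So what I need is not the text of the element but the class name itself, so that I can test it in my if statements.

Does anyone know how to achieve this?

In the end, my result should look like this: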

Integrated:,Yes, 

Or, if not:

Integrated:,,No 

Best answer

If you print line['class'], you will see that the class names come back as a list, so line['class'][0] is 'glyphicon', NOT 'glyphicon glyphicon-remove text-light'. That is why neither condition ever matches and you get no value.
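A minimal sketch of that behavior follows. The HTML fragment is made up for illustration, modelled on the class names quoted in the question (the original screenshot is not available), and html.parser is used here only to avoid the extra html5lib dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page markup may differ.
html = (
    '<li class="list-group-item">Integrated: '
    '<span class="glyphicon glyphicon-ok text-success"></span></li>'
)

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span")

# "class" is a multi-valued attribute, so Beautiful Soup returns a list of strings.
classes = span["class"]
print(classes)                                              # ['glyphicon', 'glyphicon-ok', 'text-success']
print(classes[0] == "glyphicon glyphicon-ok text-success")  # False: this is the question's comparison
print("glyphicon-ok" in classes)                            # True: test membership in the list instead
```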

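To fix this, I added an if condition that checks whether the list has length 3, and then compares each of the three class names, combining the comparisons with and.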

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://database.globalreporting.org/reports/49283/"
r = requests.get(URL, verify=False)

soup = BeautifulSoup(r.content, 'html5lib')
# print(soup.prettify())
table = soup.findAll('li', attrs={'class': 'list-group-item'})
#print(table)

quotes = []

for row in table:
    quote = {}
    quote['Label'] = " ".join(row.getText().split())
    quotes.append(quote)
    for line in row.select('span[class]'):
        # line['class'] is a list such as ['glyphicon', 'glyphicon-ok', 'text-success']
        if len(line['class']) == 3:
            if line['class'][0] == 'glyphicon' and line['class'][1] == 'glyphicon-ok' and line['class'][2] == 'text-success':
                quote['Tickmark'] = "Yes"
                quotes.append(quote)
            if line['class'][0] == 'glyphicon' and line['class'][1] == 'glyphicon-remove' and line['class'][2] == 'text-light':
                quote['Cross'] = "No"
                quotes.append(quote)

for quote in quotes:
    print(quote)

filename = 'CSR_Info.csv'
# newline='' is what the csv module documentation recommends for files passed to a writer
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['Label','Tickmark','Cross'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)
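The length check and the three positional comparisons can also be written as a subset test. This is a sketch of a different technique, not the answer's method: it ignores the order of the class names and tolerates extra classes, which the positional check does not. The helper name status_from_classes is hypothetical.

```python
def status_from_classes(classes):
    """Map a list of CSS class names to a (column, value) pair, or None if neither icon matches."""
    names = set(classes)
    if {"glyphicon", "glyphicon-ok", "text-success"} <= names:
        return ("Tickmark", "Yes")
    if {"glyphicon", "glyphicon-remove", "text-light"} <= names:
        return ("Cross", "No")
    return None


print(status_from_classes(["glyphicon", "glyphicon-ok", "text-success"]))   # ('Tickmark', 'Yes')
print(status_from_classes(["text-light", "glyphicon-remove", "glyphicon"])) # ('Cross', 'No')
print(status_from_classes(["badge"]))                                       # None
```

Inside the loop, the result of status_from_classes(line['class']) could be unpacked into a key and a value and stored in quote whenever it is not None.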

Output:

{'Label': 'Publication year: 2017'}
{'Label': 'Report type: GRI - G4'}
{'Label': 'Adherence Level: In accordance - Comprehensive'}
{'Label': 'Sector supplement: Not Applicable'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'GRI Service: Materiality Disclosures Service'}
{'Label': 'Reporting period: ? - ?'}
{'Label': 'Reporting cycle: ?'}
{'Label': 'Language: ?'}
{'Label': 'Number of pages: ?'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'Type of Assurance Provider: Accountant'}
{'Label': 'Assurance Provider: Pricewaterhouse Coopers'}
{'Label': 'Assurance Scope: Entire sustainability report'}
{'Label': 'Level of Assurance: Limited/moderate'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
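Each flagged label appears twice in the output above because the code appends the same dictionary to quotes a second time inside the if branches. The sketch below shows one way to get a single row per list item. The items list is hypothetical stand-in data for the parsed page, and io.StringIO stands in for the real CSV file so that the example is self-contained.

```python
import csv
import io

# Hypothetical parsed data: (label, class list of the icon span, or None if there is no icon).
items = [
    ("Publication year: 2017", None),
    ("Integrated:", ["glyphicon", "glyphicon-remove", "text-light"]),
    ("SDGs:", ["glyphicon", "glyphicon-ok", "text-success"]),
]

rows = []
for label, classes in items:
    row = {"Label": label}
    if classes:
        if "glyphicon-ok" in classes:
            row["Tickmark"] = "Yes"
        elif "glyphicon-remove" in classes:
            row["Cross"] = "No"
    rows.append(row)  # appended exactly once per list item

buffer = io.StringIO()  # stands in for open(filename, 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=["Label", "Tickmark", "Cross"])
writer.writeheader()
writer.writerows(rows)  # keys missing from a row are written as empty fields
csv_text = buffer.getvalue()
print(csv_text)
```

With this stand-in data, the Integrated row is written as Integrated:,,No, which is the shape the question asks for.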

Regarding python - Printing class names to a dictionary based on a condition - BS4, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59410389/
