
python - Iterating over a div table with BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-12-04 09:30:49

A div with class="tableBody" has many child divs. I want to get all of its child divs and extract the strings I highlighted in this picture.

import bs4 as bs
import urllib.request
source = urllib.request.urlopen("https://www.ungm.org/Public/Notice").read()
soup = bs.BeautifulSoup(source,'lxml')

t_body = soup.find("div", class_="tableBody")
t_divs = t_body.find_all("div")
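The code above returns an empty list.
[image: screenshot of the page with the wanted strings highlighted]
I am trying to learn BS4. I would appreciate any help with the code.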

Best answer

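The data you see on the page is loaded dynamically by JavaScript, so it is not in the HTML that urllib downloads. You can use the requests module to send the same request the page makes and parse the response.
For example: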

import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
    "PageIndex": 0,
    "PageSize": 15,
    "Title": "",
    "Description": "",
    "Reference": "",
    "PublishedFrom": "",
    "PublishedTo": "12-Jul-2020",
    "DeadlineFrom": "12-Jul-2020",
    "DeadlineTo": "",
    "Countries": [],
    "Agencies": [],
    "UNSPSCs": [],
    "NoticeTypes": [],
    "SortField": "DatePublished",
    "SortAscending": False,
    "isPicker": False,
    "NoticeTASStatus": [],
    "IsSustainable": False,
    "NoticeDisplayType": None,
    "NoticeSearchTotalLabelId": "noticeSearchTotal",
    "TypeOfCompetitions": []
}

soup = BeautifulSoup( requests.post(url, json=payload).content, 'html.parser' )

for row in soup.select('.tableRow'):
    cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
    print(cells[1])
    print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
    print('-'*80)
Prints:
Supply and delivery of 78 smartphones
13-Jul-2020 11:00 (GMT 2.00) 11-Jul-2020 FAO Request for quotation 2020/FRMLW/FRMLW/106096 Malawi
--------------------------------------------------------------------------------
Supply of LEGUMES SEEDS for rainfed season
23-Jul-2020 14:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/FRMLW/FRMLW/106051 Malawi
--------------------------------------------------------------------------------
Supply of MAIZE SEEDS for rainfed season
22-Jul-2020 14:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/FRMLW/FRMLW/106050 Malawi
--------------------------------------------------------------------------------
Procurement of Supply and Installation of Outdoor Metal Furniture for Rooftop Terrace at FAO Headquarters in Rome, Italy
10-Aug-2020 12:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/CSAPC/CSDID/105286 Italy
--------------------------------------------------------------------------------
Procurement of Silo for Emergency Project
13-Jul-2020 13:00 (GMT 5.00) 11-Jul-2020 FAO Invitation to bid 2020/FABGD/FABGD/106145 Bangladesh
--------------------------------------------------------------------------------
Procurement of Concentrate Ruminant Feed
13-Jul-2020 13:00 (GMT 5.00) 11-Jul-2020 FAO Invitation to bid 2020/FABGD/FABGD/106064 Bangladesh
--------------------------------------------------------------------------------
Purchase of Waste Collection Vehicles - (Two Tractors)
22-Jul-2020 06:30 (GMT 0.00) 11-Jul-2020 UNOPS Request for quotation RFQ/2020/15298 Sri Lanka
--------------------------------------------------------------------------------
Procurement of Laboratory Equipment and Material
24-Jul-2020 22:23 (GMT -1.00) 11-Jul-2020 FAO Invitation to bid 2020/FRGAM/FRGAM/106143 Gambia
--------------------------------------------------------------------------------
Compra de chalecos para promotores comunitarios para la Oficina de Unicef Bolivar - LRFQ-2020-9159352
16-Jul-2020 23:59 (GMT -3.00) 11-Jul-2020 UNICEF Request for proposal LRFQ-2020-9159352 Venezuela
--------------------------------------------------------------------------------
Call for Proposals Quality Based Fixed Budget (CFPFB):
26-Jul-2020 17:00 (GMT 3.00) 11-Jul-2020 UNDP Request for proposal UNDP-SYR-RPA-051-20 Syrian Arab Republic
--------------------------------------------------------------------------------
Innovation and Design Specialist
27-Jul-2020 00:00 (GMT -5.00) 11-Jul-2020 UNDP Not set Innovation and Design Specialist Turkey
--------------------------------------------------------------------------------
(RFI) from national and/or international CSOs/NGOs for potential partnership with UNDP and its pooled funding mechanism, the Darfur Community Peace and Stability Fund (DCPSF),
26-Jul-2020 08:00 (GMT -7.00) 11-Jul-2020 UNDP Request for information RFI-SDN-20-002 Sudan
--------------------------------------------------------------------------------
IRAQ-LRPS-017-2020-9159660 Rehabilitation of 3 water projects at Avrek, Grey Basi and Sarsenk in Duhok
26-Jul-2020 12:00 (GMT 3.00) 11-Jul-2020 UNICEF Request for proposal 9159660 Iraq
--------------------------------------------------------------------------------
106142 INVITACIÓN A COTIZAR PARA LA ADQUISICIÓN DE FERTILIZANTES, HERRAMIENTAS Y MATERIALES PARA ECA DE CACAO
21-Jul-2020 22:00 (GMT -5.00) 10-Jul-2020 FAO Request for quotation 2020/FLCOL/FLCOL/106142 Colombia
--------------------------------------------------------------------------------
Achat de tablettes, de GPS et batteries rechargeable (206 tablettes, 68 GPS, et 181 pack chargeurs et batteries rechargeables) à livrer sur Dakar
28-Jul-2020 12:00 (GMT 0.00) 10-Jul-2020 FAO Invitation to bid 2020/FRSEN/FRSEN/106093 United Kingdom
--------------------------------------------------------------------------------
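Working with positional indexes such as cells[1] and cells[7] is easy to get wrong, so it may help to give each cell a name. The following is a minimal sketch, not part of the original answer. The field names in FIELDS and the helper cells_to_record are hypothetical, inferred only from the column order visible in the printed output above, and cells[0] is assumed to carry nothing useful.

```python
# Column names are assumptions inferred from the printed output above;
# the site itself may label them differently.
FIELDS = ("title", "deadline", "published", "agency",
          "notice_type", "reference", "country")


def cells_to_record(cells):
    """Map the text of one row's cells to a dict, skipping cells[0]."""
    values = cells[1:1 + len(FIELDS)]
    if len(values) < len(FIELDS):
        raise ValueError("expected at least {} cells, got {}".format(
            len(FIELDS) + 1, len(cells)))
    return dict(zip(FIELDS, values))


# Example row copied from the output above (cells[0] is assumed to be empty).
sample = ["", "Supply and delivery of 78 smartphones",
          "13-Jul-2020 11:00 (GMT 2.00)", "11-Jul-2020", "FAO",
          "Request for quotation", "2020/FRMLW/FRMLW/106096", "Malawi"]
record = cells_to_record(sample)
print(record["title"], "|", record["country"])
```

Inside the loop from the answer, the list built from row.select('.tableCell') could be passed straight to cells_to_record, after which a check like record["country"] reads more clearly than cells[7].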

EDIT: To fetch all pages, keep only the notices for the country "Afghanistan", and save them to CSV, you can use this example:
import csv
import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
    "PageIndex": 0,
    "PageSize": 15,
    "Title": "",
    "Description": "",
    "Reference": "",
    "PublishedFrom": "",
    "PublishedTo": "12-Jul-2020",
    "DeadlineFrom": "12-Jul-2020",
    "DeadlineTo": "",
    "Countries": [],
    "Agencies": [],
    "UNSPSCs": [],
    "NoticeTypes": [],
    "SortField": "DatePublished",
    "SortAscending": False,
    "isPicker": False,
    "NoticeTASStatus": [],
    "IsSustainable": False,
    "NoticeDisplayType": None,
    "NoticeSearchTotalLabelId": "noticeSearchTotal",
    "TypeOfCompetitions": []
}

page, all_data = 0, []
while True:
    print('Page {}...'.format(page))

    payload['PageIndex'] = page
    soup = BeautifulSoup( requests.post(url, json=payload).content, 'html.parser' )
    rows = soup.select('.tableRow')
    if not rows:
        break

    for row in rows:
        cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
        print(cells[1])
        print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
        print('-'*80)

        # we are only interested in Afghanistan:
        if 'afghanistan' in cells[7].lower():
            all_data.append([row['data-noticeid'], *cells[1:]])

    page += 1

# write to csv file:
with open('data.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in all_data:
        csv_writer.writerow(row)
The matching rows are saved to data.csv (the original answer showed a screenshot of the file opened in LibreOffice).
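The loop above stops only when a page comes back with no rows, and the CSV has no header line. As a hedged sketch of one way to tighten both points, the version below separates paging from the network call so it can be tried offline. Everything named here (collect_rows, write_csv, fake_fetch, the max_pages cap, and the header names) is hypothetical and not part of the original answer. In real use, fetch_page would wrap the requests.post call and the .tableRow parsing shown earlier.

```python
import csv
import io


def collect_rows(fetch_page, keep, max_pages=1000):
    """Call fetch_page(0), fetch_page(1), ... until a page comes back empty.

    fetch_page is a hypothetical callable returning a list of rows (lists
    of strings) for a page index; keep is a predicate choosing rows to save.
    max_pages is a safety cap so a misbehaving server cannot loop forever.
    """
    kept = []
    for page in range(max_pages):
        rows = fetch_page(page)
        if not rows:
            break
        kept.extend(row for row in rows if keep(row))
    return kept


def write_csv(rows, header, fileobj):
    """Write a header line followed by the rows to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


# Stand-in for the real network call: two pages of data, then an empty page.
fake_pages = [
    [["1", "Notice A", "Afghanistan"], ["2", "Notice B", "Malawi"]],
    [["3", "Notice C", "Afghanistan"]],
]


def fake_fetch(page):
    return fake_pages[page] if page < len(fake_pages) else []


rows = collect_rows(fake_fetch, keep=lambda r: "afghanistan" in r[-1].lower())
buffer = io.StringIO()
write_csv(rows, ["notice_id", "title", "country"], buffer)
print(buffer.getvalue())
```

Passing the fetch function in as an argument keeps the paging logic testable without hitting the site. To write a real file instead of the in-memory buffer, open it with newline='' as in the answer's code.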

Source: the question "python - Iterating over a div table with BeautifulSoup" on Stack Overflow: https://stackoverflow.com/questions/62857309/
