A div with class="tableBody" has many child divs. I want to get all of its child divs and extract the strings I highlighted in this picture.
import bs4 as bs
import urllib.request
source = urllib.request.urlopen("https://www.ungm.org/Public/Notice").read()
soup = bs.BeautifulSoup(source,'lxml')
t_body = soup.find("div", class_="tableBody")
t_divs = t_body.find_all("div")
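The code above returns an empty list.
Best answer
The data you see on the page is loaded dynamically via JavaScript, so it is not in the HTML that urllib downloads. You can use the requests module to replicate the request the page makes.
For example: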
import requests
from bs4 import BeautifulSoup
url = 'https://www.ungm.org/Public/Notice/Search'
payload = {
"PageIndex": 0,
"PageSize": 15,
"Title": "",
"Description": "",
"Reference": "",
"PublishedFrom": "",
"PublishedTo": "12-Jul-2020",
"DeadlineFrom": "12-Jul-2020",
"DeadlineTo": "",
"Countries": [],
"Agencies": [],
"UNSPSCs": [],
"NoticeTypes": [],
"SortField": "DatePublished",
"SortAscending": False,
"isPicker": False,
"NoticeTASStatus": [],
"IsSustainable": False,
"NoticeDisplayType": None,
"NoticeSearchTotalLabelId": "noticeSearchTotal",
"TypeOfCompetitions": []
}
soup = BeautifulSoup(requests.post(url, json=payload).content, 'html.parser')
for row in soup.select('.tableRow'):
    cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
    print(cells[1])
    print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
    print('-' * 80)
Prints:
Supply and delivery of 78 smartphones
13-Jul-2020 11:00 (GMT 2.00) 11-Jul-2020 FAO Request for quotation 2020/FRMLW/FRMLW/106096 Malawi
--------------------------------------------------------------------------------
Supply of LEGUMES SEEDS for rainfed season
23-Jul-2020 14:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/FRMLW/FRMLW/106051 Malawi
--------------------------------------------------------------------------------
Supply of MAIZE SEEDS for rainfed season
22-Jul-2020 14:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/FRMLW/FRMLW/106050 Malawi
--------------------------------------------------------------------------------
Procurement of Supply and Installation of Outdoor Metal Furniture for Rooftop Terrace at FAO Headquarters in Rome, Italy
10-Aug-2020 12:00 (GMT 2.00) 11-Jul-2020 FAO Invitation to bid 2020/CSAPC/CSDID/105286 Italy
--------------------------------------------------------------------------------
Procurement of Silo for Emergency Project
13-Jul-2020 13:00 (GMT 5.00) 11-Jul-2020 FAO Invitation to bid 2020/FABGD/FABGD/106145 Bangladesh
--------------------------------------------------------------------------------
Procurement of Concentrate Ruminant Feed
13-Jul-2020 13:00 (GMT 5.00) 11-Jul-2020 FAO Invitation to bid 2020/FABGD/FABGD/106064 Bangladesh
--------------------------------------------------------------------------------
Purchase of Waste Collection Vehicles - (Two Tractors)
22-Jul-2020 06:30 (GMT 0.00) 11-Jul-2020 UNOPS Request for quotation RFQ/2020/15298 Sri Lanka
--------------------------------------------------------------------------------
Procurement of Laboratory Equipment and Material
24-Jul-2020 22:23 (GMT -1.00) 11-Jul-2020 FAO Invitation to bid 2020/FRGAM/FRGAM/106143 Gambia
--------------------------------------------------------------------------------
Compra de chalecos para promotores comunitarios para la Oficina de Unicef Bolivar - LRFQ-2020-9159352
16-Jul-2020 23:59 (GMT -3.00) 11-Jul-2020 UNICEF Request for proposal LRFQ-2020-9159352 Venezuela
--------------------------------------------------------------------------------
Call for Proposals Quality Based Fixed Budget (CFPFB):
26-Jul-2020 17:00 (GMT 3.00) 11-Jul-2020 UNDP Request for proposal UNDP-SYR-RPA-051-20 Syrian Arab Republic
--------------------------------------------------------------------------------
Innovation and Design Specialist
27-Jul-2020 00:00 (GMT -5.00) 11-Jul-2020 UNDP Not set Innovation and Design Specialist Turkey
--------------------------------------------------------------------------------
(RFI) from national and/or international CSOs/NGOs for potential partnership with UNDP and its pooled funding mechanism, the Darfur Community Peace and Stability Fund (DCPSF),
26-Jul-2020 08:00 (GMT -7.00) 11-Jul-2020 UNDP Request for information RFI-SDN-20-002 Sudan
--------------------------------------------------------------------------------
IRAQ-LRPS-017-2020-9159660 Rehabilitation of 3 water projects at Avrek, Grey Basi and Sarsenk in Duhok
26-Jul-2020 12:00 (GMT 3.00) 11-Jul-2020 UNICEF Request for proposal 9159660 Iraq
--------------------------------------------------------------------------------
106142 INVITACIÓN A COTIZAR PARA LA ADQUISICIÓN DE FERTILIZANTES, HERRAMIENTAS Y MATERIALES PARA ECA DE CACAO
21-Jul-2020 22:00 (GMT -5.00) 10-Jul-2020 FAO Request for quotation 2020/FLCOL/FLCOL/106142 Colombia
--------------------------------------------------------------------------------
Achat de tablettes, de GPS et batteries rechargeable (206 tablettes, 68 GPS, et 181 pack chargeurs et batteries rechargeables) à livrer sur Dakar
28-Jul-2020 12:00 (GMT 0.00) 10-Jul-2020 FAO Invitation to bid 2020/FRSEN/FRSEN/106093 United Kingdom
--------------------------------------------------------------------------------
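The payload above hardcodes 12-Jul-2020 for both PublishedTo and DeadlineFrom. If those two fields are meant to track the current day (an assumption; the answer does not say so), they could be built at run time. The sketch below is only an illustration: format_notice_date and date_fields are hypothetical helper names, and the zero-padded day for single-digit dates is a guess, since the source only shows two-digit days.

```python
from datetime import date

# English month abbreviations, spelled out so the result does not
# depend on the system locale (unlike strftime('%b')).
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_notice_date(d: date) -> str:
    """Format a date like the payload values above, e.g. 12-Jul-2020."""
    # Assumption: single-digit days are zero-padded (05-Jul-2020).
    return "{:02d}-{}-{}".format(d.day, MONTHS[d.month - 1], d.year)


def date_fields(today: date) -> dict:
    """Return the two date filters that the payload above hardcodes."""
    stamp = format_notice_date(today)
    return {"PublishedTo": stamp, "DeadlineFrom": stamp}


if __name__ == "__main__":
    print(date_fields(date.today()))
```

With this in place, payload.update(date_fields(date.today())) would refresh both fields before the request is sent.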
To go through every page and save the notices for Afghanistan to a CSV file:
import csv
import requests
from bs4 import BeautifulSoup
url = 'https://www.ungm.org/Public/Notice/Search'
payload = {
"PageIndex": 0,
"PageSize": 15,
"Title": "",
"Description": "",
"Reference": "",
"PublishedFrom": "",
"PublishedTo": "12-Jul-2020",
"DeadlineFrom": "12-Jul-2020",
"DeadlineTo": "",
"Countries": [],
"Agencies": [],
"UNSPSCs": [],
"NoticeTypes": [],
"SortField": "DatePublished",
"SortAscending": False,
"isPicker": False,
"NoticeTASStatus": [],
"IsSustainable": False,
"NoticeDisplayType": None,
"NoticeSearchTotalLabelId": "noticeSearchTotal",
"TypeOfCompetitions": []
}
page, all_data = 0, []
while True:
    print('Page {}...'.format(page))
    payload['PageIndex'] = page
    soup = BeautifulSoup(requests.post(url, json=payload).content, 'html.parser')
    rows = soup.select('.tableRow')
    if not rows:
        break
    for row in rows:
        cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
        print(cells[1])
        print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
        print('-' * 80)
        # we are only interested in Afghanistan:
        if 'afghanistan' in cells[7].lower():
            all_data.append([row['data-noticeid'], *cells[1:]])
    page += 1
# write to csv file:
with open('data.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in all_data:
        csv_writer.writerow(row)
This saves the matching rows to data.csv (shown in the original answer as a LibreOffice screenshot).
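The loop above keeps requesting pages until one comes back with no rows, and keeps only rows whose country cell mentions Afghanistan. That pattern can be separated from the network call so it is easier to test. The following is a rough sketch, not the answer's code: iter_rows, in_country, fake_fetch and the sample rows are all hypothetical, and fake_fetch stands in for the requests.post call plus the .tableCell parsing.

```python
from typing import Callable, Iterator, List

Row = List[str]


def iter_rows(fetch_page: Callable[[int], List[Row]], start: int = 0) -> Iterator[Row]:
    """Yield rows page by page until a page comes back empty."""
    page = start
    while True:
        rows = fetch_page(page)
        if not rows:  # an empty page marks the end, as in the loop above
            return
        yield from rows
        page += 1


def in_country(row: Row, country: str, column: int = 7) -> bool:
    """Case-insensitive match on the country cell; short rows never match."""
    return len(row) > column and country.lower() in row[column].lower()


# Hypothetical stand-in for the network call: two pages of made-up rows,
# with the country in column 7 as in the script above.
FAKE_PAGES = {
    0: [["", "Notice A", "", "", "", "", "", "Afghanistan"],
        ["", "Notice B", "", "", "", "", "", "Malawi"]],
    1: [["", "Notice C", "", "", "", "", "", "afghanistan"]],
}


def fake_fetch(page: int) -> List[Row]:
    return FAKE_PAGES.get(page, [])


matches = [row[1] for row in iter_rows(fake_fetch) if in_country(row, "Afghanistan")]
print(matches)  # ['Notice A', 'Notice C']
```

The length check in in_country is a small deviation from the script above, which indexes cells[7] directly and would raise IndexError on a row with fewer cells.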
This post covers "python - Iterating over a div table with BeautifulSoup"; the original question is on Stack Overflow: https://stackoverflow.com/questions/62857309/