
python - Unable to get table <td> text values with Python BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-12-01 07:45:35

I am trying to get the text values from the td tags of a table, but I always get an empty list.

This is the page I am trying to extract the table values from: https://www.international-pc.com/product/interfine-629

Here is what I tried:

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.international-pc.com/product/interfine-629')
soup = BeautifulSoup(response.text, 'html.parser')
tables = soup.find("table", {"id": "documentTable-1"}).find_all("tbody")
print(tables)
Output: []

HTML

<table id="documentTable-1" class="display dataTable no-footer" data-table="" role="grid" aria-describedby="documentTable-1_info" style="width: 1138px;">
<thead>
<tr role="row"><th class="sorting_asc" tabindex="0" aria-controls="documentTable-1" rowspan="1" colspan="1" style="width: 391px;" aria-sort="ascending" aria-label="PRODUCT DATASHEET: activate to sort column descending">PRODUCT DATASHEET</th><th class="sorting" tabindex="0" aria-controls="documentTable-1" rowspan="1" colspan="1" style="width: 455px;" aria-label="LANGUAGE: activate to sort column ascending">LANGUAGE</th><th class="sorting" tabindex="0" aria-controls="documentTable-1" rowspan="1" colspan="1" style="width: 232px;" aria-label="DOWNLOAD: activate to sort column ascending">DOWNLOAD</th></tr>
</thead>
<tbody><tr role="row" class="odd"><td class="sorting_1">Interfine 629</td><td>English (United Kingdom)</td><td><a href="https://international.brand.akzonobel.com/m/1ff7b0196600886b/original/Interfine_629_eng_A4_20151012.pdf" target="_blank">PDF</a></td></tr><tr role="row" class="even"><td class="sorting_1">Interfine 629</td><td>Korean (Korea, Republic of)</td><td><a href="https://international.brand.akzonobel.com/m/664b77540ff01960/original/Interfine_629_kor_A4_19000101.pdf" target="_blank">PDF</a></td></tr><tr role="row" class="odd"><td class="sorting_1">Interfine 629</td><td>Chinese (China)</td><td><a href="https://international.brand.akzonobel.com/m/6980eb615ebe99f0/original/Interfine_629_chi_s_A4_20150205.pdf" target="_blank">PDF</a></td></tr></tbody></table>
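As a sanity check, the BeautifulSoup side of the problem works once the rows are actually in the HTML. The sketch below parses a trimmed copy of the snippet above (header row and most attributes removed) with html.parser; extract_rows is a hypothetical helper name, not part of any library, and it returns one (product, language, PDF URL) tuple per body row. It only helps when the HTML already contains the rows, which the empty list above suggests is not the case for what requests downloads.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the rendered table from the question (thead omitted).
HTML = """
<table id="documentTable-1">
<tbody>
<tr><td>Interfine 629</td><td>English (United Kingdom)</td><td><a href="https://international.brand.akzonobel.com/m/1ff7b0196600886b/original/Interfine_629_eng_A4_20151012.pdf">PDF</a></td></tr>
<tr><td>Interfine 629</td><td>Korean (Korea, Republic of)</td><td><a href="https://international.brand.akzonobel.com/m/664b77540ff01960/original/Interfine_629_kor_A4_19000101.pdf">PDF</a></td></tr>
<tr><td>Interfine 629</td><td>Chinese (China)</td><td><a href="https://international.brand.akzonobel.com/m/6980eb615ebe99f0/original/Interfine_629_chi_s_A4_20150205.pdf">PDF</a></td></tr>
</tbody>
</table>
"""


def extract_rows(html, table_id="documentTable-1"):
    """Return one (product, language, pdf_url) tuple per <tbody> row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    # No table, or a table whose body was never filled in: nothing to extract.
    if table is None or table.tbody is None:
        return []
    rows = []
    for tr in table.tbody.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 3:
            continue  # skip malformed or placeholder rows
        link = cells[2].find("a", href=True)
        rows.append((
            cells[0].get_text(strip=True),
            cells[1].get_text(strip=True),
            link["href"] if link else None,
        ))
    return rows


for row in extract_rows(HTML):
    print(row)
```

Keeping each row as a tuple (rather than printing cells one by one) makes it easy to write the three rows to CSV or a DataFrame later.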

I want to extract the text values of all three rows of the table.

Any suggestions?

Best answer

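The site https://www.international-pc.com/product/interfine-629 fills in the table data dynamically with a separate request after the page loads, so the rows are not in the HTML that requests downloads. Try automating a real browser with the Selenium library instead: it lets you scrape data from pages whose content is rendered dynamically by JavaScript or AJAX requests.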
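One way to confirm this diagnosis before reaching for Selenium is to count the rows inside the table body of whatever HTML you received. The helper below is a minimal sketch: count_body_rows is a hypothetical name, and the two HTML strings are illustrative stand-ins, not the live page (which may have changed since this thread). In practice you would pass requests.get(url).text as html; a result of 0 means the table shell is there but its rows are added later by JavaScript, which matches the empty list in the question.

```python
from bs4 import BeautifulSoup


def count_body_rows(html, table_id="documentTable-1"):
    """Count <tr> rows inside the table's <tbody>.

    Returns None if the table is missing entirely, and 0 if the table exists
    but has no <tbody> (or an empty one), as happens when the rows are
    injected by JavaScript after the page loads.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return None
    body = table.find("tbody")
    return len(body.find_all("tr")) if body else 0


# Illustrative static shell, roughly what a plain HTTP client might receive.
shell = ('<table id="documentTable-1"><thead><tr><th>PRODUCT DATASHEET</th>'
         '</tr></thead></table>')
# The same table after the browser's JavaScript has filled in a row.
rendered = shell.replace(
    "</table>", "<tbody><tr><td>Interfine 629</td></tr></tbody></table>")

print(count_body_rows(shell))
print(count_body_rows(rendered))
```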

Try this:

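from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("/usr/bin/chromedriver"))
try:
    driver.get('https://www.international-pc.com/product/interfine-629')
    # Wait up to 10 s until JavaScript has inserted at least one body row
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "#documentTable-1 tbody tr"))
    )
    soup = BeautifulSoup(driver.page_source, 'lxml')
finally:
    driver.quit()  # always close the browser, even if the wait times out

tables = soup.find("table", {"id": "documentTable-1"}).find("tbody")

for tr in tables.find_all("tr"):
    for td in tr.find_all("td"):
        print(td.text)
        # Only the DOWNLOAD cell contains a link
        link = td.find("a", href=True)

        if link is None:
            continue
        print(link['href'])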

Output:

Interfine 629
Chinese (China)
PDF
https://international.brand.akzonobel.com/m/6980eb615ebe99f0/original/Interfine_629_chi_s_A4_20150205.pdf
Interfine 629
Korean (Korea, Republic of)
PDF
https://international.brand.akzonobel.com/m/664b77540ff01960/original/Interfine_629_kor_A4_19000101.pdf
Interfine 629
English (United Kingdom)
PDF
https://international.brand.akzonobel.com/m/1ff7b0196600886b/original/Interfine_629_eng_A4_20151012.pdf

Here '/usr/bin/chromedriver' is the path to the Selenium WebDriver executable for Chrome (chromedriver).

Download the Selenium WebDriver for the Chrome browser:

http://chromedriver.chromium.org/downloads

Installing the WebDriver for the Chrome browser:

https://christopher.su/2015/selenium-chromedriver-ubuntu/

Selenium tutorial:

https://selenium-python.readthedocs.io/

For "python - Unable to get table <td> text values with Python BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56475974/
