
python - Getting data from td tags using Python and BeautifulSoup

Reposted. Author: 可可西里. Updated: 2023-11-01 13:41:25

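I'm a Python beginner, working through a few tasks with data I'm familiar with in order to learn the basics. I'm trying to crawl a table to collect contact information, but I'm having trouble getting the data out of the list of tds.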

The HTML looks like this:

<table class="table table-striped" data-drupal-selector="edit-directory" id="edit-directory--zJwP9mT4moQ">
<thead>
<tr>
<th>Name</th>
<th>Job Title</th>
<th>Campus/Department</th>
<th>Contact</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>LAST, FIRST</td>
<td>T-HS SCI- GEN'L</td>
<td><span tabindex="0">SCHOOL</span></td>
<td><a href="mailto:teacher@school.org" class="email"><span aria-hidden="true">Email</span><span class="sr-only">teacher@school.org</span></a><br>555-555-5555</td>
</tr>
</table>

I have this code to fetch the table:

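import urllib.parse
import urllib.request
from bs4 import BeautifulSoup as bs

# url and params are defined elsewhere (not shown in the question)
data = urllib.parse.urlencode(params).encode("utf-8")
req = urllib.request.Request(url)
with urllib.request.urlopen(req, data=data) as f:
    soup = bs(f, 'html.parser')

table = soup.find("table")

for row in table.find_all("tr"):
    #print (row)
    cells = row.find_all("td")
    print(cells)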

I get something like this:

[<td>LAST,FIRST </td>, <td>TEMP PROF</td>, <td><span tabindex="0">SCHOOL</span></td>, <td><a class="email" href="mailto:teacher@school.org"><span aria-hidden="true">Email</span><span class="sr-only">teacher@school.org</span></a><br/>555-555-5555</td>]

[<td><a href="https://teachersite.com" target="_blank">LAST, FIRST</a></td>, <td>T-ENGLISH</td>, <td><span tabindex="0">SCHOOL</span></td>, <td><a class="email" href="mailto:teacher@school.org"><span aria-hidden="true">Email</span><span class="sr-only">teacher@school.org</span></a><br/>555-555-5555</td>]

But if I try to get data out of the list:

print (cells[1]) 

it says the index is out of range.

What I'd like to end up with is something like this:

last = 'LAST'
first = 'FIRST'
email = 'teacher@school.org'
title = 'TEMP PROF'
phone = '555-555-5555'

Best answer

It looks like you want to strip the text out of each element:

for row in table.find_all('tr'):
    cols = row.find_all('td')
    # Keep only the stripped text of each cell
    cols = [element.text.strip() for element in cols]
    for col in cols:
        print(col)
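A likely cause of the "index out of range" error in the question is the header row: the tr inside thead contains only th cells, so looking up td in that row returns an empty list, and indexing it with [1] fails. The following is a minimal sketch of skipping such rows. The inline HTML is a shortened, hypothetical stand-in for the real page, and the `rows` list is introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample mirroring the table in the question.
html = """
<table>
<thead><tr><th>Name</th><th>Job Title</th></tr></thead>
<tbody><tr class="odd"><td>LAST, FIRST</td><td>TEMP PROF</td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

rows = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if not cells:
        # Header row: it holds only <th> cells, so the <td> list is empty.
        continue
    rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

With the guard in place, indexing into each remaining row's cells is safe as long as every data row has the expected number of columns.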

To find the first and last names, you can split the first element on the comma and space: .split(', '). Hope this points you in the right direction!
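Putting the pieces together, here is one possible sketch of extracting all five fields the question asks for. It assumes the four-column layout shown in the question, that the email is carried in the link's mailto: href, and that the phone number is the last piece of text in the contact cell. The helper name `parse_row` and the `contacts` list are hypothetical. It splits the name on the comma alone and strips whitespace afterwards, since the sample output in the question also contains "LAST,FIRST " without a space after the comma.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one data row copied from the HTML in the question.
html = """
<table>
<tbody>
<tr class="odd">
<td>LAST, FIRST</td>
<td>TEMP PROF</td>
<td><span tabindex="0">SCHOOL</span></td>
<td><a href="mailto:teacher@school.org" class="email"><span aria-hidden="true">Email</span><span class="sr-only">teacher@school.org</span></a><br>555-555-5555</td>
</tr>
</tbody>
</table>
"""


def parse_row(row):
    """Return a dict of contact fields for one <tr>, or None if it is not a data row."""
    cells = row.find_all("td")
    if len(cells) < 4:
        # Header rows (only <th>) or unexpected layouts are skipped.
        return None
    # Split on the comma only, then strip, so "LAST,FIRST " also works.
    last, _, first = cells[0].get_text(strip=True).partition(",")
    # Assumption: the email lives in the mailto: href of the link.
    link = cells[3].find("a", href=True)
    email = link["href"].split(":", 1)[-1] if link else None
    # Assumption: the phone number is the last text fragment in the cell.
    texts = list(cells[3].stripped_strings)
    phone = texts[-1] if texts else None
    return {
        "last": last.strip(),
        "first": first.strip(),
        "title": cells[1].get_text(strip=True),
        "email": email,
        "phone": phone,
    }


soup = BeautifulSoup(html, "html.parser")
contacts = []
for row in soup.find("table").find_all("tr"):
    parsed = parse_row(row)
    if parsed is not None:
        contacts.append(parsed)

print(contacts)
```

For the sample row this yields a single dict with last 'LAST', first 'FIRST', title 'TEMP PROF', email 'teacher@school.org', and phone '555-555-5555'. If the real page has rows without an email link or phone number, those fields may come back as None or hold other text, so the assumptions above are worth checking against the actual data.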

Regarding "python - Getting data from td tags using Python and BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56943444/
