
python - Scraping nested tags

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:44:00

I know questions like this come up often, but I've been searching and haven't found one quite like mine.

<div class="casts">
  <table cellpadding="0" cellspacing="0">
    <tbody>
      <tr>
        <td class="">
          <a class="cast">
            <span class="title">
              Nested data 1
              <span class="schedule">
                Nested data 2
              </span>
            </span>
          </a>
        </td>
      </tr>
    </tbody>
  </table>
</div>

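There are multiple td elements with the same structure, but I removed the rest for simplicity. Say I want to extract "Nested data 1" and "Nested data 2" from the spans. I use: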

finda = soup.find_all('a', attrs={'class': 'cast'})

for var in finda:
    var2 = var.find_all('span')

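Using:

var2[1]

I can extract all of the "Nested data 2" values.

But I can't extract only "Nested data 1":

var2[0]

returns "Nested data 1" together with "Nested data 2", because the outer span's text includes the text of the span nested inside it.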
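To illustrate why var2[0] returns both strings, here is a minimal sketch, assuming bs4 with the built-in "html.parser". It contrasts get_text(), which joins every descendant string, with find(string=True, recursive=False), which looks only at the outer span's direct text children. This is one possible approach, not the asker's code.

```python
from bs4 import BeautifulSoup

# Markup copied from the question (one td only).
html = """
<div class="casts">
<table cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td class="">
<a class="cast">
<span class="title">
Nested data 1
<span class="schedule">
Nested data 2
</span>
</span>
</a>
</td>
</tr>
</tbody>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
outer, inner = soup.find("a", class_="cast").find_all("span")

# get_text() concatenates every descendant string, so the outer span
# also reports the text of the span nested inside it.
print(outer.get_text(" ", strip=True))  # Nested data 1 Nested data 2
print(inner.get_text(" ", strip=True))  # Nested data 2

# recursive=False restricts the search to direct children, and
# string=True matches text nodes, so only the outer span's own text is found.
own_text = outer.find(string=True, recursive=False).strip()
print(own_text)  # Nested data 1
```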

Best answer

This can be done fairly simply by iterating over the children of each span and keeping only the text nodes:

stack.html:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>StackO</title>
    <meta charset="utf-8">
  </head>
  <body>
    <div class="casts">
      <table cellpadding="0" cellspacing="0">
        <tbody>
          <tr>
            <td class="">
              <a class="cast">
                <span class="title">
                  Nested data 1
                  <span class="schedule">
                    Nested data 2
                    <span class="moar-nesting">
                      Nested data 3
                    </span>
                  </span>
                  Nested data 4
                </span>
              </a>
            </td>
          </tr>
        </tbody>
      </table>
    </div>
  </body>
</html>

Meanwhile, in IPython land...

In [1]: from bs4 import BeautifulSoup, NavigableString, Comment

In [2]: with open('stack.html', 'r') as f:
   ...:     markup = f.read()
   ...:

In [3]: soup = BeautifulSoup(markup, 'html.parser')

In [4]: casts = soup.find_all('a', attrs={'class': 'cast'})

In [5]: cast = casts[0]

In [6]: for span in cast.find_all('span'):
   ...:     for child in span.children:
   ...:         # keep only non-empty text nodes; Comment is a NavigableString subclass
   ...:         if isinstance(child, NavigableString) and not isinstance(child, Comment) and str(child).strip() != "":
   ...:             print('"{}"'.format(str(child).strip()))
   ...:
"Nested data 1"
"Nested data 4"
"Nested data 2"
"Nested data 3"
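The same child-iteration idea can be wrapped in a helper and applied to every a.cast element, which covers the "multiple td with the same structure" case from the question. This is a sketch: the helper name own_texts and the sample markup (two hypothetical entries with placeholder text) are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def own_texts(tag):
    """Return the non-empty text nodes that are direct children of tag.

    Comments are NavigableString subclasses, so they are skipped explicitly.
    """
    return [
        str(child).strip()
        for child in tag.children
        if isinstance(child, NavigableString)
        and not isinstance(child, Comment)
        and str(child).strip()
    ]


# Hypothetical markup: two a.cast entries with the question's structure.
html = """
<div class="casts"><table><tbody><tr>
<td><a class="cast"><span class="title">Title 1
<span class="schedule">Schedule 1</span></span></a></td>
<td><a class="cast"><span class="title">Title 2
<span class="schedule">Schedule 2</span></span></a></td>
</tr></tbody></table></div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for cast in soup.find_all("a", class_="cast"):
    title = cast.find("span", class_="title")
    schedule = cast.find("span", class_="schedule")
    # Join in case a span has several direct text nodes (like "Nested data 4").
    rows.append((" ".join(own_texts(title)), " ".join(own_texts(schedule))))

print(rows)  # [('Title 1', 'Schedule 1'), ('Title 2', 'Schedule 2')]
```

Because own_texts only looks at direct children, the outer span never picks up the nested span's text, whatever its depth.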

For a similar question about python - Scraping nested tags, see Stack Overflow: https://stackoverflow.com/questions/30084673/
