
python - How do I extract a tag's children in BeautifulSoup?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:26:07

I have the following HTML, and I want to extract the content of the <strong> tags from it:

<p><strong>1. Start big</strong><br><br>
Make a slam dunk right away. Boom! Just do it! Start strong! If you’re making a list article about poodle outerwear, don’t save the best for last: put that sporty little pool-vest idea right up there at the top. </p>

<p><strong>2. Hook them and hook them good</strong><br><br>
A recent study of lists (included in another article about the top ten research studies, natch), assembled by some guy you’ve never heard of from an obscure European university in his spare time, found that Web readers usually don’t make it past the first few items on a list. Sad, isn’t it? I bet you’re already thinking about stopping. Yes, it sucks to know people have shorter attention spans than an overly-caffeinated Himalayan fruit-fly. Make the first few count, okay?</p>

<p><strong>3. Stay on message</strong><br><br>
Let’s say you’re writing a list article about the top movies starring Naomi Watts that don’t suck. It’s a short list, if you remember anything about King Kong or her early indie films. I see this kind of thing pop up on <a href="http://www.foxnews.com" rel="nofollow">Fox News</a> and <a href="http://www.metacritic.com" rel="nofollow">Metacritic</a> once in awhile, and I usually can’t stop myself from clicking on them. You get into sort of a click-trance. In fact, hang on a second. I think there might be one on the top opening acts when The Bieb performs in space. Oh yes there is! Okay, back. So, in your article list of the top movies that use a Meatloaf song in the soundtrack, adding that one from Black Sabbath is just not proper usage. We want Meatloaf and Meatloaf only, people! Besides, Black Sabbath is for sissies.</p>

The Python code I use to extract the p tags is:

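from bs4 import BeautifulSoup

# page holds the HTML shown above
soup = BeautifulSoup(page, "lxml")

# Print every <p> element in the document
for content in soup.find_all('p'):
    print(content)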

What should I add to extract the <strong> tags?

I have already tried soup.find_all('p > strong').

Best answer

from bs4 import BeautifulSoup

page = """
<p><strong>1. Start big</strong><br><br>
Make a slam dunk right away. Boom! Just do it! Start strong! If you’re making a list article about poodle outerwear, don’t save the best for last: put that sporty little pool-vest idea right up there at the top. </p>

<p><strong>2. Hook them and hook them good</strong><br><br>
A recent study of lists (included in another article about the top ten research studies, natch), assembled by some guy you’ve never heard of from an obscure European university in his spare time, found that Web readers usually don’t make it past the first few items on a list. Sad, isn’t it? I bet you’re already thinking about stopping. Yes, it sucks to know people have shorter attention spans than an overly-caffeinated Himalayan fruit-fly. Make the first few count, okay?</p>

<p><strong>3. Stay on message</strong><br><br>
Let’s say you’re writing a list article about the top movies starring Naomi Watts that don’t suck. It’s a short list, if you remember anything about King Kong or her early indie films. I see this kind of thing pop up on <a href="http://www.foxnews.com" rel="nofollow">Fox News</a> and <a href="http://www.metacritic.com" rel="nofollow">Metacritic</a> once in awhile, and I usually can’t stop myself from clicking on them. You get into sort of a click-trance. In fact, hang on a second. I think there might be one on the top opening acts when The Bieb performs in space. Oh yes there is! Okay, back. So, in your article list of the top movies that use a Meatloaf song in the soundtrack, adding that one from Black Sabbath is just not proper usage. We want Meatloaf and Meatloaf only, people! Besides, Black Sabbath is for sissies.</p>
"""

soup = BeautifulSoup(page, 'lxml')

# select() takes a CSS selector: every <strong> that is a direct child of a <p>
for content in soup.select('p > strong'):
    print(content)

Output:

<strong>1. Start big</strong>
<strong>2. Hook them and hook them good</strong>
<strong>3. Stay on message</strong>

CSS selectors need the .select method, not .find or .find_all, which match on tag names and attributes rather than on selector syntax.
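The difference can be sketched as follows. This is a minimal illustration, assuming a shortened two-paragraph version of the page; the abbreviated HTML and the variable names by_name and by_selector are made up for the example, and html.parser is swapped in for lxml only so the snippet runs without an extra install.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question (illustrative only)
html = """
<p><strong>1. Start big</strong><br><br>Make a slam dunk right away.</p>
<p><strong>2. Hook them and hook them good</strong><br><br>Make the first few count, okay?</p>
"""

# html.parser is used here instead of lxml so no extra package is needed
soup = BeautifulSoup(html, "html.parser")

# find_all treats its first argument as a tag name, so this looks for
# a tag literally named "p > strong" and matches nothing.
by_name = soup.find_all("p > strong")

# select parses the string as a CSS selector: <strong> elements that are
# direct children of a <p>.
by_selector = soup.select("p > strong")

print(len(by_name))      # 0
print(len(by_selector))  # 2
```

The empty result from find_all is not an error, which is probably why the original attempt failed silently.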

The bs4 documentation covers .select, and w3schools has a reference for CSS selectors.
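If only the text of each heading is wanted rather than the tag itself, something like the following may help. It is a sketch under the same assumptions as the answer's markup: the abbreviated HTML and the names titles and titles_via_find are hypothetical, and html.parser stands in for lxml so the snippet has no extra dependency. It also shows a find_all-based route to the same result for anyone who prefers to avoid CSS selectors.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question (illustrative only)
html = """
<p><strong>1. Start big</strong><br><br>Make a slam dunk right away.</p>
<p><strong>2. Hook them and hook them good</strong><br><br>Make the first few count, okay?</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of each <strong> that is a direct child of a <p>
titles = [tag.get_text(strip=True) for tag in soup.select("p > strong")]

# The same result without CSS selectors: look in each <p> for a <strong>
# among its direct children only (recursive=False).
titles_via_find = []
for p in soup.find_all("p"):
    strong = p.find("strong", recursive=False)
    if strong is not None:
        titles_via_find.append(strong.get_text(strip=True))

print(titles)
print(titles_via_find)
```

The recursive=False argument mirrors the ">" child combinator; without it, find would also match a <strong> nested deeper inside the paragraph.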

This post, python - How do I extract a tag's children in BeautifulSoup?, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/47632203/
