
Python web scraping: getting all tags with 'soup.findall'

Reposted. Author: 行者123. Updated: 2023-12-01 08:22:19

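I have just started dabbling in Python and, as many people do, I began with a web scraping example to try out the language. I am trying to collect the contents of every tag of a certain type and return them as a list. For this I am using BeautifulSoup and requests. The website used for this test is the blog of a small game called "Staxel".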

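I can get my code to output the first occurrence of the tag using [soup.find] and [print], but when I change the code to the version below, I get warnings about printing lists as a fixed variable.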

Can someone point out what I should be using for this?

# import libraries
import requests
import ssl
from bs4 import BeautifulSoup

# set the URL string
quote_page = 'https://blog.playstaxel.com'

# query the website and return the html to give us a 'page' variable
page = requests.get(quote_page)


# parse the html using beautiful soup and store in a variable ... 'soup'
soup = BeautifulSoup(page.content, 'lxml')

# Remove the 'div' of name and get it's value
name_box = soup.find_all('h1',attrs={'class':'entry-title'})
name = name_box.text.strip() #strip() is used to remove the starting and trailing
print ("Title {}".format(name))

Best answer

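By using .find_all(), you create a list of every occurrence of h1. A list has no .text of its own, so you just need to wrap the print statement in a for loop and read .text from each element. Code with that structure looks like this: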

# import libraries
import requests
import ssl
from bs4 import BeautifulSoup

# set the URL string
quote_page = 'https://blog.playstaxel.com'

# query the website and return the html to give us a 'page' variable
page = requests.get(quote_page)


# parse the html using beautiful soup and store in a variable ... 'soup'
soup = BeautifulSoup(page.content, 'lxml')

# Find every <h1> with class 'entry-title' and print the text of each one
name_box = soup.find_all('h1', attrs={'class': 'entry-title'})
for name in name_box:
    print("Title {}".format(name.text.strip()))

Output:

Title Magic update – feature preview
Title New Years
Title Staxel Changelog for 1.3.52
Title Staxel Changelog for 1.3.49
Title Staxel Changelog for 1.3.48
Title Halloween Update & GOG
Title Staxel Changelog for 1.3.44
Title Staxel Changelog for 1.3.42
Title Staxel Changelog for 1.3.40
Title Staxel Changelog for 1.3.34 to 1.3.39
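
Since the question asks for the titles as a list, here is a minimal sketch of the difference between find() and find_all(). It is an illustration under stated assumptions, not the original poster's code: the inline HTML string is a made-up stand-in for the blog page so the snippet runs without network access, and it uses the built-in 'html.parser' instead of 'lxml' to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the blog's HTML, so the sketch runs offline.
html = """
<h1 class="entry-title"> First post </h1>
<h1 class="entry-title">Second post</h1>
<h1 class="other">Not a title</h1>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), so .text can be read directly.
first = soup.find("h1", attrs={"class": "entry-title"})
first_title = first.text.strip()

# find_all() returns a ResultSet (a list subclass) of Tags,
# so .text has to be read from each element, here via a list comprehension.
matches = soup.find_all("h1", attrs={"class": "entry-title"})
titles = [h1.text.strip() for h1 in matches]

# Asking the ResultSet itself for .text raises AttributeError,
# which is what the code in the question runs into.
try:
    matches.text
    raised = False
except AttributeError:
    raised = True

print(first_title)
print(titles)
print(raised)
```

The list comprehension gives the titles back as a plain Python list, which can then be returned from a function or printed in a loop as in the answer above.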

Regarding "Python web scraping: getting all tags with 'soup.findall'", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54545486/
