
python - How to format scraper output

Reposted. Author: 行者123. Updated: 2023-12-01 07:28:04

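I am trying to extract prices from a website in order to build a scraper; the program I have written so far is below. To fetch all the HTML I used BeautifulSoup with the default html.parser. I then tried to narrow the data down with a variable named generale, set to soup.findAll("span"). I now need to clean up the result (a list, I think) further to get the prices, but I am stuck. Any suggestions? I don't know how to approach the problem.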

import smtplib
import time

from bs4 import BeautifulSoup as bs
import requests

URL = "https://www.allkeyshop.com/blog/buy-battlefield-5-cd-key-compare-prices/"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"}

def Check_page1():
    # download the page and parse it
    page = requests.get(URL, headers=headers)
    soup = bs(page.content, 'html.parser')

    # every <span> on the page, not only the ones holding prices
    generale = soup.findAll('span')

    # placeholder: this is the step I don't know how to write
    price = None

    print(price)
    print(generale)

Check_page1()

Best answer

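If you look at the page source, you can see that the prices you are looking for sit in <span> elements with the class name price, which can be parsed like this: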

import requests
from bs4 import BeautifulSoup as bs

URL = "https://www.allkeyshop.com/blog/buy-battlefield-5-cd-key-compare-prices/"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"}

def CheckPage1():
    page = requests.get(URL, headers=headers)
    soup = bs(page.content, 'html.parser')

    # all spans with prices
    span_prices = soup.findAll("span", {"class": "price"})

    # to get all prices you need to extract the text or content attribute
    for span in span_prices:
        price = span.text
        # remove whitespace and print the price
        print(price.strip())

        # to get prices without the currency sign, uncomment one of these lines
        # print(price.strip()[:-1])
        # print(price.strip().strip('€'))

CheckPage1()
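
The commented-out lines above strip the currency sign but still leave each price as a string. If the prices need to be compared (for example, to find the cheapest offer), they have to be converted to numbers first. The following is a minimal sketch of that step using only the standard library. The helper name parse_price and the sample strings are hypothetical and not part of the original answer, and the sketch assumes prices carry at most one decimal separator (either "." or ",") and no thousands separators.

```python
import re
from decimal import Decimal

# Hypothetical helper, not from the original answer.
# Matches the first run of digits with an optional decimal part.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

def parse_price(text):
    """Return the first number found in text as a Decimal, or None."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    # Assumption: a comma is a decimal separator, never a thousands separator.
    return Decimal(match.group().replace(",", "."))

# Made-up strings shaped like the output of span.text
samples = ["  12.99€ ", "8,49€", "\n€ 20\n", "N/A"]
prices = [parse_price(s) for s in samples]

# Ignore entries where no number could be found
cheapest = min(p for p in prices if p is not None)
print(cheapest)
```

Inside the loop in CheckPage1, parse_price(span.text) could take the place of the strip calls. Decimal is used here rather than float so that amounts such as 12.99 are kept exactly.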

For "python - How to format scraper output", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57349796/
