
How to scrape Title and Price (Beautifulsoup)

Reposted — Author: bug小助手 · Updated: 2023-10-22 13:03:31



I'm trying to get all the album names and prices from this website: https://vinilosalvaro.cl/tienda/



But with the following script I'm just getting one of them.



import requests
from bs4 import BeautifulSoup


URL = 'https://vinilosalvaro.cl/tienda/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')
listado_productos = soup.find_all('ul', class_='products columns-3')

for listado_productos in listado_productos:
    titulos = listado_productos.find('h2', class_='woocommerce-loop-product__title').text.strip()
    precios = listado_productos.find('span', class_='woocommerce-Price-amount amount').text.strip()
    print(titulos)
    print(precios)

How to get all the album names and prices?



Recommended answers

The main issue is that your selection gives you a ResultSet containing one <ul>, not all of the <li> elements, so your loop iterates only once.


Select your elements more specifically, as @benyamin payandeh also mentioned, or, for example, with CSS selectors:


soup.select('ul.products li')
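To see the difference on a small scale, here is a minimal sketch (the inline HTML is a made-up stand-in for the shop's markup, not the real page): find_all on the <ul> yields a single-element ResultSet, while selecting the <li> children yields one element per product.

```python
from bs4 import BeautifulSoup

# Made-up HTML mimicking the WooCommerce product list structure
html = '<ul class="products columns-3"><li>A</li><li>B</li><li>C</li></ul>'
soup = BeautifulSoup(html, 'html.parser')

# find_all on the <ul> returns a ResultSet with a single element,
# which is why the original loop body only runs once
uls = soup.find_all('ul', class_='products columns-3')
print(len(uls))  # 1

# selecting the <li> children yields one element per product
lis = soup.select('ul.products li')
print(len(lis))  # 3
```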

In addition, consider some further concepts: a while loop for paging, .get_text(strip=True), and storing your results across iterations in a more structured form such as a list of dicts, which you can simply transform into a dataframe or process as you need.


Example

Be aware this will start from page 59, to show how the while loop works and breaks when there is no more page to scrape. Simply set URL to your default value to iterate over all pages.


import requests
from bs4 import BeautifulSoup

URL = 'https://vinilosalvaro.cl/tienda/page/59/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

data = []

while True:
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    for listado_productos in soup.select('ul.products li'):
        data.append({
            'titulos': listado_productos.h2.get_text(strip=True),
            'precios': listado_productos.span.get_text(strip=True)
        })

    if soup.select_one('a.next'):
        URL = soup.select_one('a.next').get('href')
    else:
        break

data
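The "transform into a dataframe" step mentioned above is then a one-liner. A sketch assuming pandas is installed; the rows below are hypothetical sample data in the same list-of-dicts shape, not real scrape results:

```python
import pandas as pd

# Hypothetical rows in the same list-of-dicts shape as `data` above
data = [
    {'titulos': 'Album A', 'precios': '$10.000'},
    {'titulos': 'Album B', 'precios': '$12.000'},
]

df = pd.DataFrame(data)
print(df.shape)  # (2, 2)
```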


Instead of find_all you can use find when searching for one specific ul tag. Go ahead and change line 10 to:


listado_productos = soup.find('ul', class_='products columns-3')

Also, to get the li children, you should use find_all('li'), so change line 12 to:


for listado_productos in listado_productos.find_all('li'):
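Putting both changes together, the corrected loop looks like this. It is run here against a made-up inline HTML snippet instead of the live site, so the album names and prices are hypothetical:

```python
from bs4 import BeautifulSoup

# Made-up HTML in the shape of the shop's product list
html = """
<ul class="products columns-3">
  <li><h2 class="woocommerce-loop-product__title">Album A</h2>
      <span class="woocommerce-Price-amount amount">$10.000</span></li>
  <li><h2 class="woocommerce-loop-product__title">Album B</h2>
      <span class="woocommerce-Price-amount amount">$12.000</span></li>
</ul>
"""
soup = BeautifulSoup(html, 'html.parser')

# find returns the single <ul>; find_all('li') then yields every product
listado_productos = soup.find('ul', class_='products columns-3')
results = []
for producto in listado_productos.find_all('li'):
    titulo = producto.find('h2', class_='woocommerce-loop-product__title').get_text(strip=True)
    precio = producto.find('span', class_='woocommerce-Price-amount amount').get_text(strip=True)
    results.append((titulo, precio))

print(results)  # [('Album A', '$10.000'), ('Album B', '$12.000')]
```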

