python - Scraping web data surrounded by line breaks

Reposted. Author: 行者123. Updated: 2023-12-01 01:42:38

I'm trying to scrape data from this URL: https://www.apple.com/ca/shop/browse/home/specialdeals/mac/macbook_pro/13

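I'm trying to retrieve the lines that read

8GB of 2133MHz LPDDR3 onboard memory

16GB of 2133MHz LPDDR3 onboard memory

from each container in containers = soup.findAll('tr', {'class': 'product'}), using BeautifulSoup. The problem is that the text is surrounded by line breaks, often several in a row, which makes it hard to parse. How can I retrieve it?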
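A standard-library sketch of one way to cope with the extra line breaks: split the text into lines, strip each one, and drop the empty ones before searching. The cell_text string below is a hypothetical stand-in for container.text, not the real Apple markup, and clean_lines / find_memory_line are illustrative helper names.

```python
import re

# Hypothetical stand-in for the text of one product container (e.g. container.text).
# It only mimics the blank lines and indentation; the real page will differ.
cell_text = """

    Refurbished 13.3-inch MacBook Pro

        8GB of 2133MHz LPDDR3 onboard memory


"""


def clean_lines(text):
    """Return the non-empty lines of text with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_memory_line(text):
    """Return the first cleaned line that mentions onboard memory, or None."""
    for line in clean_lines(text):
        if re.search(r"onboard memory", line, flags=re.I):
            return line
    return None


print(clean_lines(cell_text))
print(find_memory_line(cell_text))
```

Because every line is stripped and empty lines are discarded first, the number of line breaks around the memory text no longer matters.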

Best answer

Looking at the page source, the best option is to combine BeautifulSoup with regular expressions:

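import re

import requests
from bs4 import BeautifulSoup

url = "https://www.apple.com/ca/shop/browse/home/specialdeals/mac/macbook_pro/13"

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser requires the lxml package
for td in soup.select('td.specs'):
    # re.M lets ^ and $ match at every line break in the cell text;
    # re.I makes "onboard memory" match regardless of case
    m = re.search(r'^(8|16).*?onboard memory.*?$', td.text, flags=re.M | re.I)
    if not m:
        continue
    title = td.select_one('h3')  # product name, if the cell has one
    if title is not None:
        print(title.text.strip())
    # group(0) is the whole matching line, group(1) the captured size in GB
    print('Full text: {} | Memory: {}'.format(m.group(0).strip(), m.group(1)))
    print('-' * 80)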

This code finds all products with 8 or 16 GB of memory and prints them:

Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.0GHz Dual-core Intel Core i5 with Retina Display — Space Grey
Full text: 8GB of 1866MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 2.9GHz Dual-core Intel Core i5 with Retina Display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 2.9GHz Dual-core Intel Core i5 with Retina Display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 2.9GHz Dual-core Intel Core i5 with Retina Display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 3.1GHz dual-core Intel Core i5 with Retina display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 3.1GHz dual-core Intel Core i5 with Retina display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 3.3GHz Dual-core Intel Core i7 with Retina Display - Space Grey
Full text: 16GB of 2133MHz LPDDR3 onboard memory | Memory: 16
--------------------------------------------------------------------------------
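The re.M flag is what lets the answer's anchored pattern work on multi-line cell text: without it, ^ only matches at the very start of the whole string and $ only at its end. The sketch below runs the same pattern with and without re.M on a made-up string standing in for td.text; it needs only the standard library.

```python
import re

# Made-up stand-in for td.text: the memory line sits in the middle of the string.
text = "Refurbished MacBook Pro\n8GB of 2133MHz LPDDR3 onboard memory\nOther specs"

pattern = r"^(8|16).*?onboard memory.*?$"

# Without re.M, ^ can only match at position 0, where the text starts with "R",
# so the search fails.
without_multiline = re.search(pattern, text, flags=re.I)

# With re.M, ^ also matches right after each "\n" and $ right before it,
# so the pattern can match one line in the middle of the text.
with_multiline = re.search(pattern, text, flags=re.M | re.I)

print(without_multiline)
print(with_multiline.group(0))  # the whole matching line
print(with_multiline.group(1))  # the captured size ("8" or "16")
```

Since . does not match a newline by default, the lazy .*? parts cannot run past the end of the line, so each match stays confined to a single line of the cell.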

Regarding python - scraping web data surrounded by line breaks, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51712984/
