
python - Error in my Python web scraper: it won't run properly

Reposted. Author: 行者123. Updated: 2023-12-04 04:11:39

```python
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20cards'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

#grabs each product
containers = page_soup.findAll("div", {"class":"item-container"})

for container in containers:
    brand = container[0].img["title"].title()

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].txt


    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()


    print("Brand: "+ brand)
    print("product name: "+ product_name)
    print("shipping: "+ shipping)
```

After running the program, I get the following error:

```
Traceback (most recent call last):
  File "my_first_websraper.py", line 18, in <module>
    brand = container[0].img["title"].title()
  File "C:\Users\MyUserName\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1368, in __getitem__
    return self.attrs[key]
KeyError: 0
```

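When he runs it in the tutorial, it works: it lists every product on the site correctly and in the same format. Any ideas on how to fix this?

To see what it is supposed to look like, watch the video at 28:55: https://www.youtube.com/watch?v=XQgXKtPSzUI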
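A note on the `KeyError: 0` itself: `findAll` returns a list of `Tag` objects, so inside the loop `container` is already a single `Tag`, not a list. Indexing a `Tag` with `[...]` looks the key up in its HTML attributes (the `return self.attrs[key]` line in the traceback), and no attribute is named `0`. The sketch below reproduces this on a small, hypothetical piece of markup; the class names `item-info`, `item-branding` and `item-brand` are assumptions modelled on the tutorial, not checked against the live page. It also uses `.text` where the question has `.txt`, since `tag.txt` is treated as a search for a `<txt>` child and returns `None`, which would make the later string concatenation fail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on one product card from the tutorial;
# the live Newegg page may use different tags or class names.
html = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand"><img src="logo.png" title="GIGABYTE"></a>
    </div>
    <a class="item-title">GIGABYTE Radeon RX 570 Video Card</a>
    <ul class="price">
      <li class="price-ship">Free Shipping</li>
    </ul>
  </div>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
containers = page_soup.find_all("div", {"class": "item-container"})
container = containers[0]  # indexing the *list* is fine; each element is a Tag

# Indexing a Tag looks the key up in its attributes dict, hence KeyError: 0.
try:
    container[0]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 0

# Navigate to the brand logo and read its "title" attribute instead.
brand = container.find("a", {"class": "item-brand"}).img["title"].title()
# .text returns the link text; .txt would search for a <txt> tag and give None.
product_name = container.find("a", {"class": "item-title"}).text
shipping = container.find("li", {"class": "price-ship"}).text.strip()

print("Brand: " + brand)
print("product name: " + product_name)
print("shipping: " + shipping)
```

Against the real page, the same idea applies: keep `container` as a `Tag` and navigate into it (`container.find(...)`, `container.div`, and so on) rather than indexing it with `[0]`.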

Best answer

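I know this doesn't use the same packages as you, or even anything close to the same code, but I was able to get every item and its price with Selenium! I've had problems with other libraries because they only fetch the raw HTML and (usually) can't drive a headless browser. That causes trouble on pages whose content is rendered by JavaScript, because the page is fetched before all the products have been rendered.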
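One way to test that explanation for a given page is to fetch the static HTML, which is all `urlopen` ever sees, and count the product cards in it. This is a minimal sketch: `count_product_cards` and `fetch_static_html` are hypothetical helpers, the `item-container` class is taken from the question's code rather than checked against the current page, and the browser-like `User-Agent` header is an assumption.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def count_product_cards(html):
    """Count <div class="item-container"> elements in a static HTML string."""
    page_soup = BeautifulSoup(html, "html.parser")
    return len(page_soup.find_all("div", class_="item-container"))


def fetch_static_html(url):
    """Fetch the raw HTML without executing any JavaScript.

    The User-Agent value is an assumption: some sites serve different
    markup (or a bot check) to the default urllib agent.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline check of the counting logic on a tiny hypothetical snippet.
sample = (
    '<div class="item-container"></div>'
    '<div class="item-container featured"></div>'
    '<div class="other"></div>'
)
print(count_product_cards(sample))  # 2
```

Against the live page you would call `count_product_cards(fetch_static_html(my_url))`. A result of 0 would suggest the product grid is not present in the static HTML (rendered client-side, or replaced by a bot-check page), which is the case where a browser-driven tool such as Selenium helps.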

I used this Selenium script to get the prices on the page:

Edit: added sorting.

Edit: added Excel output and number formatting.

```python
import json
import time

from openpyxl import Workbook
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20cards"

# The snippet as posted assumed an existing `driver`; Chrome is an assumption here
# (Selenium 4.6+ can locate a matching chromedriver on its own).
driver = webdriver.Chrome()
driver.get(url)

# let the page load
time.sleep(5)


def get_price(element):
    # "$119.99" -> 119.99, "$5.99 Shipping" -> 5.99, "Free Shipping" -> 0.0
    first = element.text.split()[0]
    return float(first.replace('$', '').replace(',', '').replace('Free', '0'))


# get all the prices of the products on the page
# (find_element_by_class_name was removed in Selenium 4.3; By.CLASS_NAME works in 4.x)
prices = [{'product': item.find_element(By.CLASS_NAME, 'item-title').text,
           'price': get_price(item.find_element(By.CLASS_NAME, 'price-current')),
           'shipping': get_price(item.find_element(By.CLASS_NAME, 'price-ship'))}
          for item in driver.find_elements(By.CLASS_NAME, 'item-info')]

driver.quit()

# sort numerically by price
prices_sorted = sorted(prices, key=lambda x: x['price'])

# prettify the output with json
print(json.dumps(prices_sorted, indent=4))


# -------------- export to excel --------------
# create the workbook and select the first sheet
wb = Workbook()
ws = wb.active

# write the header row, then one row per product
ws.append(list(prices_sorted[0].keys()))
for row in prices_sorted:
    ws.append(list(row.values()))

path = './prices.xlsx'
# save the file
wb.save(filename=path)
```

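Output (as originally posted; the prices are still the raw page strings such as "$119.99" and "Free", not numbers):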

```json
[
    {
        "product": "GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 Video Card",
        "price": "$119.99",
        "shipping": "Free"
    },
    {
        "product": "ASRock Phantom Gaming D Radeon RX 570 DirectX 12 RX570 4G Video Card",
        "price": "$119.99",
        "shipping": "Free"
    },
    {
        "product": "MSI Radeon RX 570 DirectX 12 RX 570 8GT OC Video Card",
        "price": "$135.99",
        "shipping": "Free"
    },
    {
        "product": "XFX Radeon RX 580 DirectX 12 RX-580P8RFD6 Video Card",
        "price": "$189.99",
        "shipping": "$5.99"
    },
    {
        "product": "MSI GeForce GTX 1660 SUPER DirectX 12 GTX 1660 SUPER VENTUS XS OC Video Card",
        "price": "$249.99",
        "shipping": "Free"
    },
    {
        "product": "SAPPHIRE PULSE Radeon RX 5600 XT DirectX 12 100419P6GL Video Card",
        "price": "$289.99",
        "shipping": "$3.99"
    },
    {
        "product": "EVGA GeForce GTX 1660 Ti SC ULTRA GAMING, 06G-P4-1667-KR, 6GB GDDR6, Dual Fan, Metal Backplate",
        "price": "$299.99",
        "shipping": "Free"
    },
    {
        "product": "EVGA GeForce RTX 2060 KO ULTRA GAMING Video Card, 06G-P4-2068-KR, 6GB GDDR6, Dual Fans, Metal Backplate",
        "price": "$319.99",
        "shipping": "Free"
    },
    {
        "product": "MSI GeForce RTX 2060 DirectX 12 RTX 2060 VENTUS XS 6G OC Video Card",
        "price": "$339.99",
        "shipping": "Free"
    },
    {
        "product": "ASUS GeForce RTX 2060 Overclocked 6G GDDR6 Dual-Fan EVO Edition Graphics Card (DUAL-RTX2060-O6G-EVO)",
        "price": "$349.99",
        "shipping": "Free"
    },
    {
        "product": "ASUS ROG Strix Radeon RX 5700 XT ROG-STRIX-RX5700XT-O8G-GAMING Video Card",
        "price": "$459.99",
        "shipping": "Free"
    },
    {
        "product": "GIGABYTE GeForce RTX 2070 Super WINDFORCE OC 3X 8G Graphics Card, GV-N207SWF3OC-8GD",
        "price": "$499.99",
        "shipping": "Free"
    }
]
```
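Why the prices need converting to numbers before sorting: strings compare character by character, so once prices reach four digits a label like "$1,299.99" sorts before "$119.99". A minimal sketch, assuming labels shaped like the ones above ("$119.99", "Free", and shipping text such as "$5.99 Shipping"); `parse_price` is a hypothetical helper, not part of Selenium.

```python
def parse_price(label):
    """Turn "$1,299.99", "$5.99 Shipping" or "Free Shipping" into a float."""
    first = label.split()[0]
    if first == "Free":
        return 0.0
    return float(first.replace("$", "").replace(",", ""))


labels = ["$499.99", "$1,299.99", "$119.99"]

by_text = sorted(labels)                    # character-by-character comparison
by_value = sorted(labels, key=parse_price)  # numeric comparison

print(by_text)   # ['$1,299.99', '$119.99', '$499.99']
print(by_value)  # ['$119.99', '$499.99', '$1,299.99']
print(parse_price("Free Shipping"), parse_price("$5.99 Shipping"))  # 0.0 5.99
```

Keeping the numeric value in the dict (rather than only using it as a sort key) also means the Excel cells are written as numbers, so they can be summed or formatted as currency in the spreadsheet.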

Excel output: (screenshot not preserved)

Here is a link to the Colab notebook so you can run it yourself: https://drive.google.com/open?id=1LLTyZ0ATiUS3f-WJdGvnlaUXv0h8U4i-

Source question on Stack Overflow ("python - Error in my Python web scraper: it won't run properly"): https://stackoverflow.com/questions/61626554/
