
Getting: Crawled (302) while scraping laptop data using Scrapy

Reposted. Author: bug小助手. Updated: 2023-10-25 23:00:09



I want to scrape data such as the screen type and title from https://www.newegg.com/tools/laptop-finder, but I am stuck: my script crawls the page but scrapes no items.


The HTML of the page looks like this:


<tr>
<td class="td-item">
<a class="goods-info" href="https://www.newegg.com/p/N82E16834156430?Item=N82E16834156430" data-toggle="modal" data-target="#modal-pc-builder-pdp">
<div class="goods-img">
<img src="https://c1.neweggimages.com/ProductImageCompressAll125/34-156-430-03.jpg" alt="MSI Katana 15 B12VGK-082US 15.6&quot; Gaming Laptop">
</div>
<div class="goods-title">
<div class="goods-title-content">MSI Katana 15 B12VGK-082US 15.6" Gaming Laptop</div>
<div class="goods-rating">
<i class="rating rating-4" aria-label="rated 4 out of 5"></i>
<span class="goods-rating-num font-s text-gray">(31)</span>
</div>
</div>
</a>
</td>
<td class="td-spec"><div class="hid-text">Screen Size</div><span>15.6"</span></td>
<td class="td-spec"><div class="hid-text">CPU type</div><span>Intel Core i7 12th Gen</span></td>
<td class="td-spec"><div class="hid-text">Memory</div><span>16GB</span></td>
<td class="td-spec"><div class="hid-text">Storage</div><span>1 TB PCIe</span></td>
<td class="td-spec"><div class="hid-text">GPU</div><span>NVIDIA GeForce RTX 4070 Laptop GPU</span></td>
<td class="td-spec"><div class="hid-text">Resolution</div><span>1920 x 1080</span></td>
<td class="td-spec"><div class="hid-text">Weight</div><span>4 - 4.9 lbs.</span></td>
<td class="td-spec"><div class="hid-text">Backlit Keyboard</div><span>Backlit</span></td>
<td class="td-spec"><div class="hid-text">Touchscreen</div><span>No</span></td>
<td class="td-spec"><div class="hid-text">CPU Speed</div><span>12650H (2.30GHz)</span></td>
<td class="td-spec"><div class="hid-text">Number of Cores</div><span>10-core (6P+4E) Processor</span></td>
<td class="td-spec"><div class="hid-text">Color</div><span>Black</span></td>
<td class="td-spec"><div class="hid-text">Display Type</div><span>Full HD</span></td>
<td class="td-spec"><div class="hid-text">Graphic Type</div><span>Dedicated Card</span></td>
<td class="td-spec"><div class="hid-text">Operating System</div><span>Windows 11 Home</span></td>
<td class="td-spec"><div class="hid-text">Webcam</div><span>Yes</span></td>
<td class="td-action">
<div class="item-action grid col-w-3">
<div class="goods-price-current hide-click-for-details">
<div class="goods-price font-s">
<div class="goods-price-current">
<span class="goods-price-label"></span>
<span class="goods-price-symbol">$</span>
<span class="goods-price-value"><strong>1,159</strong><sup>.00</sup></span>
</div>
</div>
</div>
<div class="goods-operate xxs-hide">
<div class="goods-button-area">
<label class="input-check input-check-s compare-check">
<input type="checkbox" autocomplete="off" aria-label="checkbox">
<span class="input-check-mark text-hide">checkmark</span>
<div class="input-check-text">Compare</div>
</label>
<button title="Add MSI Katana 15 B12VGK-082US 15.6&quot; 144 Hz IPS Intel Core i7 12th Gen 12650H (2.30GHz) NVIDIA GeForce RTX 4070 Laptop GPU 16GB Memory 1 TB NVMe SSD Windows 11 Home 64-bit Gaming Laptop to cart" class="button button-s bg-orange">Add to cart</button>
</div>
</div>
</div>
</td>
</tr>


As I am just learning to scrape, I only tried to get the title and screen size, for just one laptop. Below is my Scrapy code:


import scrapy

class LaptopSpider(scrapy.Spider):

    name = "laptop"
    headers = {
        "authority": "ssl.doas.state.ga.us",
        "pragma": "no-cache",
        "cache-control": "no-cache",
        "sec-ch-ua": "\"Chromium\";v=\"92\", \" Not A;Brand\";v=\"99\", \"Google Chrome\";v=\"92\"",
        "accept": "application/json, text/javascript, */*; q=0.01",
        "x-requested-with": "XMLHttpRequest",
        "sec-ch-ua-mobile": "?0",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "origin": "https://ssl.doas.state.ga.us",
        "sec-fetch-site": "same-origin",
        "sec-fetch-mode": "cors",
        "sec-fetch-dest": "empty",
        "referer": "https://ssl.doas.state.ga.us/gpr/",
        "accept-language": "en-US,en;q=0.9"
    }
    start_urls = ['https://www.newegg.com/tools/laptop-finder']
    custom_settings = {'REDIRECT_ENABLED': False}
    handle_httpstatus_list = [302]

    def parse(self, response):
        product = response.css('tr td.td-item')

        for item in product:
            yield {
                'Title': item.css('.goods-title-content::text').get(),
                'Screen Size': item.xpath('.//div[text()="Screen Size"]/following-sibling::span/text()').get(),
            }

My log output is:


2023-09-10 10:12:28 [scrapy.core.engine] INFO: Spider opened
2023-09-10 10:12:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-09-10 10:12:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-09-10 10:12:29 [scrapy.core.engine] DEBUG: Crawled (302) <GET https://www.newegg.com/tools/laptop-finder> (referer: None)
2023-09-10 10:12:29 [scrapy.core.engine] INFO: Closing spider (finished)
2023-09-10 10:12:29 [scrapy.extensions.feedexport] INFO: Stored json feed (0 items) in: j.json
2023-09-10 10:12:29 [scrapy.statscollectors] INFO: Dumping Scrapy stats:


Please help me out.


Comments

Does this answer your question? How to handle 302 redirect in Scrapy

Recommended answer

It is impossible to give a clear, direct answer to this question.



Use the following points as a starting point:



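  1. A 302 status means a redirect. Servers often use one to set cookies before sending the client on to the actual page. So there is no need to disable redirects (REDIRECT_ENABLED: False) or to handle 302 responses yourself (handle_httpstatus_list = [302]); let Scrapy follow the redirect.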



  2. Use the techniques in the Scrapy debugging guide to inspect what your spider actually receives: https://docs.scrapy.org/en/latest/topics/debug.html



  3. I would recommend using the start_requests method to pass your headers into the first request for https://www.newegg.com/tools/laptop-finder. Scrapy does not apply a headers class attribute on its own, so the headers in your code are never sent.




