I want to scrape fields such as the title and screen type from https://www.newegg.com/tools/laptop-finder,
but I am stuck: my spider crawls the page but scrapes no items.
The HTML for one row of the results table on that page is:
<tr>
<td class="td-item">
<a class="goods-info" href="https://www.newegg.com/p/N82E16834156430?Item=N82E16834156430" data-toggle="modal" data-target="#modal-pc-builder-pdp">
<div class="goods-img">
<img src="https://c1.neweggimages.com/ProductImageCompressAll125/34-156-430-03.jpg" alt="MSI Katana 15 B12VGK-082US 15.6" Gaming Laptop">
</div>
<div class="goods-title">
<div class="goods-title-content">MSI Katana 15 B12VGK-082US 15.6" Gaming Laptop</div>
<div class="goods-rating">
<i class="rating rating-4" aria-label="rated 4 out of 5"></i>
<span class="goods-rating-num font-s text-gray">(31)</span>
</div>
</div>
</a>
</td>
<td class="td-spec"><div class="hid-text">Screen Size</div><span>15.6"</span></td>
<td class="td-spec"><div class="hid-text">CPU type</div><span>Intel Core i7 12th Gen</span></td>
<td class="td-spec"><div class="hid-text">Memory</div><span>16GB</span></td>
<td class="td-spec"><div class="hid-text">Storage</div><span>1 TB PCIe</span></td>
<td class="td-spec"><div class="hid-text">GPU</div><span>NVIDIA GeForce RTX 4070 Laptop GPU</span></td>
<td class="td-spec"><div class="hid-text">Resolution</div><span>1920 x 1080</span></td>
<td class="td-spec"><div class="hid-text">Weight</div><span>4 - 4.9 lbs.</span></td>
<td class="td-spec"><div class="hid-text">Backlit Keyboard</div><span>Backlit</span></td>
<td class="td-spec"><div class="hid-text">Touchscreen</div><span>No</span></td>
<td class="td-spec"><div class="hid-text">CPU Speed</div><span>12650H (2.30GHz)</span></td>
<td class="td-spec"><div class="hid-text">Number of Cores</div><span>10-core (6P+4E) Processor</span></td>
<td class="td-spec"><div class="hid-text">Color</div><span>Black</span></td>
<td class="td-spec"><div class="hid-text">Display Type</div><span>Full HD</span></td>
<td class="td-spec"><div class="hid-text">Graphic Type</div><span>Dedicated Card</span></td>
<td class="td-spec"><div class="hid-text">Operating System</div><span>Windows 11 Home</span></td>
<td class="td-spec"><div class="hid-text">Webcam</div><span>Yes</span></td>
<td class="td-action">
<div class="item-action grid col-w-3">
<div class="goods-price-current hide-click-for-details">
<div class="goods-price font-s">
<div class="goods-price-current">
<span class="goods-price-label"></span>
<span class="goods-price-symbol">$</span>
<span class="goods-price-value"><strong>1,159</strong><sup>.00</sup></span>
</div>
</div>
</div>
<div class="goods-operate xxs-hide">
<div class="goods-button-area">
<label class="input-check input-check-s compare-check">
<input type="checkbox" autocomplete="off" aria-label="checkbox">
<span class="input-check-mark text-hide">checkmark</span>
<div class="input-check-text">Compare</div>
</label>
<button title="Add MSI Katana 15 B12VGK-082US 15.6" 144 Hz IPS Intel Core i7 12th Gen 12650H (2.30GHz) NVIDIA GeForce RTX 4070 Laptop GPU 16GB Memory 1 TB NVMe SSD Windows 11 Home 64-bit Gaming Laptop to cart" class="button button-s bg-orange">Add to cart</button>
</div>
</div>
</div>
</td>
</tr>
As I am just learning to scrape, I am only extracting the title and screen size for a single laptop for now.
Below is my Scrapy code:
import scrapy


class LaptopSpider(scrapy.Spider):
    name = "laptop"
    headers = {
        "authority": "ssl.doas.state.ga.us",
        "pragma": "no-cache",
        "cache-control": "no-cache",
        "sec-ch-ua": "\"Chromium\";v=\"92\", \" Not A;Brand\";v=\"99\", \"Google Chrome\";v=\"92\"",
        "accept": "application/json, text/javascript, */*; q=0.01",
        "x-requested-with": "XMLHttpRequest",
        "sec-ch-ua-mobile": "?0",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "origin": "https://ssl.doas.state.ga.us",
        "sec-fetch-site": "same-origin",
        "sec-fetch-mode": "cors",
        "sec-fetch-dest": "empty",
        "referer": "https://ssl.doas.state.ga.us/gpr/",
        "accept-language": "en-US,en;q=0.9"
    }
    start_urls = ['https://www.newegg.com/tools/laptop-finder']
    custom_settings = {'REDIRECT_ENABLED': False}
    handle_httpstatus_list = [302]

    def parse(self, response):
        product = response.css('tr td.td-item')
        for item in product:
            yield {
                'Title': item.css('.goods-title-content::text').get(),
                'Screen Size': item.xpath('.//div[text()="Screen Size"]/following-sibling::span/text()').get(),
            }
My log output is:
2023-09-10 10:12:28 [scrapy.core.engine] INFO: Spider opened
2023-09-10 10:12:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-09-10 10:12:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-09-10 10:12:29 [scrapy.core.engine] DEBUG: Crawled (302) <GET https://www.newegg.com/tools/laptop-finder> (referer: None)
2023-09-10 10:12:29 [scrapy.core.engine] INFO: Closing spider (finished)
2023-09-10 10:12:29 [scrapy.extensions.feedexport] INFO: Stored json feed (0 items) in: j.json
2023-09-10 10:12:29 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
Any help would be appreciated.
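Two things in the post itself may be worth checking. First, the log shows `Crawled (302)`: with `REDIRECT_ENABLED` set to `False` and `302` listed in `handle_httpstatus_list`, Scrapy hands the redirect response itself to `parse` instead of following it, and that response presumably contains no table rows. The `headers` dict is also never attached to any request, and its values refer to a different site. Second, in the HTML above the "Screen Size" cell is a `td.td-spec` sibling of `td.td-item`, not a descendant of it, so a query relative to `td.td-item` cannot reach it. Grouping by `<tr>` avoids that. The following is a minimal, standard-library-only sketch of that row-level grouping, not Scrapy code; `SAMPLE_HTML` is a trimmed copy of the row from the question, and `RowParser` and `extract_rows` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

# Trimmed copy of the row from the question (only the cells used here).
SAMPLE_HTML = """
<table>
<tr>
<td class="td-item">
<div class="goods-title-content">MSI Katana 15 B12VGK-082US 15.6" Gaming Laptop</div>
</td>
<td class="td-spec"><div class="hid-text">Screen Size</div><span>15.6"</span></td>
<td class="td-spec"><div class="hid-text">Memory</div><span>16GB</span></td>
</tr>
</table>
"""


class RowParser(HTMLParser):
    """Builds one dict per <tr>: the title plus each spec label/value pair."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self.row = None       # row currently being built, or None outside <tr>
        self.in_spec = False  # True while inside a <td class="td-spec">
        self.target = None    # kind of text expected next: title, label, value
        self.label = None     # most recent spec label, e.g. "Screen Size"

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr":
            self.row = {}
            return
        if self.row is None:
            return  # ignore everything outside a table row
        if tag == "td":
            self.in_spec = "td-spec" in classes
        elif "goods-title-content" in classes:
            self.target = "title"
        elif self.in_spec and "hid-text" in classes:
            self.target = "label"
        elif self.in_spec and tag == "span":
            self.target = "value"

    def handle_data(self, data):
        text = data.strip()
        if not text or self.row is None or self.target is None:
            return
        if self.target == "title":
            self.row["Title"] = text
        elif self.target == "label":
            self.label = text
        elif self.target == "value" and self.label:
            self.row[self.label] = text
        self.target = None

    def handle_endtag(self, tag):
        self.target = None  # text after a closing tag belongs to no field
        if tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "td":
            self.in_spec = False


def extract_rows(html):
    """Return a list of dicts, one per table row found in the HTML string."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

In the spider, the same idea would mean looping over the `tr` elements and querying both `td.td-item` and the `td.td-spec` cells relative to each row. A response without any `<tr>` (such as a bare redirect) yields an empty list here, which matches the "0 items" in the log.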