Getting: Crawled (302) while scraping laptop data using Scrapy

Reposted by: bug小助手. Updated: 2023-10-25 23:00:09

I want to scrape data such as the title and screen type from https://www.newegg.com/tools/laptop-finder,
but I am stuck: my spider crawls the page but scrapes no items.

The HTML for one row of the site's table is:

<tr>
    <td class="td-item">
        <a class="goods-info" href="https://www.newegg.com/p/N82E16834156430?Item=N82E16834156430" data-toggle="modal" data-target="#modal-pc-builder-pdp">
            <div class="goods-img">
                <img src="https://c1.neweggimages.com/ProductImageCompressAll125/34-156-430-03.jpg" alt="MSI Katana 15 B12VGK-082US 15.6&quot; Gaming Laptop">
            </div>
            <div class="goods-title">
                <div class="goods-title-content">MSI Katana 15 B12VGK-082US 15.6" Gaming Laptop</div>
                <div class="goods-rating">
                    <i class="rating rating-4" aria-label="rated 4 out of 5"></i>
                    <span class="goods-rating-num font-s text-gray">(31)</span>
                </div>
            </div>
        </a>
    </td>
    <td class="td-spec"><div class="hid-text">Screen Size</div><span>15.6"</span></td>
    <td class="td-spec"><div class="hid-text">CPU type</div><span>Intel Core i7 12th Gen</span></td>
    <td class="td-spec"><div class="hid-text">Memory</div><span>16GB</span></td>
    <td class="td-spec"><div class="hid-text">Storage</div><span>1 TB PCIe</span></td>
    <td class="td-spec"><div class="hid-text">GPU</div><span>NVIDIA GeForce RTX 4070 Laptop GPU</span></td>
    <td class="td-spec"><div class="hid-text">Resolution</div><span>1920 x 1080</span></td>
    <td class="td-spec"><div class="hid-text">Weight</div><span>4 - 4.9 lbs.</span></td>
    <td class="td-spec"><div class="hid-text">Backlit Keyboard</div><span>Backlit</span></td>
    <td class="td-spec"><div class="hid-text">Touchscreen</div><span>No</span></td>
    <td class="td-spec"><div class="hid-text">CPU Speed</div><span>12650H (2.30GHz)</span></td>
    <td class="td-spec"><div class="hid-text">Number of Cores</div><span>10-core (6P+4E) Processor</span></td>
    <td class="td-spec"><div class="hid-text">Color</div><span>Black</span></td>
    <td class="td-spec"><div class="hid-text">Display Type</div><span>Full HD</span></td>
    <td class="td-spec"><div class="hid-text">Graphic Type</div><span>Dedicated Card</span></td>
    <td class="td-spec"><div class="hid-text">Operating System</div><span>Windows 11 Home</span></td>
    <td class="td-spec"><div class="hid-text">Webcam</div><span>Yes</span></td>
    <td class="td-action">
        <div class="item-action grid col-w-3">
            <div class="goods-price-current hide-click-for-details">
                <div class="goods-price font-s">
                    <div class="goods-price-current">
                        <span class="goods-price-label"></span>
                        <span class="goods-price-symbol">$</span>
                        <span class="goods-price-value"><strong>1,159</strong><sup>.00</sup></span>
                    </div>
                </div>
            </div>
            <div class="goods-operate xxs-hide">
                <div class="goods-button-area">
                    <label class="input-check input-check-s compare-check">
                        <input type="checkbox" autocomplete="off" aria-label="checkbox">
                        <span class="input-check-mark text-hide">checkmark</span>
                        <div class="input-check-text">Compare</div>
                    </label>
                    <button title="Add MSI Katana 15 B12VGK-082US 15.6&quot; 144 Hz IPS Intel Core i7 12th Gen 12650H (2.30GHz) NVIDIA GeForce RTX 4070 Laptop GPU 16GB Memory 1 TB NVMe SSD Windows 11 Home 64-bit Gaming Laptop to cart" class="button button-s bg-orange">Add to cart</button>
                </div>
            </div>
        </div>
    </td>
</tr>

As I am just learning to scrape, I tried to extract only the title and screen size for a single laptop.
Below is my Scrapy code:

import scrapy

class LaptopSpider(scrapy.Spider):

    name = "laptop"
    headers = {
        "authority": "ssl.doas.state.ga.us",
        "pragma": "no-cache",
        "cache-control": "no-cache",
        "sec-ch-ua": "\"Chromium\";v=\"92\", \" Not A;Brand\";v=\"99\", \"Google Chrome\";v=\"92\"",
        "accept": "application/json, text/javascript, */*; q=0.01",
        "x-requested-with": "XMLHttpRequest",
        "sec-ch-ua-mobile": "?0",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "origin": "https://ssl.doas.state.ga.us",
        "sec-fetch-site": "same-origin",
        "sec-fetch-mode": "cors",
        "sec-fetch-dest": "empty",
        "referer": "https://ssl.doas.state.ga.us/gpr/",
        "accept-language": "en-US,en;q=0.9"
    }
    start_urls = ['https://www.newegg.com/tools/laptop-finder']
    custom_settings = {'REDIRECT_ENABLED': False}
    handle_httpstatus_list = [302]

    def parse(self, response):
        product = response.css('tr td.td-item')

        for item in product:
            yield {
                'Title': item.css('.goods-title-content::text').get(),
                'Screen Size': item.xpath('.//div[text()="Screen Size"]/following-sibling::span/text()').get(),
            }

My log output is:

2023-09-10 10:12:28 [scrapy.core.engine] INFO: Spider opened
2023-09-10 10:12:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-09-10 10:12:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-09-10 10:12:29 [scrapy.core.engine] DEBUG: Crawled (302) <GET https://www.newegg.com/tools/laptop-finder> (referer: None)
2023-09-10 10:12:29 [scrapy.core.engine] INFO: Closing spider (finished)
2023-09-10 10:12:29 [scrapy.extensions.feedexport] INFO: Stored json feed (0 items) in: j.json
2023-09-10 10:12:29 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

Help me out

More answers

Does this answer your question? How to handle 302 redirect in Scrapy

Recommended answer

It is impossible to give a clear, direct answer to this question.

Take the following as a starting point:

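A 302 status means a redirect. Servers often use such a redirect to set cookies before sending the client on to the real page. There is therefore no need to disable redirects (REDIRECT_ENABLED = False) or to add 302 to handle_httpstatus_list. With both in place, parse receives the 302 response itself rather than the page it points to, which matches the "Crawled (302)" line and the 0 items in your log.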

Scrapy's debugging tools can show you what the spider actually receives: https://docs.scrapy.org/en/latest/topics/debug.html

I would recommend using the start_requests method to pass your headers with the first request to https://www.newegg.com/tools/laptop-finder. In your code the headers dict is defined on the spider but never attached to any request.
