python - Scrapy exception - exceptions.AttributeError: 'unicode' object has no attribute 'select'

Reposted by: 太空狗. Updated: 2023-10-30 00:50:51


I wrote a spider, but whenever I run it I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/task.py", line 607, in _tick
    taskObj._oneWorkUnit()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
    result = next(self._iterator)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 57, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 96, in iter_errback
    yield it.next()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/vaibhav/scrapyprog/comparison/eScraperInterface/eScraper/spiders/streetstylestoreSpider.py", line 38, in parse
    item['productURL'] = site.select('.//a/@href').extract()
exceptions.AttributeError: 'unicode' object has no attribute 'select'

My code is:

from scrapy.http import Request
from eScraper.items import EscraperItem
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider

#------------------------------------------------------------------------------ 

class ESpider(CrawlSpider):

    name = "streetstylestoreSpider"
    allowed_domains = ["streetstylestore.com"]    

    start_urls = [
                  "http://streetstylestore.com/index.php?id_category=16&controller=category",
                  "http://streetstylestore.com/index.php?id_category=46&controller=category",
                  "http://streetstylestore.com/index.php?id_category=51&controller=category",
                  "http://streetstylestore.com/index.php?id_category=61&controller=category",
                  "http://streetstylestore.com/index.php?id_category=4&controller=category"
                  ]


    def parse(self, response):                  

        items = []
        hxs = HtmlXPathSelector(response)        
        sites = hxs.select('//ul[@id="product_list"]/li').extract()       

        for site in sites:

            item = EscraperItem()        
            item['currency'] = 'INR'
            item['productSite'] = ["http://streetstylestore.com"]
            item['productURL'] = site.select('.//a/@href').extract()            
            item['productImage'] = site.select('.//a/img/@src').extract()                    
            item['productTitle'] = site.select('.//a/@title').extract()            
            productMRP = [i.strip().split('Rs')[-1].replace(',','') for i in hxs.select('.//div[@class="price_container"]//span[@class="old_price"]/text()').extract()]
            productPrice = [i.strip().split('Rs')[-1].replace(',','') for i in hxs.select('.//div[@class="price_container"]//p[@class="price"]/text()').extract()]
            item['productPrice'] = productMRP + productPrice                       

            items.append(item)
            secondURL = item['productURL'][0]
            request = Request(secondURL,callback=self.parsePage2)
            request.meta['item'] = item
            yield request


    def parsePage2(self, response):

        temp = []                
        item = response.meta['item']
        hxs = HtmlXPathSelector(response)

        availability =  [i for i in hxs.select('//div[@class="details"]/p/text()').extract() if 'In Stock ' in i]

        if  availability:
            item['availability'] = True
        else:
            item['availability'] = False

        hasVariants =  hxs.select('//div[@class="attribute_list"]').extract()

        if hasVariants:            
            item['hasVariants'] = True
        else:
            item['hasVariants'] = False

        category = hxs.select('//div[@class="breadcrumb"]/a/text()').extract()
        if category:
            productCategory = [category[0]]
            if len(category) > 1:  # category[1] exists only when there are at least two entries
                productSubCategory = [category[1]]
            else:
                productSubCategory = ['']
        else:            
            productCategory = ['']
            productSubCategory = ['']

        item['productCategory'] = productCategory       
        item['productSubCategory'] = productSubCategory

        for i in hxs.select('//div[@id="thumbs_list"]/ul/li/a/img/@src').extract():
            temp.append(i.replace("medium","large"))

        item['productDesc'] =  " ".join([i for i in hxs.select('//div[@id="short_description_content"]/p/text()').extract()])
        item['productImage'] = item['productImage'] + hxs.select('//div[@id="thumbs_list"]/ul/li/a/img/@src').extract() + hxs.select('//div[@id="thumbs_list"]/ul/li/a/@href').extract() + temp   
        item['image_urls'] = list(set(item['productImage']))        

        return item

Can anyone tell me what is wrong with my code?

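Best answer

Don't call .extract() on what you store in sites. extract() returns text (here, a list of unicode strings), but you don't want the text yet: the loop still needs selector objects so that it can call .select() on each one. This...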

sites = hxs.select('//ul[@id="product_list"]/li').extract()

...should be this:

sites = hxs.select('//ul[@id="product_list"]/li')
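
The distinction the answer draws can be reproduced without Scrapy. The sketch below is only an analogy: it uses Python 3's standard-library xml.etree.ElementTree as a stand-in for HtmlXPathSelector (an assumption made so the example runs anywhere), and the markup and variable names are hypothetical. Here findall() plays the role of .select(), and serializing with ET.tostring() plays the role of .extract().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the product list in the question.
HTML = """
<ul id="product_list">
  <li><a href="/p/1" title="Red shoe"><img src="/img/1.jpg"/></a></li>
  <li><a href="/p/2" title="Blue shoe"><img src="/img/2.jpg"/></a></li>
</ul>
"""

root = ET.fromstring(HTML)

# Analogue of hxs.select(...): node objects that can be queried again.
nodes = root.findall("./li")

# Analogue of hxs.select(...).extract(): the same nodes serialized to text.
texts = [ET.tostring(node, encoding="unicode") for node in nodes]

# Querying the serialized text fails, just like site.select() in the question.
try:
    texts[0].findall(".//a")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

# Querying the node objects works; plain values are pulled out only at the end.
hrefs = [node.find(".//a").get("href") for node in nodes]
titles = [node.find(".//a").get("title") for node in nodes]

print(error_message)
print(hrefs, titles)
```

The same rule carries over to the spider: keep sites as selectors, and call .extract() only on the final @href, @src and @title queries inside the loop.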

For more on python - Scrapy exception - exceptions.AttributeError: 'unicode' object has no attribute 'select', see the similar question on Stack Overflow: https://stackoverflow.com/questions/17268175/
