
python - Scrapy: cannot import the items into my spider (No module named behance.items)

Reposted. Author: 行者123. Updated: 2023-11-28 18:34:33

I'm new to Scrapy, and I get an error when I run my spider to crawl Behance:

import scrapy
from scrapy.selector import Selector
from behance.items import BehanceItem
from selenium import webdriver
from scrapy.http import TextResponse

from scrapy.crawler import CrawlerProcess

class DmozSpider(scrapy.Spider):
    name = "behance"
    #allowed_domains = ["behance.com"]
    start_urls = [
        "https://www.behance.net/gallery/29535305/Mind-Your-Monsters",
    ]

    def __init__ (self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        item = BehanceItem()
        hxs = Selector(response)

        item['link'] = response.xpath("//div[@class='js-project-module-image-hd project-module module image project-module-image']/@data-hd-src").extract()

        yield item

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(DmozSpider)
process.start()

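When I run my spider, the following error appears on the command line:

Traceback (most recent call last):
  File "/home/davy/behance/behance/spiders/behance_spider.py", line 3, in <module>
    from behance.items import BehanceItem
ImportError: No module named behance.items

My directory structure: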

behance/
├── behance
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── behance_spider.py
└── scrapy.cfg

Best answer

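Try running your spider from inside the project directory (the one that contains scrapy.cfg) with the following command: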

scrapy crawl behance
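
To see why the command above works where launching the spider file directly does not, here is a minimal sketch. It assumes the spider was started with something like `python behance_spider.py` (the question does not show the exact command). In that case Python puts the script's own directory, spiders/, at the front of the import path, and the top-level `behance` package is not visible from there. The sketch rebuilds a throwaway stand-in for the project tree (a dummy `BehanceItem`, no Scrapy required); `build_demo_project` and `run_script` are hypothetical helper names used only for this illustration.

```python
import os
import subprocess
import sys
import tempfile


def build_demo_project(root):
    """Create a tiny stand-in for the behance/ project tree (no Scrapy needed)."""
    pkg = os.path.join(root, "behance")
    spiders = os.path.join(pkg, "spiders")
    os.makedirs(spiders)
    for folder in (pkg, spiders):
        open(os.path.join(folder, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "items.py"), "w") as f:
        f.write("class BehanceItem(dict):\n    pass\n")
    script = os.path.join(spiders, "behance_spider.py")
    with open(script, "w") as f:
        f.write("from behance.items import BehanceItem\nprint('import ok')\n")
    return script


def run_script(script, project_root=None):
    """Run the script in a fresh interpreter, optionally with the project root on PYTHONPATH."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if project_root is not None:
        env["PYTHONPATH"] = project_root
    return subprocess.run(
        [sys.executable, script],
        env=env,
        cwd=os.path.dirname(script),
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as root:
    script = build_demo_project(root)
    # Launched directly: only spiders/ is on the import path, so the import fails.
    direct = run_script(script)
    # With the project root importable, the same file imports cleanly.
    with_root = run_script(script, project_root=root)

print("direct run failed:", direct.returncode != 0)
print("with project root:", with_root.stdout.strip())
```

Setting PYTHONPATH here only stands in for "the project root is importable"; it is not necessarily how Scrapy arranges it internally.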

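Or change your spider file as follows (the class is renamed and the CrawlerProcess block at the bottom is removed):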

import scrapy
from behance.items import BehanceItem
from selenium import webdriver
from scrapy.http import TextResponse


class BehanceSpider(scrapy.Spider):
    name = "behance"
    # The start URL is on behance.net, so that is the domain to allow.
    allowed_domains = ["behance.net"]
    start_urls = [
        "https://www.behance.net/gallery/29535305/Mind-Your-Monsters",
    ]

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider do its own setup before adding the browser.
        super(BehanceSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the page in the browser so that JavaScript-rendered content is present,
        # then wrap the rendered HTML so Scrapy selectors can be used on it.
        self.driver.get(response.url)
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        item = BehanceItem()

        # Collect the HD image URLs from the project's image modules.
        item['link'] = response.xpath("//div[@class='js-project-module-image-hd project-module module image project-module-image']/@data-hd-src").extract()

        yield item

Then create another Python file in the directory where your settings.py file is located.

run.py

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

process.crawl("behance")
process.start()

Now run this file like a normal Python script: python run.py
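
run.py depends on get_project_settings() being able to locate the project. As far as I know, Scrapy does this by looking for scrapy.cfg in the current working directory and then in its parent directories, so the script should be started from somewhere inside the project tree. The sketch below is a stand-alone illustration of that kind of upward search, tried out on a throwaway copy of the layout from the question. `find_project_root` is a hypothetical helper written for this example, not a Scrapy API.

```python
import os
import tempfile


def find_project_root(start_dir):
    """Walk upward from start_dir; return the first directory containing scrapy.cfg, or None."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "scrapy.cfg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without finding it
            return None
        current = parent


# Demonstration on a temporary directory laid out like the question's project.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "behance")
    package_dir = os.path.join(root, "behance")  # where settings.py and run.py live
    os.makedirs(package_dir)
    open(os.path.join(root, "scrapy.cfg"), "w").close()

    found = find_project_root(package_dir)
    same = os.path.realpath(found) == os.path.realpath(root)

print("project root found from the package directory:", same)
```

If run.py has to be launched from an unrelated directory, one option would be to change into the directory returned by such a helper before calling get_project_settings(). That is a suggestion under the assumption above, not something the original answer covers.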

Regarding python - Scrapy: cannot import the items into my spider (No module named behance.items), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33740113/
