
python - How to import Django models in a Scrapy pipelines.py file

Reposted | Author: 太空宇宙 | Updated: 2023-11-03 11:54:34

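I'm trying to import a Django app's models in my pipelines.py so that I can save data with the Django ORM. I created a Scrapy project, scrapy_project, inside the first Django app involved, "app1" (by the way, is that a good choice?). I added these lines to my Scrapy settings file: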

import imp
import os


def setup_django_env(path):
    # setup_environ only exists in old Django releases (removed in Django 1.6)
    from django.core.management import setup_environ

    # Locate settings.py inside `path` and load it as a module
    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)

    setup_environ(project)


current_dir = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
setup_django_env(os.path.join(current_dir, '../../d_project1'))
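For readers on a current Python: the `imp` module used above was deprecated in Python 3.4 and removed in Python 3.12. A minimal sketch of the same "load settings.py from a directory" step with `importlib`, assuming the file is literally named `settings.py`; `load_settings_module` is a hypothetical helper name, not part of Django or Scrapy.

```python
import importlib.util
import os


def load_settings_module(path, name="settings"):
    """Hypothetical helper: load <path>/<name>.py as a module object."""
    file_path = os.path.join(path, name + ".py")
    spec = importlib.util.spec_from_file_location(name, file_path)
    if spec is None:
        raise ImportError("cannot build an import spec for %s" % file_path)
    module = importlib.util.module_from_spec(spec)
    # Execute settings.py top to bottom, filling the module's namespace
    spec.loader.exec_module(module)
    return module
```

This only replaces the module-loading half of the original code. `setup_environ` no longer exists, so on Django 1.7+ the usual approach is to set `DJANGO_SETTINGS_MODULE` and call `django.setup()` instead of handing Django a module object.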

When I try to import the models of my Django app app1, I get this error message:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 122, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 76, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 129, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 43, in run
    spider = self.crawler.spiders.create(spname, **opts.spargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler
    self._crawler.configure()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 41, in configure
    self.engine = ExecutionEngine(self, self._spider_closed)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 63, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 39, in load_object
    raise ImportError, "Error loading object '%s': %s" % (path, e)
ImportError: Error loading object 'scrapy_project.pipelines.storage.storage': No module named dydict.models

Why can't Scrapy access the Django app's models (given that app1 is in INSTALLED_APPS)?

Best answer

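In the pipeline you don't import the Django models directly; instead you use Scrapy items that are bound to Django models. You have to configure Django in the Scrapy settings, not afterwards.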
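The traceback in the question shows the failure inside `load_object`, i.e. while Scrapy imports the pipeline class as it builds the crawler; at that moment the package `dydict` was not on `sys.path`. A small sketch for checking, from the same process, whether a dotted module path is importable before Scrapy tries it. `is_importable` is a hypothetical helper; it relies only on the standard `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(dotted_name):
    """Hypothetical helper: True if `dotted_name` is findable on sys.path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; a missing parent
        # (e.g. "dydict" for "dydict.models") raises instead of returning None
        return False
```

Calling `is_importable("dydict.models")` at the end of the Scrapy settings file would return False in the situation from the question, which points at a path problem rather than a Django configuration problem.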

To use Django models in a Scrapy project you have to use DjangoItem, https://github.com/scrapy-plugins/scrapy-djangoitem (installed so that it is on your Python path).

The file structure I recommend is:

Projects
|-DjangoScrapy
  |-DjangoProject
  | |-Djangoproject
  | |-DjangoAPP
  |-ScrapyProject
    |-ScrapyProject
      |-Spiders
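With this layout the Django project can be located relative to the Scrapy settings file instead of through a hard-coded absolute path. A minimal sketch, assuming `settings.py` lives in the inner `ScrapyProject` package (where `scrapy startproject` puts it) and that the Django folder is named `DjangoProject`; `django_project_dir` is a hypothetical helper.

```python
from pathlib import Path


def django_project_dir(scrapy_settings_file, django_dir_name="DjangoProject"):
    """Hypothetical helper: map .../ScrapyProject/ScrapyProject/settings.py
    to the sibling .../DjangoProject directory."""
    settings_path = Path(scrapy_settings_file).resolve()
    # parents[0] = inner ScrapyProject package
    # parents[1] = outer ScrapyProject folder
    # parents[2] = DjangoScrapy, which contains both projects
    return settings_path.parents[2] / django_dir_name
```

Inside the Scrapy settings file this would be called as `django_project_dir(__file__)`, so the project keeps working after the whole `DjangoScrapy` folder is moved.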

Then, in your Scrapy project's settings, you have to add the full path of the Django project to the Python path:

# Setting up Django's project full path.
import os
import sys

sys.path.insert(0, '/home/PycharmProject/scrap/DjangoProject')

# Setting up Django's settings module name.
os.environ['DJANGO_SETTINGS_MODULE'] = 'DjangoProject.settings'

# Django 1.7+ must load its app registry before any model is imported.
import django
django.setup()
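Scrapy may import its settings module more than once (for example, when running several commands in one process), so a slightly more defensive version of the two steps above avoids duplicate `sys.path` entries. A minimal sketch; `configure_django` is a hypothetical helper, and it deliberately leaves out `django.setup()`, which still has to be called afterwards.

```python
import os
import sys


def configure_django(project_dir, settings_module):
    """Hypothetical helper: make a Django project importable from Scrapy."""
    project_dir = os.path.abspath(project_dir)
    # Put the project first on sys.path, but only once
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    # setdefault keeps a value that was already exported in the shell
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
```

Using `setdefault` instead of plain assignment is a design choice: it lets you point the same Scrapy project at a different settings module (say, a test configuration) from the environment without editing the file.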

Then, in your items.py, you can bind the Django models to Scrapy items:

from DjangoProject.models import Person, Job
from scrapy_djangoitem import DjangoItem


class PersonItem(DjangoItem):
    # The item's fields are generated from the Person model's fields
    django_model = Person


class JobItem(DjangoItem):
    django_model = Job

Then, once the spider has yielded the item, you can use the .save() method in the pipeline:

spider.py

import scrapy

from mybot.items import PersonItem


class ExampleSpider(scrapy.Spider):  # scrapy.spider.BaseSpider in old Scrapy releases
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = ['http://www.dmoz.org/World/Espa%C3%B1ol/Artes/Artesan%C3%ADa/']

    def parse(self, response):
        # do stuff
        yield PersonItem(name='zartch')

pipelines.py

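from myapp.models import Person


class MybotPipeline(object):
    def process_item(self, item, spider):
        # get_or_create returns an (object, created) tuple
        obj, created = Person.objects.get_or_create(name=item['name'])
        # Return the item so that any later pipelines receive it
        return item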
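Two details of the pipeline are easy to get wrong: Django's `get_or_create` returns an `(object, created)` tuple, and Scrapy passes whatever `process_item` returns on to the next pipeline, so it should return the item (or raise `DropItem`). A sketch that exercises that contract without a database; `FakeManager` is a hypothetical in-memory stand-in for `Person.objects`, not Django code, and `PersonPipeline` is an illustrative name.

```python
class FakeManager:
    """Hypothetical stand-in for Person.objects, keeping rows in a dict."""

    def __init__(self):
        self.rows = {}

    def get_or_create(self, **kwargs):
        # Mirrors the shape of Django's return value: (object, created)
        key = tuple(sorted(kwargs.items()))
        if key in self.rows:
            return self.rows[key], False
        self.rows[key] = dict(kwargs)
        return self.rows[key], True


class PersonPipeline:
    def __init__(self, manager):
        self.manager = manager
        self.new_rows = 0  # how many rows this pipeline actually created

    def process_item(self, item, spider):
        _obj, created = self.manager.get_or_create(name=item["name"])
        if created:
            self.new_rows += 1
        # Hand the unchanged item to the next pipeline
        return item
```

Feeding the same `{"name": "zartch"}` item twice creates one row and returns the item both times, which is the behavior the real pipeline above should show against the Django database.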

I have a repository with minimal code (you only need to set the path of your Django project in the Scrapy settings): https://github.com/Zartch/Scrapy-Django-Minimal

In https://github.com/Zartch/Scrapy-Django-Minimal/blob/master/mybot/mybot/settings.py you have to change my Django project path to the path of your DjangoProject:

sys.path.insert(0, '/home/zartch/PycharmProjects/Scrapy-Django-Minimal/myweb')

Regarding "python - How to import Django models in a Scrapy pipelines.py file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15321584/
