
python - Scrapy not printing stack traces on exceptions


Is there a mechanism to force Scrapy to print out all Python exceptions/stack traces?

I made a simple mistake: I accessed an attribute on a list incorrectly, which raised an AttributeError, but the error never showed up in full in the log. All that appeared was:

2015-11-15 22:13:50 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 264,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 40342,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2015, 11, 15, 22, 13, 50, 860480),
'log_count/CRITICAL': 1,
'log_count/DEBUG': 1,
'log_count/INFO': 1,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'spider_exceptions/AttributeError': 1,
'start_time': datetime.datetime(2015, 11, 15, 22, 13, 49, 222371)}

So it reports an AttributeError count of 1, but it tells me nothing about where or how it happened. I had to manually put ipdb.set_trace() into the code to find the error, while Scrapy itself just carried on with its other threads without printing anything:

ipdb>
AttributeError: "'list' object has no attribute 'match'"
> /Users/username/Programming/regent/regentscraper/spiders/regent_spider.py(139)request_listing_detail_pages_from_listing_id_list()
138 volatile_props = ListingScanVolatilePropertiesItem()
--> 139 volatile_props['position_in_search'] = list_of_listing_ids.match(listing_id) + rank_of_first_item_in_page
140
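
For what it's worth, the bug itself is a one-line fix: lists have no .match() method, and .index() appears to be what was intended. A minimal sketch, with illustrative values standing in for the real data:

# Variable names are taken from the traceback above; the values are
# made up for illustration. list.index() returns an element's position
# in the list, which seems to be what the original line wanted.
list_of_listing_ids = ['id-101', 'id-102', 'id-103']
listing_id = 'id-102'
rank_of_first_item_in_page = 1

position_in_search = list_of_listing_ids.index(listing_id) + rank_of_first_item_in_page
print(position_in_search)  # -> 2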

Scrapy settings

# -*- coding: utf-8 -*-

# Scrapy settings for regentscraper project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# http://doc.scrapy.org/en/latest/topics/settings.html
# http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
# http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html

import sys
import os
import django
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__name__), os.pardir)))

print sys.path

os.environ['DJANGO_SETTINGS_MODULE'] = 'regent.settings'
django.setup() #new for Django 1.8



BOT_NAME = 'regentscraper'

SPIDER_MODULES = ['regentscraper.spiders']
NEWSPIDER_MODULE = 'regentscraper.spiders'


ITEM_PIPELINES = {
'regentscraper.pipelines.ListingScanPipeline': 300,
}
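
Since django.setup() runs inside this settings module before the crawl starts, one way to check whether Django's logging configuration is what swallows the tracebacks is to inspect Scrapy's logger right after that call. A small diagnostic sketch, assuming the logging-override explanation given in the answer below:

import logging

# If Django's dictConfig pass disabled existing loggers, the 'scrapy'
# logger will show disabled=True and/or have lost its handlers.
scrapy_logger = logging.getLogger('scrapy')
print(scrapy_logger.disabled, scrapy_logger.level, scrapy_logger.handlers)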

Best Answer

I ran into the same problem as described above. The following versions were in use in my environment:

  • Django (1.11.4)
  • Scrapy (1.4.0)
  • scrapy-djangoitem (1.1.1)

I solved it by adding LOGGING_CONFIG = None to the Django settings module that Scrapy loads. I created a new Django settings file, settings_scrapy, with the following content:

mysite.settings_scrapy

try:
    from mysite.settings import *

    LOGGING_CONFIG = None
except ImportError:
    pass
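
Per the Django documentation linked under Reference below, setting LOGGING_CONFIG = None disables Django's logging configuration process entirely, so the django.setup() call no longer reconfigures (and thereby silences) the loggers Scrapy has already set up.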

This settings file is then loaded in Scrapy's settings file:

import sys
import os
import django
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings_scrapy'
django.setup()

After that, stack traces from spider and pipeline exceptions showed up.

Reference

https://docs.djangoproject.com/en/1.11/topics/logging/#disabling-logging-configuration

On python - Scrapy not printing stack traces on exceptions, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/33725800/
