
python - Scrapy throws an error when run with CrawlerProcess

Reposted · Author: 行者123 · Updated: 2023-12-01 15:20:01

I've written a script in Python using scrapy to collect the names of different posts and their links from a website. When I execute my script from the command line, it works flawlessly. My intention now is to run the script using CrawlerProcess(). I looked for similar problems in different places, but couldn't find a direct solution or anything close to one. However, when I try to run the script as is, I get the following error:

    from stackoverflow.items import StackoverflowItem
ModuleNotFoundError: No module named 'stackoverflow'
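The error itself just means Python cannot find the `stackoverflow` package anywhere on `sys.path`. A minimal, self-contained illustration of that failure mode (the module names below are only for demonstration and are not part of the project):

```python
import importlib
import sys

def can_import(name):
    """Return True if the named module/package is importable from sys.path."""
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False

# A stdlib module is always importable; an unknown package is not --
# which is exactly what happens when the Scrapy project root (the folder
# containing the "stackoverflow" package) is missing from sys.path.
print(can_import("json"))                      # True
print(can_import("no_such_package_demo_xyz"))  # False
```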



This is my script so far (stackoverflowspider.py):
from scrapy.crawler import CrawlerProcess
from stackoverflow.items import StackoverflowItem
from scrapy import Selector
import scrapy

class stackoverflowspider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']

    def parse(self, response):
        sel = Selector(response)
        items = []
        for link in sel.xpath("//*[@class='question-hyperlink']"):
            item = StackoverflowItem()
            item['name'] = link.xpath('.//text()').extract_first()
            item['url'] = link.xpath('.//@href').extract_first()
            items.append(item)
        return items

if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0',
    })
    c.crawl(stackoverflowspider)
    c.start()
items.py includes:
import scrapy

class StackoverflowItem(scrapy.Item):
    name = scrapy.Field()
    url = scrapy.Field()

Here is the project tree (a directory-hierarchy image in the original post, not reproduced here):

I know I can succeed this way, but I am only interested in accomplishing the task with the approach I tried above:


def parse(self, response):
    sel = Selector(response)  # added: the snippet uses sel, so it must be defined
    for link in sel.xpath("//*[@class='question-hyperlink']"):
        name = link.xpath('.//text()').extract_first()
        url = link.xpath('.//@href').extract_first()
        yield {"Name": name, "Link": url}

Best Answer

Although @Dan-Dev showed me a way in the right direction, I decided to provide a complete, flawless solution.
Nothing changed other than what I pasted below:

import sys
#The following line (which leads to the folder containing "scrapy.cfg") fixed the problem
sys.path.append(r'C:\Users\WCS\Desktop\stackoverflow')
from scrapy.crawler import CrawlerProcess
from stackoverflow.items import StackoverflowItem
from scrapy import Selector
import scrapy


class stackoverflowspider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']

    def parse(self, response):
        sel = Selector(response)
        items = []
        for link in sel.xpath("//*[@class='question-hyperlink']"):
            item = StackoverflowItem()
            item['name'] = link.xpath('.//text()').extract_first()
            item['url'] = link.xpath('.//@href').extract_first()
            items.append(item)
        return items

if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0',
    })
    c.crawl(stackoverflowspider)
    c.start()
Once again, including the following in the script fixed the problem:
import sys
#The following line (which leads to the folder containing "scrapy.cfg") fixed the problem
sys.path.append(r'C:\Users\WCS\Desktop\stackoverflow')
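Hard-coding an absolute Windows path ties the script to one machine. A more portable sketch (assuming the script lives in the same folder as scrapy.cfg, as in the project tree above) derives the project root from the script's own location instead:

```python
import sys
from pathlib import Path

# Folder containing this script -- assumed to be the project root
# (the directory holding scrapy.cfg and the "stackoverflow" package).
project_root = Path(__file__).resolve().parent

# Append it so "from stackoverflow.items import StackoverflowItem" resolves.
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

print(str(project_root) in sys.path)  # True
```

The same idea works unchanged on Windows, macOS, and Linux, since pathlib handles the separators.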

Regarding "python - Scrapy throws an error when run with CrawlerProcess", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53033791/
