python - Docker Scrapy spider saving data to Postgres gives a port error


I am trying to run my Scrapy spider on a VPS, so I used Docker images and wired together PostgreSQL, Scrapy, and scrapy-splash. When I start the spider with docker-compose up, I get a port error, and the spider does not seem to recognize self.cur in my pipelines.py.

When I run the spider on my local machine, it works fine, with no port error and no errors from pipelines.py.

The error on the VPS:

2018-08-08 02:19:10 [scrapy.middleware] INFO: Enabled spider middlewares:
web_1 | ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
web_1 | 'scrapy_splash.SplashDeduplicateArgsMiddleware',
web_1 | 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
web_1 | 'tutorial.middlewares.TutorialSpiderMiddleware',
web_1 | 'scrapy.spidermiddlewares.referer.RefererMiddleware',
web_1 | 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
web_1 | 'scrapy.spidermiddlewares.depth.DepthMiddleware']
web_1 | 2018-08-08 02:19:10 [scrapy.middleware] INFO: Enabled item pipelines:
web_1 | ['tutorial.pipelines.TutorialPipeline']
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Spider opened
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Closing spider (shutdown)
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] ERROR: Scraper close failure
web_1 | Traceback (most recent call last):
web_1 |   File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in crawl
web_1 |     yield self.engine.open_spider(self.spider, start_requests)
web_1 | psycopg2.OperationalError: could not connect to server: Connection refused
web_1 |   Is the server running on host "localhost" (127.0.0.1) and accepting
web_1 |   TCP/IP connections on port 5432?
web_1 | could not connect to server: Cannot assign requested address
web_1 |   Is the server running on host "localhost" (::1) and accepting
web_1 |   TCP/IP connections on port 5432?
web_1 |
web_1 | During handling of the above exception, another exception occurred:
web_1 |
web_1 | Traceback (most recent call last):
web_1 |   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
web_1 |     current.result = callback(current.result, *args, **kw)
web_1 |   File "/scrapy_estate/tutorial/pipelines.py", line 19, in close_spider
web_1 |     self.cur.close()
web_1 | AttributeError: 'TutorialPipeline' object has no attribute 'cur'
web_1 | 2018-08-08 02:19:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
web_1 | {'finish_reason': 'shutdown',
web_1 |  'finish_time': datetime.datetime(2018, 8, 8, 2, 19, 10, 744998),
web_1 |  'log_count/ERROR': 1,
web_1 |  'log_count/INFO': 6}
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Spider closed (shutdown)
web_1 | Unhandled error in Deferred:
web_1 | 2018-08-08 02:19:10 [twisted] CRITICAL: Unhandled error in Deferred:
web_1 |
web_1 | 2018-08-08 02:19:10 [twisted] CRITICAL:
web_1 | Traceback (most recent call last):
web_1 |   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
web_1 |     result = g.send(result)
web_1 |   File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in crawl
web_1 |     yield self.engine.open_spider(self.spider, start_requests)
web_1 | psycopg2.OperationalError: could not connect to server: Connection refused
web_1 |   Is the server running on host "localhost" (127.0.0.1) and accepting
web_1 |   TCP/IP connections on port 5432?
web_1 | could not connect to server: Cannot assign requested address
web_1 |   Is the server running on host "localhost" (::1) and accepting
web_1 |   TCP/IP connections on port 5432?
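
The two tracebacks are chained: psycopg2.connect raises OperationalError inside open_spider before self.cur is ever assigned, so when Scrapy shuts the spider down, close_spider calls self.cur.close() on an attribute that was never created. A minimal sketch that reproduces the same chain outside Scrapy (assuming nothing is listening on localhost:5432):

import psycopg2

class Demo:
    def open_spider(self):
        # connect() raises OperationalError when no server listens on the
        # port, so the cursor below is never created
        self.connection = psycopg2.connect(host='localhost', port=5432,
                                           user='postgres', password='123',
                                           dbname='real_estate')
        self.cur = self.connection.cursor()

    def close_spider(self):
        self.cur.close()  # AttributeError: 'Demo' object has no attribute 'cur'

demo = Demo()
try:
    demo.open_spider()
finally:
    demo.close_spider()  # runs while OperationalError is being handled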

My Dockerfile:

FROM ubuntu:18.04
FROM python:3.6-onbuild
RUN apt-get update && apt-get upgrade -y && apt-get install python-pip -y && pip3 install psycopg2 && pip3 install psycopg2-binary
RUN pip3 install --upgrade pip
RUN pip3 install scrapy --upgrade
RUN pip3 install scrapy-splash
COPY . /scrapy_estate
WORKDIR /scrapy_estate
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 80
EXPOSE 5432/tcp
CMD scrapy crawl estate

My docker-compose.yml:

version: "3"
services:
  interface:
    links:
      - postgres:postgres
    image: adminer
    ports:
      - "8080:8080"
    networks:
      - webnet
  postgres:
    image: postgres
    container_name: postgres
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: '123'
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - webnet

  web:
    image: user/scrapy_estate:latest
    build: ./tutorial
    ports:
      - "8081:8081"
    environment:
      DB_HOST: postgres
    networks:
      - webnet
  splash:
    image: scrapinghub/splash
    ports:
      - "8050:8050"
    expose:
      - "8050"

networks:
  webnet:

My pipelines.py:

import psycopg2


class TutorialPipeline(object):
    def open_spider(self, spider):
        hostname = 'localhost'
        username = 'postgres'
        password = '123'  # your password
        database = 'real_estate'
        self.connection = psycopg2.connect(host=hostname, user=username,
                                           password=password, dbname=database)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cur.execute(
            "insert into estate(estate_title, estate_address, estate_area,"
            " estate_description, estate_price, estate_type, estate_tag,"
            " estate_date, estate_seller_name, estate_seller_address,"
            " estate_seller_phone, estate_seller_mobile, estate_seller_email)"
            " values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
            (item['estate_title'], item['estate_address'], item['estate_area'],
             item['estate_description'], item['estate_price'], item['estate_type'],
             item['estate_tag'], item['estate_date'], item['estate_seller_name'],
             item['estate_seller_address'], item['estate_seller_phone'],
             item['estate_seller_mobile'], item['estate_seller_email']))
        self.connection.commit()
        return item

EDIT

The spider works now. I had not published port 5432 in docker-compose, and a PostgreSQL server was already installed on the VPS, so the port was already in use. I killed the process holding port 5432 on the VPS, ran everything again, and it worked.

Best answer

Because the container's gateway IP address is 172.17.0.1, you should change hostname = 'localhost' to hostname = '172.17.0.1' in your pipelines.py file, then run it again.
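
In pipelines.py that is a one-line change (172.17.0.1 is the default docker0 bridge gateway, so this relies on Postgres being published on the host via the ports mapping shown below; connecting by the service name postgres over the shared webnet network would avoid depending on that address):

hostname = '172.17.0.1'  # was 'localhost'; the default docker0 bridge gateway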

And add the port mapping to the postgres service in docker-compose.yml:

  postgres:
    image: postgres
    container_name: postgres
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: '123'
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
    expose:
      - "5432"
    networks:
      - webnet

Regarding "python - Docker Scrapy spider saving data to Postgres gives a port error", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51737912/
