
mysql - Scrapy pipeline cannot insert into MySQL

Reposted. Author: 行者123. Updated: 2023-11-29 10:51:31

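I am trying to build a small application with Scrapy for a university project. The spider is scraping the items, but my pipeline is not inserting the data into the MySQL database. To test whether the pipeline or the pymysql setup is at fault, I wrote a test script: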

Code start

#!/usr/bin/python3

import pymysql

str1 = "hey"
str2 = "there"
str3 = "little"
str4 = "script"

db = pymysql.connect("localhost","root","**********","stromtarife" )

cursor = db.cursor()

cursor.execute("SELECT * FROM vattenfall")
cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (str1, str2, str3, str4))
cursor.execute("SELECT * FROM vattenfall")
data = cursor.fetchone()
print(data)
db.commit()
cursor.close()

db.close()

Code end

After running this script my database has a new record, so it is not my pymysql.connect() call that is broken.

Here is my Scrapy code:

vattenfall_form.py

# -*- coding: utf-8 -*-
import scrapy
from scrapy.crawler import CrawlerProcess
from stromtarife.items import StromtarifeItem

from scrapy.http import FormRequest


class VattenfallEasy24KemptenV1500Spider(scrapy.Spider):
    name = 'vattenfall-easy24-v1500-p87435'

    def start_requests(self):
        return [
            FormRequest(
                "https://www.vattenfall.de/de/stromtarife.htm",
                formdata={"place": "87435", "zipCode": "87435", "cityName": "Kempten",
                          "electricity_consumptionprivate": "1500", "street": "", "hno": ""},
                callback=self.parse
            ),
        ]

    def parse(self, response):
        item = StromtarifeItem()
        item['jahrespreis'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[3]/td[2]/text()').extract_first()
        item['treuebonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[2]/td/strong/text()').extract_first()
        item['sofortbonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[1]/td/strong/text()').extract_first()
        item['tarif'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[1]/h2/span/text()').extract_first()
        yield item


class VattenfallEasy24KemptenV2500Spider(scrapy.Spider):
    name = 'vattenfall-easy24-v2500-p87435'

    def start_requests(self):
        return [
            FormRequest(
                "https://www.vattenfall.de/de/stromtarife.htm",
                formdata={"place": "87435", "zipCode": "87435", "cityName": "Kempten",
                          "electricity_consumptionprivate": "2500", "street": "", "hno": ""},
                callback=self.parse
            ),
        ]

    def parse(self, response):
        item = StromtarifeItem()
        item['jahrespreis'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[3]/td[2]/text()').extract_first()
        item['treuebonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[2]/td/strong/text()').extract_first()
        item['sofortbonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[1]/td/strong/text()').extract_first()
        item['tarif'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[1]/h2/span/text()').extract_first()
        yield item


process = CrawlerProcess()
process.crawl(VattenfallEasy24KemptenV1500Spider)
process.crawl(VattenfallEasy24KemptenV2500Spider)
process.start()

pipelines.py

import pymysql
from stromtarife.items import StromtarifeItem


class StromtarifePipeline(object):
    def __init__(self):
        self.connection = pymysql.connect("localhost","root","**********","stromtarife")
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):
        self.cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (item['tarif'], item['sofortbonus'], item['treuebonus'], item['jahrespreis']))
        self.connection.commit()
        self.cursor.close()
        self.connection.close()

settings.py (I changed only this setting)

ITEM_PIPELINES = {
    'stromtarife.pipelines.StromtarifePipeline': 300,
}

So what is wrong with my code? I cannot figure it out, and I would be very glad if someone spots what I am missing. Thanks in advance!

Best answer

You should not close the pymysql connection every time an item is processed.

Instead, write a close_spider method in the pipeline like this, so the connection is closed only once, when the crawl finishes:

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

In addition, you need to return the item at the end of process_item.

Your pipelines.py file should look like this:

import pymysql
from stromtarife.items import StromtarifeItem


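class StromtarifePipeline(object):
    def __init__(self):
        # Keyword arguments: PyMySQL 1.0 and later no longer accept
        # positional arguments to connect().
        self.connection = pymysql.connect(host="localhost", user="root", password="**********", database="stromtarife")
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):
        self.cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (item['tarif'], item['sofortbonus'], item['treuebonus'], item['jahrespreis']))
        self.connection.commit()
        # Return the item so that any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes; release the connection here.
        self.cursor.close()
        self.connection.close()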
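
The effect of closing the connection once per spider rather than once per item can be checked without a MySQL server. What follows is only a sketch under stated assumptions: it swaps PyMySQL for Python's built-in sqlite3 module (so the placeholders are ? instead of %s), the names SketchPipeline and run_sketch are hypothetical, and the Scrapy hook methods are called by hand instead of by a real crawl.

```python
import sqlite3


class SketchPipeline:
    """Stand-in for the pipeline above; sqlite3 replaces PyMySQL (assumption)."""

    def open_spider(self, spider):
        # Open the connection once, when the spider starts.
        self.connection = sqlite3.connect(":memory:")
        self.cursor = self.connection.cursor()
        self.cursor.execute(
            "CREATE TABLE vattenfall "
            "(tarif TEXT, sofortbonus TEXT, treuebonus TEXT, jahrespreis TEXT)"
        )

    def process_item(self, item, spider):
        # sqlite3 uses "?" placeholders where PyMySQL uses "%s".
        self.cursor.execute(
            "INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) "
            "VALUES (?, ?, ?, ?)",
            (item["tarif"], item["sofortbonus"], item["treuebonus"], item["jahrespreis"]),
        )
        self.connection.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Close the cursor and connection once, after all items are processed.
        self.cursor.close()
        self.connection.close()


def run_sketch(items):
    """Drive the hooks by hand and return the number of stored rows."""
    pipeline = SketchPipeline()
    pipeline.open_spider(spider=None)
    for item in items:
        pipeline.process_item(item, spider=None)
    pipeline.cursor.execute("SELECT COUNT(*) FROM vattenfall")
    count = pipeline.cursor.fetchone()[0]
    pipeline.close_spider(spider=None)
    return count
```

Because the connection stays open between items, every item passed to run_sketch is stored. Calling process_item after close_spider raises an error instead, which is the situation a per-item close creates for any second item.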

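Update:

I tried your code, and the problem is in the pipeline. There are two issues:

  • You are trying to insert the euro symbol (€), and I think MySQL does not like it.
  • Your query string is not built correctly.

I got it working by writing the pipeline method like this: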

    def process_item(self, item, spider):
        query = """INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)""" % ("1", "2", "3", "4")
        self.cursor.execute(query)
        self.connection.commit()
        return item

I think you should remove the € from the prices you are trying to insert.
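
One possible way to strip the symbol before the insert is sketched below. The helper name strip_euro is hypothetical, and the price strings in the usage note are made up for illustration; the actual format of the scraped values is not shown in the question.

```python
def strip_euro(price):
    """Hypothetical helper: remove the euro sign and surrounding whitespace."""
    if price is None:
        # extract_first() returns None when the XPath matches nothing.
        return None
    return price.replace("€", "").strip()
```

For example, strip_euro("123,45 €") gives "123,45". In process_item it could be applied to each price field, such as item['jahrespreis'], before the values are passed as parameters to cursor.execute().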

Hope this helps. Let me know.

Regarding mysql - Scrapy pipeline cannot insert into MySQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43656127/
