postgresql - psycopg2.ProgrammingError: relation * already exists when populating a database through MRjob


Reposted by: 行者123 Updated: 2023-12-02 21:28:01

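I am trying to populate a PostgreSQL database using MRjob. A few days ago someone here kindly suggested that I split the mapper into steps. I tried that, but it gives an error: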

python db_store_hadoop.py -r local --dbname=en_ws xSparse.txt
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/db_store_hadoop.iarroyof.20160204.074501.695246
writing wrapper script to /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/setup-wrapper.sh

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

writing to /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00000
> sh -ex setup-wrapper.sh /usr/bin/python db_store_hadoop.py --step-num=0 --mapper --dbname en_ws /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00000 > /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00000
writing to /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00001
> sh -ex setup-wrapper.sh /usr/bin/python db_store_hadoop.py --step-num=0 --mapper --dbname en_ws /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00001 > /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00001
STDERR: + __mrjob_PWD=/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0
STDERR: + exec
STDERR: + /usr/bin/python -c import fcntl; fcntl.flock(9, fcntl.LOCK_EX)
STDERR: + export PYTHONPATH=/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz:/home/iarroyof/shogun-install/lib/python2.7/dist-packages:/home/iarroyof/shogun/examples/undocumented/python_modular:/home/iarroyof/smo-mkl/python:
STDERR: + exec
STDERR: + cd /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0
STDERR: + /usr/bin/python db_store_hadoop.py --step-num=0 --mapper --dbname en_ws /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00000
STDERR: Traceback (most recent call last):
STDERR:   File "db_store_hadoop.py", line 86, in <module>
STDERR:     MRwordStore().run()
STDERR:   File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 461, in run
STDERR:     mr_job.execute()
STDERR:   File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 470, in execute
STDERR:     self.run_mapper(self.options.step_num)
STDERR:   File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 530, in run_mapper
STDERR:     for out_key, out_value in mapper_init() or ():
STDERR:   File "db_store_hadoop.py", line 35, in mapper_init
STDERR:     create_tables(self.cr0)
STDERR:   File "db_store_hadoop.py", line 14, in create_tables
STDERR:     cr.execute("create table word_list(id serial primary key, word character varying not null)")
STDERR: psycopg2.ProgrammingError: relation "word_list" already exists
STDERR: 
Counters from step 1:
  (no counters found)
Traceback (most recent call last):
  File "db_store_hadoop.py", line 86, in <module>
    MRwordStore().run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 461, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 479, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 153, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 216, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 470, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/sim.py", line 173, in _run
    self._invoke_step(step_num, 'mapper')
  File "/usr/local/lib/python2.7/dist-packages/mrjob/sim.py", line 264, in _invoke_step
    self.per_step_runner_finish(step_num)
  File "/usr/local/lib/python2.7/dist-packages/mrjob/local.py", line 152, in per_step_runner_finish
    self._wait_for_process(proc_dict, step_num)
  File "/usr/local/lib/python2.7/dist-packages/mrjob/local.py", line 268, in _wait_for_process
    (proc_dict['args'], returncode, ''.join(tb_lines)))
Exception: Command ['sh', '-ex', 'setup-wrapper.sh', '/usr/bin/python', 'db_store_hadoop.py', '--step-num=0', '--mapper', '--dbname', 'en_ws', '/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00000'] returned non-zero exit status 1:
Traceback (most recent call last):
  File "db_store_hadoop.py", line 86, in <module>
    MRwordStore().run()
  File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 461, in run
    mr_job.execute()
  File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 470, in execute
    self.run_mapper(self.options.step_num)
  File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 530, in run_mapper
    for out_key, out_value in mapper_init() or ():
  File "db_store_hadoop.py", line 35, in mapper_init
    create_tables(self.cr0)
  File "db_store_hadoop.py", line 14, in create_tables
    cr.execute("create table word_list(id serial primary key, word character varying not null)")
psycopg2.ProgrammingError: relation "word_list" already exists

This is the code I am working with:

# -*- coding: utf-8 -*-
#Script for storing the sparse data into a database
import psycopg2
import re
import argparse
from mrjob.job import MRJob

def unicodize(segment):
    if re.match(r'\\u[0-9a-f]{4}', segment):
        return segment.decode('unicode-escape')
    return segment.decode('utf-8')

def create_tables(cr):
    cr.execute("create table word_list(id serial primary key, word character varying not null)")
    cr.execute("""create table word_sparse(
        id serial primary key, 
        word_id integer references word_list(id) not null,
        pos integer not null,
        val float not null)""")

def delete_tables(cr):
    cr.execute("drop table word_sparse")
    cr.execute("drop table word_list")

class MRwordStore(MRJob):
    #conn = psycopg2.connect("dbname=%s user=semeval password=semeval" % args_n)
    def configure_options(self):
        super(MRwordStore, self).configure_options()
        self.add_file_option('--dbname')

    def mapper_init(self):
        # make the PostgreSQL database available to the mapper
        self.conn = psycopg2.connect("dbname="+ self.options.dbname +" user=semeval password=semeval")
        self.cr0 = self.conn.cursor()
        create_tables(self.cr0)

    def mapper(self, _, line):
        self.cr = self.conn.cursor()
        item = line.strip().split('\t')
        replaced = u"".join((unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})', item[0])))
        key = u''.join((c for c in replaced if c != '"'))

        self.cr.execute("insert into word_list(word) values(%s) returning id", (key,))
        word_id = self.cr.fetchone()[0]

        # Parse the list; literal_eval is avoided because of memory issues
        inside = False
        number = ""
        pos = 0
        val = 0
        for c in item[1]:
            if c == '[':
                inside = True
            elif c.isdigit():
                number += c
            elif c == ',':
                if inside:
                    pos = int(number)
                    number = ""
            elif c == ']':
                if inside:
                    val = int(number)
                    number = ""
                    self.cr.execute("insert into word_sparse(word_id, pos, val) values (%s, %s, %s)", (word_id, pos, val))
                inside = False

    def mapper_final(self):

        self.conn.commit()
        self.conn.close()


if __name__ == "__main__":
    """
    Stores words in the database.

    The first time, run with the arguments -cs.
    If the database has to be recreated, run again with the d argument (-dcs)

    Use the -f argument to specify the input file (sparse data)
    Use the -n argument to specify the database name, which must be already created.

    It also assumes the owner of the database is a user named semeval with password semeval
    """

    MRwordStore().run()

If anyone can help me identify the errors and misunderstandings, I would appreciate it.

Best answer

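After a few days of trying, I used CREATE TABLE IF NOT EXISTS ... over an initial connection opened in __main__. In mapper_init(), I create a new connection and cursor for each mapper. This is the script for populating a PostgreSQL database through Hadoop: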

# -*- coding: utf-8 -*-
# Script for storing the sparse data into a database. 
# Dependencies: MRjob, psycopg2, postgresql and/or Hadoop.

import psycopg2
import re
import argparse
from mrjob.job import MRJob

dbname = "es_ws" 
# Following global was created for my custom application. You can avoid it.
dsm = True 

def unicodize(segment):
    if re.match(r'\\u[0-9a-f]{4}', segment):
        return segment.decode('unicode-escape')
    return segment.decode('utf-8')

def replaced(item):
    replaced = u"".join((unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})', item)))
    word = replaced.strip('"')
    return word

def insert_list_vector(cursor, word_id, vector):
    inside = False
    number = ""
    pos = 0
    val = 0
    for c in vector:
        if c == '[':
            inside = True
        elif c.isdigit():
            number += c
        elif c == ',':
            if inside:
                pos = int(number)
                number = ""
        elif c == ']':
            if inside:
                val = int(number)
                number = ""
                cursor.execute("insert into word_sparse(word_id, pos, val) values (%s, %s, %s)", (word_id, pos, val))
            inside = False

def insert_dict_vector(cursor, word, vector):
    palabra = word
    d = vector
    bkey = True
    bvalue = False
    key = ""
    value = ""
    for c in d:
        if c == '{':
            pass
        elif c == ":":
            bkey = False
            bvalue = True
        elif c in (",", "}"):
            bkey = True
            bvalue = False
            key = replaced(key.strip())
            value = int(value)
            # Pass the values as query parameters so that quotes inside a word cannot break the SQL.
            cursor.execute("INSERT INTO coocurrencias VALUES (%s, %s, %s)", (palabra, key, value))
            key = ""
            value = ""
        elif bkey:
            key += c
        elif bvalue:
            value += c

def create_tables(cr):
    if dsm:   
        cr.execute("create table if not exists coocurrencias(pal1 character varying, pal2 character varying, valor integer)")
        cr.execute("create table if not exists words(id integer, word character varying)") #(id integer, word character varying, freq integer)
    else:
        cr.execute("create table if not exists word_list(id serial primary key, word character varying not null)")
        cr.execute("""create table if not exists word_sparse(
                  id serial primary key, word_id integer references word_list(id) not null,
                  pos integer not null, val float not null)""")

class MRwordStore(MRJob):

    def mapper_init(self):
        self.conn = psycopg2.connect("dbname="+ dbname +" user=semeval password=semeval")

    def mapper(self, _, line):
        self.cr = self.conn.cursor()
        item = line.strip().split('\t')
        replaced = u"".join((unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})', item[0])))
        key = u''.join((c for c in replaced if c != '"'))
        if dsm:
            self.cr.execute("insert into words(word) values(%s) returning id", (key,))
            word_id = self.cr.fetchone()[0]
            insert_dict_vector(cursor=self.cr, word=key, vector=item[1])
        else:
            self.cr.execute("insert into word_list(word) values(%s) returning id", (key,))
            word_id = self.cr.fetchone()[0]
            insert_list_vector(cursor=self.cr, word_id=word_id, vector=item[1])

    def mapper_final(self):
        self.conn.commit()
        self.conn.close()

if __name__ == "__main__":
    """Stores word vectors into a database. Such a db (e.g. here is en_ws) must be previously created in postgresql.
    It also assumes the owner of the database is a user named semeval with password semeval.
    This script parses input_file.txt containing lines in the next example format (dsm=False):

    "word"<tab> [[number, number],[number, number], ...]

    or (dsm=True)

    "word"<tab> {key:value, key:value,...}

    Use example:

    python db_store_hadoop.py -r hadoop input_file.txt
    """
    # First, create the tables if they do not exist yet, to avoid duplicates.
    conn = psycopg2.connect("dbname="+ dbname +" user=semeval password=semeval")
    create_tables(conn.cursor()) # Override this function to customize your tables
    conn.commit()
    conn.close()

    # Run the MR object
    MRwordStore().run()
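
The traceback in the question shows that mrjob re-invokes db_store_hadoop.py for each mapper task, so any table-creation statement the script reaches can run more than once. What keeps the repeated runs harmless is the IF NOT EXISTS clause. The following is a minimal sketch of that behavior, not the script above: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL (an assumption made only so the snippet runs without a server), and the names create_tables_strict and create_tables_once are hypothetical.

```python
import sqlite3


def create_tables_strict(cur):
    # Plain CREATE TABLE: raises an error if the table is already there.
    cur.execute("create table word_list(id integer primary key, word text not null)")


def create_tables_once(cur):
    # IF NOT EXISTS makes the statement safe to repeat.
    cur.execute("create table if not exists word_list(id integer primary key, word text not null)")


# In-memory SQLite database standing in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simulate two tasks initializing against the same database.
create_tables_once(cur)
create_tables_once(cur)  # the second call changes nothing

# The strict variant fails because the table now exists.
try:
    create_tables_strict(cur)
    strict_failed = False
except sqlite3.OperationalError as exc:
    strict_failed = True
    print("strict create failed:", exc)

conn.close()
```

With psycopg2 the failure surfaces as psycopg2.ProgrammingError, as in the question's log, rather than sqlite3.OperationalError.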

If anyone has other suggestions, they are welcome.
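
One possible suggestion, offered as a sketch rather than a drop-in replacement: the character-by-character parsing in insert_list_vector could be expressed as a regular expression that yields (pos, val) pairs. The helper name iter_pairs and the pattern PAIR_RE are hypothetical, and the sketch assumes the input format from the docstring, [[number, number], ...], with non-negative integers only.

```python
import re

# Matches one inner pair such as "[7, 12]" and captures both integers.
PAIR_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*\]")


def iter_pairs(vector):
    """Yield (pos, val) integer pairs from a string like '[[0, 3],[7, 12]]'."""
    for pos, val in PAIR_RE.findall(vector):
        yield int(pos), int(val)


pairs = list(iter_pairs("[[0, 3],[7, 12], [42, 1]]"))
print(pairs)
```

The resulting pairs could then be sent in a single call with psycopg2's cursor.executemany, passing the same insert statement used in insert_list_vector and a list of (word_id, pos, val) tuples, instead of one execute per pair.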

Regarding postgresql - psycopg2.ProgrammingError: relation * already exists when populating a database through MRjob, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35195441/
