I am trying to populate a PostgreSQL database using MRJob. A few days ago someone kindly suggested here that I do the mapper setup in steps. I tried that, but it raises an error:
python db_store_hadoop.py -r local --dbname=en_ws xSparse.txt
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/db_store_hadoop.iarroyof.20160204.074501.695246
writing wrapper script to /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/setup-wrapper.sh
PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols
writing to /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00000
> sh -ex setup-wrapper.sh /usr/bin/python db_store_hadoop.py --step-num=0 --mapper --dbname en_ws /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00000 > /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00000
writing to /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00001
> sh -ex setup-wrapper.sh /usr/bin/python db_store_hadoop.py --step-num=0 --mapper --dbname en_ws /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00001 > /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/step-0-mapper_part-00001
STDERR: + __mrjob_PWD=/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0
STDERR: + exec
STDERR: + /usr/bin/python -c import fcntl; fcntl.flock(9, fcntl.LOCK_EX)
STDERR: + export PYTHONPATH=/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz:/home/iarroyof/shogun-install/lib/python2.7/dist-packages:/home/iarroyof/shogun/examples/undocumented/python_modular:/home/iarroyof/smo-mkl/python:
STDERR: + exec
STDERR: + cd /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0
STDERR: + /usr/bin/python db_store_hadoop.py --step-num=0 --mapper --dbname en_ws /tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00000
STDERR: Traceback (most recent call last):
STDERR: File "db_store_hadoop.py", line 86, in <module>
STDERR: MRwordStore().run()
STDERR: File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 461, in run
STDERR: mr_job.execute()
STDERR: File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 470, in execute
STDERR: self.run_mapper(self.options.step_num)
STDERR: File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 530, in run_mapper
STDERR: for out_key, out_value in mapper_init() or ():
STDERR: File "db_store_hadoop.py", line 35, in mapper_init
STDERR: create_tables(self.cr0)
STDERR: File "db_store_hadoop.py", line 14, in create_tables
STDERR: cr.execute("create table word_list(id serial primary key, word character varying not null)")
STDERR: psycopg2.ProgrammingError: relation "word_list" already exists
STDERR:
Counters from step 1:
(no counters found)
Traceback (most recent call last):
File "db_store_hadoop.py", line 86, in <module>
MRwordStore().run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 461, in run
mr_job.execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 479, in execute
super(MRJob, self).execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 153, in execute
self.run_job()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 216, in run_job
runner.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 470, in run
self._run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/sim.py", line 173, in _run
self._invoke_step(step_num, 'mapper')
File "/usr/local/lib/python2.7/dist-packages/mrjob/sim.py", line 264, in _invoke_step
self.per_step_runner_finish(step_num)
File "/usr/local/lib/python2.7/dist-packages/mrjob/local.py", line 152, in per_step_runner_finish
self._wait_for_process(proc_dict, step_num)
File "/usr/local/lib/python2.7/dist-packages/mrjob/local.py", line 268, in _wait_for_process
(proc_dict['args'], returncode, ''.join(tb_lines)))
Exception: Command ['sh', '-ex', 'setup-wrapper.sh', '/usr/bin/python', 'db_store_hadoop.py', '--step-num=0', '--mapper', '--dbname', 'en_ws', '/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/input_part-00000'] returned non-zero exit status 1:
Traceback (most recent call last):
File "db_store_hadoop.py", line 86, in <module>
MRwordStore().run()
File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 461, in run
mr_job.execute()
File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 470, in execute
self.run_mapper(self.options.step_num)
File "/tmp/db_store_hadoop.iarroyof.20160204.074501.695246/job_local_dir/0/mapper/0/mrjob.tar.gz/mrjob/job.py", line 530, in run_mapper
for out_key, out_value in mapper_init() or ():
File "db_store_hadoop.py", line 35, in mapper_init
create_tables(self.cr0)
File "db_store_hadoop.py", line 14, in create_tables
cr.execute("create table word_list(id serial primary key, word character varying not null)")
psycopg2.ProgrammingError: relation "word_list" already exists
# -*- coding: utf-8 -*-
# Script for storing the sparse data into a database
import psycopg2
import re
import argparse
from mrjob.job import MRJob


def unicodize(segment):
    if re.match(r'\\u[0-9a-f]{4}', segment):
        return segment.decode('unicode-escape')
    return segment.decode('utf-8')


def create_tables(cr):
    cr.execute("create table word_list(id serial primary key, word character varying not null)")
    cr.execute("""create table word_sparse(
        id serial primary key,
        word_id integer references word_list(id) not null,
        pos integer not null,
        val float not null)""")


def delete_tables(cr):
    cr.execute("drop table word_sparse")
    cr.execute("drop table word_list")


class MRwordStore(MRJob):
    #conn = psycopg2.connect("dbname=%s user=semeval password=semeval" % args_n)
    def configure_options(self):
        super(MRwordStore, self).configure_options()
        self.add_file_option('--dbname')

    def mapper_init(self):
        # make the PostgreSQL database available to the mapper
        self.conn = psycopg2.connect("dbname=" + self.options.dbname + " user=semeval password=semeval")
        self.cr0 = self.conn.cursor()
        create_tables(self.cr0)

    def mapper(self, _, line):
        self.cr = self.conn.cursor()
        item = line.strip().split('\t')
        replaced = u"".join((unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})', item[0])))
        key = u''.join((c for c in replaced if c != '"'))
        self.cr.execute("insert into word_list(word) values(%s) returning id", (key,))
        word_id = self.cr.fetchone()[0]
        # Parse the list; literal_eval is avoided because of memory issues
        inside = False
        number = ""
        pos = 0
        val = 0
        for c in item[1]:
            if c == '[':
                inside = True
            elif c.isdigit():
                number += c
            elif c == ',':
                if inside:
                    pos = int(number)
                    number = ""
            elif c == ']':
                if inside:
                    val = int(number)
                    number = ""
                    self.cr.execute("insert into word_sparse(word_id, pos, val) values (%s, %s, %s)", (word_id, pos, val))
                inside = False

    def mapper_final(self):
        self.conn.commit()
        self.conn.close()


if __name__ == "__main__":
    """
    Stores words in the database.
    The first time, run with the arguments -cs.
    If the database has to be recreated, run again with the d argument (-dcs).
    Use the -f argument to specify the input file (sparse data).
    Use the -n argument to specify the database name, which must already be created.
    It also assumes the owner of the database is a user named semeval with password semeval.
    """
    MRwordStore().run()
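For context before the accepted answer: the traceback shows create_tables() being called from mapper_init(), which mrjob runs in every mapper process, and the plain CREATE TABLE statements also fail on any re-run once the tables exist. The snippet below is a minimal standalone sketch of that failure mode, assuming a scratch database and the semeval credentials used in the question (both placeholders, not part of the original post):

# Minimal sketch of the failure mode, independent of mrjob.
# Assumes a scratch PostgreSQL database "scratch_db" owned by user
# semeval/semeval -- placeholder values, adjust to your setup.
import psycopg2

conn = psycopg2.connect("dbname=scratch_db user=semeval password=semeval")
cr = conn.cursor()

# The first call succeeds (e.g. the first mapper, or the first run of the job).
cr.execute("create table word_list(id serial primary key, word character varying not null)")
conn.commit()

# Any later attempt against the same database fails exactly as in the log above.
try:
    cr.execute("create table word_list(id serial primary key, word character varying not null)")
except psycopg2.ProgrammingError as err:
    conn.rollback()
    print(err)  # relation "word_list" already exists

conn.close()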
Best answer
After several days of trying, I ended up using an initial connection in __main__ together with CREATE TABLE IF NOT EXISTS .... In mapper_init() I create a new connection and cursor for each mapper. This is the script for populating a PostgreSQL database via Hadoop:
# -*- coding: utf-8 -*-
# Script for storing the sparse data into a database.
# Dependencies: MRjob, psycopg2, postgresql and/or Hadoop.
import psycopg2
import re
import argparse
from mrjob.job import MRJob

dbname = "es_ws"
# The following global was created for my custom application. You can avoid it.
dsm = True


def unicodize(segment):
    if re.match(r'\\u[0-9a-f]{4}', segment):
        return segment.decode('unicode-escape')
    return segment.decode('utf-8')


def replaced(item):
    replaced = u"".join((unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})', item)))
    word = replaced.strip('"')
    return word


def insert_list_vector(cursor, word_id, vector):
    inside = False
    number = ""
    pos = 0
    val = 0
    for c in vector:
        if c == '[':
            inside = True
        elif c.isdigit():
            number += c
        elif c == ',':
            if inside:
                pos = int(number)
                number = ""
        elif c == ']':
            if inside:
                val = int(number)
                number = ""
                cursor.execute("insert into word_sparse(word_id, pos, val) values (%s, %s, %s)", (word_id, pos, val))
            inside = False


def insert_dict_vector(cursor, word, vector):
    palabra = word  # replaced(palabra)
    d = vector      # item[1]
    bkey = True
    bvalue = False
    key = ""
    value = ""
    for c in d:
        if c == '{':
            pass
        elif c == ":":
            bkey = False
            bvalue = True
        elif c in (",", "}"):
            bkey = True
            bvalue = False
            key = replaced(key.strip())
            value = int(value)
            sql = "INSERT INTO coocurrencias VALUES('%s', '%s', %s);" % (palabra, key, value)
            cursor.execute(sql)
            key = ""
            value = ""
        elif bkey:
            key += c
        elif bvalue:
            value += c


def create_tables(cr):
    if dsm:
        cr.execute("create table if not exists coocurrencias(pal1 character varying, pal2 character varying, valor integer)")
        cr.execute("create table if not exists words(id integer, word character varying)")  # (id integer, word character varying, freq integer)
    else:
        cr.execute("create table if not exists word_list(id serial primary key, word character varying not null)")
        cr.execute("""create table if not exists word_sparse(
            id serial primary key, word_id integer references word_list(id) not null,
            pos integer not null, val float not null)""")


class MRwordStore(MRJob):
    def mapper_init(self):
        self.conn = psycopg2.connect("dbname=" + dbname + " user=semeval password=semeval")

    def mapper(self, _, line):
        self.cr = self.conn.cursor()
        item = line.strip().split('\t')
        replaced = u"".join((unicodize(seg) for seg in re.split(r'(\\u[0-9a-f]{4})', item[0])))
        key = u''.join((c for c in replaced if c != '"'))
        if dsm:
            self.cr.execute("insert into words(word) values(%s) returning id", (key,))
            word_id = self.cr.fetchone()[0]
            insert_dict_vector(cursor=self.cr, word=key, vector=item[1])
        else:
            self.cr.execute("insert into word_list(word) values(%s) returning id", (key,))
            word_id = self.cr.fetchone()[0]
            insert_list_vector(cursor=self.cr, word_id=word_id, vector=item[1])

    def mapper_final(self):
        self.conn.commit()
        self.conn.close()


if __name__ == "__main__":
    """Stores word vectors into a database. Such a db (e.g. here it is en_ws) must be previously created in postgresql.
    It also assumes the owner of the database is a user named semeval with password semeval.
    This script parses input_file.txt containing lines in the following example format (dsm=False):
        "word"<tab>[[number, number], [number, number], ...]
    or (dsm=True):
        "word"<tab>{key: value, key: value, ...}
    Usage example:
        python db_store_hadoop.py -r hadoop input_file.txt
    """
    # First create the tables once, to avoid duplicates.
    conn = psycopg2.connect("dbname=" + dbname + " user=semeval password=semeval")
    create_tables(conn.cursor())  # Override this function to customize your tables
    conn.commit()
    conn.close()
    # Run the MR job
    MRwordStore().run()
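Stripped of the application-specific parts, the change that removes the error is: table creation moves out of mapper_init() into the __main__ block and becomes idempotent via CREATE TABLE IF NOT EXISTS, while each mapper only opens its own connection and inserts. Below is a minimal sketch of that pattern; it reuses the word_list table and the semeval credentials from this thread, everything else is simplified and not a drop-in replacement for the full script above.

# Distilled pattern from the answer above (sketch only).
import psycopg2
from mrjob.job import MRJob

DSN = "dbname=en_ws user=semeval password=semeval"


class MRStoreSketch(MRJob):
    def mapper_init(self):
        # One connection and cursor per mapper process; no DDL here.
        self.conn = psycopg2.connect(DSN)
        self.cr = self.conn.cursor()

    def mapper(self, _, line):
        # Simplified: store each input line as a word.
        self.cr.execute("insert into word_list(word) values (%s)", (line.strip(),))

    def mapper_final(self):
        self.conn.commit()
        self.conn.close()


if __name__ == "__main__":
    # mrjob re-invokes this script for every mapper, so this block runs more
    # than once; IF NOT EXISTS makes the repeated DDL harmless.
    conn = psycopg2.connect(DSN)
    cr = conn.cursor()
    cr.execute("create table if not exists word_list("
               "id serial primary key, word character varying not null)")
    conn.commit()
    conn.close()
    MRStoreSketch().run()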
About "postgresql - psycopg2.ProgrammingError: relation * already exists when populating a database via MRJob", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35195441/