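I have a seemingly impossible puzzle and hope you can point me in the right direction. I have been coming back to this project on and off for a few weeks, and I think it is time to solve it, hopefully with your help.
I am writing a script that should read a bunch of .xls Excel files from a directory structure, parse their contents, and load them into a MySQL database. In the main function, the list of (Croatian) file names is passed to xlrd, and that is where the problem occurs.
The environment is an up-to-date FreeBSD 9.1.
Running the script produces the following error: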
mars:~/20130829> python megascript.py
Python version: 2.7.5
Filesstem encoding is: UTF-8
Removing error.log if it exists...
It doesn't.
Done!
Connecting to database...
Done!
MySQL database version: 5.6.13
Loading pilots...
Done!
Loading tehnicians...
Done!
Loading aircraft registrations...
Done!
Loading file list...
Done!
Processing files...
/2006/1_siječanj.xls
Traceback (most recent call last):
  File "megascript.py", line 540, in <module>
    main()
  File "megascript.py", line 491, in main
    data = readxlsfile(files, 'UPIS', piloti, tehnicari, helikopteri)
  File "megascript.py", line 129, in readxlsfile
    workbook = open_workbook(f)
  File "/usr/local/lib/python2.7/site-packages/xlrd-0.9.2-py2.7.egg/xlrd/__init__.py", line 394, in open_workbook
    f = open(filename, "rb")
IOError: [Errno 2] No such file or directory: u'/2006/1_sije\u010danj.xls'
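I have included the full output to make the code easier to follow. I suspect the problem is that xlrd does not accept the UTF-8 file list. However, I am not sure how to work around this without modifying the xlrd code. Any ideas?
The code is as follows: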
#! /usr/bin/env/python
# -#*- coding: utf-8 -*-
import os, sys, getopt, codecs, csv, MySQLdb, platform
from mmap import mmap,ACCESS_READ
from xlrd import open_workbook, xldate_as_tuple
# Define constants
NALET_OUT = ''
PUTNICI_OUT = ''
DB_HOST = 'localhost'
DB_USER = 'user'
DB_PASS = 'pass'
DB_DATABASE = 'eth'
START_DIR = u'./'
ERROR_FILE = START_DIR + 'mega_error.log'
# Functions
def isNumber(s):
    # Check if a string could be a number
    try:
        float(s)
        return True
    except ValueError:
        return False

def getMonth(f):
    # Izvuci mjesec iz imena datoteke u formatu "1_sijecanj.xls"
    temp = os.path.basename(f)
    temp = temp.split('_')
    mjesec = int(temp[0])
    return mjesec

def getYear(f):
    # Izvuci godinu iz path
    f = f.split('/')
    godina = f[-2]
    return godina

def databaseVersion(cur):
    # Print Mysql database version
    try:
        cur.execute("SELECT VERSION()")
        result = cur.fetchone()
    except MySQLdb.Error, e:
        try:
            print "MySQL Error [%d]: %s]" % (e.args[0], e.args[1])
        except IndexError:
            print "MySQL Error: %s" % (e.args[0], e.args[1])
    print "MySQL database version: %s" % result

def getQuery(cur, sql_query):
    # Perform passed query on passed database
    try:
        cur.execute(sql_query)
        result = cur.fetchall()
    except MySQLdb.Error, e:
        try:
            print "MySQL Error [%d]: %s]" % (e.args[0], e.args[1])
        except IndexError:
            print "MySQL Error: %s" % (e.args[0], e.args[1])
    return result

def getFiles():
    files = []
    # Find subdirectories
    for i in [x[0] for x in os.walk(START_DIR)]:
        if (i != '.' and isNumber(os.path.basename(i))):
            # Find files in subdirectories
            for j in [y[2] for y in os.walk(i)]:
                # For every file in file list
                for y in j:
                    fn, fe = os.path.splitext(y)
                    is_mj = fn.split("_")
                    if(fe == '.xls' and y.find('_') and isNumber(is_mj[0])):
                        mj = fn.split('_')
                        files.append(i.lstrip('.') + "/" + y)
    # Sort list cronologically
    files.sort(key=lambda x: getMonth(x))
    files.sort(key=lambda x: getYear(x))
    return files

def errhandle(f, datum, var, vrijednost, ispravka = "NULL"):
    # Get error information, print it on screen and write to error.log
    f = unicode(str(f), 'utf-8')
    datum = unicode(str(datum), 'utf-8')
    var = unicode(str(var), 'utf-8')
    try:
        vrijednost = unicode(str(vrijednost.decode('utf-8')), 'utf-8')
    except UnicodeEncodeError:
        vrijednost = vrijednost
    ispravka = unicode(str(ispravka), 'utf-8')
    err_f = codecs.open(ERROR_FILE, 'a+', 'utf-8')
    line = f + ": " + datum + " " + var + "='" + vrijednost\
        + "' Ispravka='" + ispravka + "'"
    #print "%s" % line
    err_f.write(line)
    err_f.close()

def readxlsfile(files, sheet, piloti, tehnicari, helikopteri):
    # Read xls file and return a list of rows
    data = []
    nalet = []
    putn = []
    id_index = 0
    # For every file in list
    for f in files:
        print "%s" % f
        temp = f.split('/')
        godina = str(temp[-2])
        temp = os.path.basename(f).split('_')
        mjesec = str(temp[0])
        workbook = open_workbook(f)
        sheet = workbook.sheet_by_name('UPIS')
        # For every row that doesn't contain '' or 'POSADA' or 'dan' etc...
        for ri in range(sheet.nrows):
            if sheet.cell(ri,1).value!=''\
                and sheet.cell(ri,2).value!='POSADA'\
                and sheet.cell(ri,1).value!='dan'\
                and (sheet.cell(ri,2).value!=''):
                temp = sheet.cell(ri, 1).value
                temp = temp.split('.')
                dan = temp[0]
                # Datum
                datum = "'" + godina + "-" + mjesec + "-" + dan + "'"
                # Kapetan
                kapetan = ''
                kapi=''
                if sheet.cell(ri, 2).value == "":
                    kapetan = "NULL"
                else:
                    kapetan = sheet.cell(ri, 2).value
                    if kapetan[-1:] == " ":
                        errhandle(f, datum, 'kapetan', kapetan, kapetan[-1:])
                        kapetan = kapetan[:-1]
                    if(kapetan):
                        try:
                            kapi = [x[0] for x in piloti if x[2].lower() == kapetan]
                            kapi = kapi[0]
                        except ValueError:
                            errhandle(f, datum, 'kapetan', kapetan, '')
                            kapetan = ''
                        except IndexError:
                            errhandle(f, datum, 'kapetan', kapetan, '')
                            kapi = 'NULL'
                    else:
                        kapi="NULL"
                # Kopilot
                kopilot = ''
                kopi = ''
                if sheet.cell(ri, 3).value == "":
                    kopi = "NULL"
                else:
                    kopilot = sheet.cell(ri, 3).value
                    if kopilot[-1:] == " ":
                        errhandle(f, datum,'kopilot', kopilot,\
                            kopilot[:-1])
                    if(kopilot):
                        try:
                            kopi = [x[0] for x in piloti if x[2].lower() == kopilot]
                            kopi = kopi[0]
                        except ValueError:
                            errhandle(f, datum,'kopilot', kopilot, '')
                        except IndexError:
                            errhandle(f, datum, 'kopilot', kopilot, '')
                            kopi = 'NULL'
                    else:
                        kopi="NULL"
                # Teh 1
                teh1 = ''
                t1i = ''
                if sheet.cell(ri, 4).value=='':
                    t1i = 'NULL'
                else:
                    teh1 = sheet.cell(ri, 4).value
                    if teh1[-1:] == " ":
                        errhandle(f, datum,'teh1', teh1, teh1[:-1])
                        teh1 = 'NULL'
                    if(teh1):
                        try:
                            t1i = [x[0] for x in tehnicari if x[2].lower() == teh1]
                            t1i = t1i[0]
                        except ValueError:
                            errhandle(f, datum,'teh1', teh1, '')
                        except IndexError:
                            errhandle(f, datum, 'teh1', teh1, '')
                            t1i = 'NULL'
                    else:
                        t1i="NULL"
                # Teh 2
                teh2=''
                t2i=''
                if sheet.cell(ri, 5).value=='':
                    t2i = "NULL"
                else:
                    teh2 = sheet.cell(ri, 5).value
                    if teh2[-1:] == " ":
                        errhandle(f, datum,'teh2', teh2, teh2[-1:])
                        teh2 = ''
                    if(teh2):
                        try:
                            t2i = [x[0] for x in tehnicari if x[2].lower() == teh2]
                            t2i = t2i[0]
                        except ValueError:
                            errhandle(f, datum,'teh2', teh2, 'NULL')
                            t2i = 'NULL'
                        except IndexError:
                            errhandle(f, datum,'teh2', teh2, 'NULL')
                            t2i = 'NULL'
                    else:
                        t2i="NULL"
                # Oznaka
                oznaka = ''
                heli = ''
                if sheet.cell(ri, 6).value=="":
                    oznaka = errhandle(f, datum, "helikopter", oznaka, "")
                else:
                    oznaka = str(int(sheet.cell(ri, 6).value))
                    try:
                        heli = [x[0] for x in helikopteri if x[0] == oznaka]
                    except ValueError:
                        errhandle(f, datum, 'helikopter', oznaka, '')
                    except IndexError:
                        errhandle(f, datum, 'helikopter', oznaka, '')
                        heli = ''
                # Uvjeti
                uvjeti = sheet.cell(ri, 9).value
                # Letova
                letova_dan = 0
                letova_noc = 0
                letova_ifr = 0
                letova_sim = 0
                if sheet.cell(ri, 7).value == "":
                    errhandle(f, datum, 'letova', letova, '')
                else:
                    letova = str(int(sheet.cell(ri, 7).value))
                    if uvjeti=="vfr":
                        letova_dan = letova
                    elif uvjeti=="ifr":
                        letova_ifr = letova
                    elif uvjeti=="sim":
                        letova_sim = letova
                    else:
                        letova_noc = letova
                #Block time
                bt_dan = "'00:00:00'"
                bt_noc = "'00:00:00'"
                bt_ifr = "'00:00:00'"
                bt_sim = "'00:00:00'"
                try:
                    bt_tpl = xldate_as_tuple(sheet.cell(ri, 8).value, workbook.datemode)
                    bt_m = bt_tpl[4]
                    bt_h = bt_tpl[3]
                    bt = "'" + str(bt_h).zfill(2)+":"+str(bt_m)+":00'"
                except ValueError or IndexError:
                    errhandle(f, datum, 'bt', sheet.cell(ri,8).value, '')
                if uvjeti[:3]=="vfr":
                    bt_dan = bt
                elif uvjeti[:3]=="ifr":
                    bt_ifr = bt
                elif uvjeti[:3]=="sim":
                    bt_sim = bt
                elif uvjeti[:2] == "no":
                    bt_noc = bt
                else:
                    errhandle(f, datum, 'uvjeti', uvjeti, '')
                # Vrsta leta
                vrsta = "'" + sheet.cell(ri, 10).value + "'"
                # Vjezba
                vjezba = 'NULL';
                try:
                    vjezba = sheet.cell(ri, 11).value
                    if vjezba == '':
                        # Too many results
                        #errhandle(f, datum, 'vjezba', vjezba, '')
                        vjezba = 'NULL'
                    if vjezba == "?":
                        errhandle(f, datum, 'vjezba', str(vjezba), '')
                        vjezba = 'NULL'
                    if str(vjezba) == 'i':
                        errhandle(f, datum, 'vjezba', str(vjezba), '')
                        vjezba = 'NULL'
                    if str(vjezba)[-1:] == 'i':
                        errhandle(f, datum, 'vjezba', str(vjezba),\
                            str(vjezba).rstrip('i'))
                        vjezba = str(vjezba).rstrip('i')
                    if str(vjezba).find(' i ') != -1:
                        errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(' i ')[0])
                        vjezba = str(vjezba).split(' i ')
                        vjezba = vjezba[0]
                    if str(vjezba)[-1:] == 'm':
                        errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).rstrip('m'))
                        vjezba = str(vjezba).rstrip('m')
                    if str(vjezba).find(';') != -1:
                        errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(';')[0])
                        temp = str(vjezba).split(';')
                        vjezba = temp[0]
                    if str(vjezba).find('/') != -1:
                        errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split('/')[0])
                        temp = str(vjezba).split('/')
                        vjezba = temp[0]
                    if str(vjezba).find('-') != -1:
                        errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split('-')[0])
                        temp = str(vjezba).split('-')
                        vjezba = temp[0]
                    if str(vjezba).find(',') != -1:
                        errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(',')[0])
                        temp = str(vjezba).split(',')
                        vjezba = temp[0]
                    if str(vjezba).find('_') != -1:
                        errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split('_')[0])
                        temp = str(vjezba).split('_')
                        vjezba = temp[0]
                    if str(vjezba) == 'bo':
                        errhandle(f, datum, 'vjezba', str(vjezba), '')
                        vjezba = 'NULL'
                    if str(vjezba).find(' ') != -1:
                        if str(vjezba) == 'pp 300':
                            errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(' ')[1])
                            temp = str(vjezba).split(' ')
                            vjezba = temp[1]
                        else:
                            errhandle(f, datum, 'vjezba', str(vjezba), str(vjezba).split(' ')[0])
                            temp = str(vjezba).split(' ')
                            vjezba = temp[0]
                    if str(vjezba) == 'pp':
                        errhandle(f, datum, 'vjezba', str(vjezba), '')
                        vjezba = ''
                except UnicodeEncodeError:
                    errhandle(f, datum, 'Unicode error! vjezba', vjezba, '')
                if vjezba != 'NULL':
                    vjezba = int(float(vjezba))
                # Visinska slijetanja
                # Putnici
                vp1 = str(sheet.cell(ri, 12).value)
                bp1 = str(sheet.cell(ri, 13).value)
                vp2 = str(sheet.cell(ri, 14).value)
                bp2 = str(sheet.cell(ri, 15).value)
                # Teret
                teret = ''
                teret = str(sheet.cell(ri, 16).value)
                if teret == '':
                    teret = 0
                # Baja
                baja = ''
                if sheet.cell(ri, 17).value == '':
                    baja = 0
                else:
                    baja = int(sheet.cell(ri, 17).value) / 2 # dodano /2 da se dobiju tone
                # Redosljed csv
                id_index = id_index + 1
                row = [id_index, datum, kapi, kopi, t1i, t2i, oznaka,\
                    letova, letova_dan, letova_noc, letova_ifr,\
                    letova_sim, bt, bt_dan, bt_noc, bt_ifr,\
                    bt_sim, vrsta, vjezba, teret, baja]
                row = [str(i) for i in row]
                nalet.append(row)
                putn = []
                if bp1 != '':
                    put = [id_index, vp1, bp1]
                    putn.append(put)
                if bp2 != '':
                    put = [id_index, vp2, bp2]
                    putn.append(put)
    data.append(nalet)
    data.append(putn)
    return data

def main():
    # Python version
    print "\nPython version: %s \n" % platform.python_version()
    # Print filesystem encoding
    print "Filesstem encoding is: %s" % sys.getfilesystemencoding()
    # Remove error file if exists
    print "Removing error.log if it exists..."
    try:
        os.remove(ERROR_FILE)
        print "It did."
    except OSError:
        print "It doesn't."
        pass
    print "Done!"
    # Connect to database
    print "Connecting to database..."
    db = MySQLdb.connect(DB_HOST, DB_USER, DB_PASS, DB_DATABASE,\
        use_unicode=True, charset='utf8')
    cur=db.cursor()
    print "Done!"
    # Database version
    databaseVersion(cur)
    # Load pilots, tehnicians and helicopters from db
    print "Loading pilots..."
    sql_query = "SELECT eth_osobnici.id, eth_osobnici.ime,\
        eth_osobnici.prezime FROM eth_osobnici RIGHT JOIN \
        eth_letacka_osposobljenja ON eth_osobnici.id=\
        eth_letacka_osposobljenja.id_osobnik WHERE \
        eth_letacka_osposobljenja.vrsta_osposobljenja='kapetan' \
        OR eth_letacka_osposobljenja.vrsta_osposobljenja='kopilot'"
    #piloti = []
    #piloti = getQuery(cur, sql_query)
    piloti=[]
    temp = []
    temp = getQuery(cur, sql_query)
    for row in temp:
        piloti.append(row)
    print "Done!"
    print "Loading tehnicians..."
    sql_query = "SELECT eth_osobnici.id, eth_osobnici.ime,\
        eth_osobnici.prezime FROM eth_osobnici RIGHT JOIN \
        eth_letacka_osposobljenja ON eth_osobnici.id=\
        eth_letacka_osposobljenja.id_osobnik WHERE \
        eth_letacka_osposobljenja.vrsta_osposobljenja='tehničar 1' \
        OR eth_letacka_osposobljenja.vrsta_osposobljenja='tehničar 2'"
    tehnicari=[]
    temp = []
    temp = getQuery(cur, sql_query)
    for row in temp:
        tehnicari.append(row)
    print "Done!"
    print "Loading aircraft registrations..."
    sql_query = "SELECT id FROM eth_helikopteri"
    helikopteri=[]
    temp = []
    temp = getQuery(cur, sql_query)
    for row in temp:
        helikopteri.append(row)
    print "Done!"
    # Get file names to process
    print "Loading file list..."
    files = getFiles()
    print "Done!"
    # Process all files from array
    print "Processing files..."
    data = readxlsfile(files, 'UPIS', piloti, tehnicari, helikopteri)
    print "Done!"
    # Enter new information in database
    result = 0
    print "Reseting database..."
    sql_query = "DELETE FROM eth_nalet"
    cur.execute(sql_query)
    db.commit()
    sql_query = "ALTER TABLE eth_nalet AUTO_INCREMENT=0"
    cur.execute(sql_query)
    db.commit()
    print "Done!"
    print "Loading data in 'eth_nalet'..."
    for row in data[0]:
        sql_query = """INSERT INTO eth_nalet (id, datum, kapetan,
            kopilot, teh1, teh2, registracija, letova_uk, letova_dan,
            letova_noc, letova_ifr, letova_sim, block_time, block_time_dan,
            block_time_noc, block_time_ifr, block_time_sim, vrsta_leta,
            vjezba, teret, baja) VALUES (%s)""" % (", ".join(row))
        cur.execute(sql_query)
    db.commit()
    print "Done!"
    print "Loading data in 'eth_putnici'..."
    for row in data[1]:
        sql_query = """INSERT INTO eth_putnici (id_leta,
            vrsta_putnika, broj_putnika) VALUES (%s)""" % (", ".join(row))
        cur.execute(sql_query)
    db.commit()
    print "Done!"
    # Close the database connection
    print "Closing database connection..."
    if cur:
        cur.close()
    if db:
        db.close()
    print "Database closed!"

if __name__ == '__main__':
    main()
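Apologies for not translating the comments in the code; this is an old project of mine, and these days I tend to write comments in English. If anything needs explaining, fire away.
Interestingly, if I print the file list to the screen, the names display just fine. But when they are passed to xlrd, they seem to be in the wrong format.
Regards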
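Best answer
I finally found the error! It was not an encoding problem after all. It was a logic error.
In getFiles(), I stripped the leading "." from each file name in the list, rather than the leading "./" as I should have. As a result, the file name became "/2006/1_siječanj.xls" instead of the intended "2006/1_siječanj.xls". The exception was an IOError, not a UnicodeEncodeError. My oversight made the script look for an absolute path instead of a relative one.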
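A minimal sketch of the difference, using a path of the same shape as the one in the traceback. `lstrip('.')` removes only the leading dots and leaves the slash behind, so the result looks like an absolute path; joining the directory and file name and normalising with `os.path.relpath` keeps it relative. The helper name `relative_name` is made up for illustration and is not part of the original script, and this is only one possible way to build the path.

```python
# -*- coding: utf-8 -*-
import os

START_DIR = u'./'

def relative_name(directory, filename):
    # Hypothetical helper: join first, then express the path relative to START_DIR
    return os.path.relpath(os.path.join(directory, filename), START_DIR)

# What os.walk(u'./') yields for the subdirectory, and one file inside it
walked_dir = u'./2006'
name = u'1_sije\u010danj.xls'

# What getFiles() did: only the dot is stripped, the slash stays in front
broken = walked_dir.lstrip('.') + "/" + name

# Relative path that open() resolves against the current directory
fixed = relative_name(walked_dir, name)

print(broken.startswith('/'))   # the broken name begins with a slash
print(os.path.isabs(fixed))     # the fixed name is not absolute
```

Building the path with `os.path.join(i, y)` inside getFiles() and passing that straight to `open_workbook` would also avoid the string surgery altogether.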
Well, this is embarrassing. Thank you all; I hope this post helps others pay closer attention to the type of error Python throws at us.
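As a rough illustration of reading the error type, the sketch below separates the two failure modes that were confused here. The function name `diagnose` and its return values are assumptions made for this example. Note that the `\u010d` in the traceback is just how Python 2 prints the repr of a unicode string; it does not by itself indicate an encoding failure.

```python
# -*- coding: utf-8 -*-
import errno
import os

def diagnose(path):
    # Hypothetical helper: report why a path can or cannot be opened
    try:
        with open(path, "rb"):
            return ("ok", path)
    except UnicodeEncodeError:
        # The name could not be encoded for the filesystem
        return ("encoding problem", path)
    except (IOError, OSError) as exc:
        if exc.errno == errno.ENOENT:
            # The name was encoded fine; the file is simply not at that location
            return ("missing", os.path.abspath(path))
        raise

# The path from the traceback: the leading slash makes it absolute
kind, detail = diagnose(u'/2006/1_sije\u010danj.xls')
print(kind)
```

Showing the absolute path in the "missing" case makes a stray leading slash easy to spot, since the result no longer sits under the working directory.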
This question, "python - Error using utf-8 filenames in python script", comes from Stack Overflow: https://stackoverflow.com/questions/19210139/