
Python 2.7: cannot write out a file with DictWriter from a DictReader after using regex re.findall

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:58:38

I have tried many approaches based on excellent Stack Overflow ideas:
How to write header row with csv.DictWriter?
Writing a Python list of lists to a csv file
csv.DictWriter -- TypeError: __init__() takes at least 3 arguments (4 given)
Python: tuple indices must be integers, not str when selecting from mysql table
https://docs.python.org/2/library/csv.html
python csv write only certain fieldnames, not all
Python 2.6 Text Processing and
Why is DictWriter not Writing all rows in my Dictreader instance?
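I tried mapping the reader and writer fieldnames, as well as the special header parameter.
I built a second-tier test from some great multi-column SO articles.
The code is as follows: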

import csv
import re

# Capture the text between <* and *> (non-greedy)
t = re.compile(r'<\*(.*?)\*>')
headers = ['a', 'b', 'd', 'g']
with open('in2.csv', 'rb') as csvfile:
    with open('out2.csv', 'wb') as output_file:
        reader = csv.DictReader(csvfile)
        # extrasaction='ignore' drops the columns not listed in headers
        writer = csv.DictWriter(output_file, headers, extrasaction='ignore')
        writer.writeheader()
        print(headers)
        for row in reader:
            row['d'] = re.findall(t, row['d'])
            print(row['a'], row['b'], row['d'], row['g'])
            writer.writerow(row)
The input data is:
a, b, c, d, e, f, g, h 

<* number 1 *>, <* number 2 *>, <* number 3 *>, <* number 4 *>, ...<* number 8 *>

<* number 2 *>, <* number 3 *>, <* number 4 *>, ...<* number 8 *>, <* number 9 *>
The output data is:
['a', 'b', 'd', 'g' ] 

('<* number 1 *>', '<* number 2 *>', ' number 4 ', <* number 7 *>)

('<* number 2 *>', '<* number 3 *>', ' number 5 ', <* number 8 *>)
Exactly as required.
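However, when I use a rough data set with spaces, double quotes, and mixed-case words, printing works at the row level, but writing does not fully work.
Overall, I have been able (I know I am in epic fail mode here) to write a single row of the challenging data, but a header plus multiple rows does not work. It is pretty lame that I cannot get past this hurdle with all the talented articles I have read.
All four columns fail with either a KeyError or "TypeError: tuple indices must be integers, not str".
I clearly do not understand what Python needs in order to make this work.
At a high level: read in a text file with seven observations/columns. Write it out with only four columns; run a regular expression on one column. Make sure each newly formed row is written out, not the original row.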
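The high-level goal above can be sketched as follows. This is only an illustration under stated assumptions: it is written for Python 3 (the question targets 2.7), it uses in-memory io.StringIO objects with made-up sample data instead of in2.csv and out2.csv, and the helper name filter_rows and the choice to join the matches with ":" are hypothetical.

```python
import csv
import io
import re

TAG = re.compile(r'<\*(.*?)\*>')
WANTED = ['a', 'b', 'd', 'g']  # columns to keep, as in the question


def filter_rows(in_file, out_file):
    """Copy the WANTED columns to out_file, rewriting column 'd' (hypothetical helper)."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=WANTED, lineterminator='\n')
    writer.writeheader()
    for row in reader:
        # Build a fresh dict so that only the updated four columns are written.
        new_row = {name: row[name] for name in WANTED}
        new_row['d'] = ':'.join(TAG.findall(new_row['d']))
        writer.writerow(new_row)


# Made-up sample standing in for in2.csv (eight columns, one data row).
sample = io.StringIO(
    'a,b,c,d,e,f,g,h\n'
    '<*n1*>,<*n2*>,<*n3*>,<*n4*>,<*n5*>,<*n6*>,<*n7*>,<*n8*>\n'
)
out = io.StringIO()
filter_rows(sample, out)
result = out.getvalue()
```

Building new_row explicitly is one way to make sure the rewritten row, not the original, reaches the writer; extrasaction='ignore' on the full row, as in the question's code, achieves a similar column filter.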
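I may need a friendlier kind of global temporary table to read a row, update the row, and then write the row to a file.
Maybe I am asking too much of the Python architecture to coordinate a DictReader and a DictWriter to read the data, filter it down to four columns, update the fourth column with a regular expression, and then write out the file with the updated four-tuples.
At this point I do not have time to study parsers. I would like to go into more detail eventually, since parsers seem handy with each Python version (2.7 now, 3.x later).
Apologies again for the complexity of this approach and for my weak grasp of Python fundamentals. In R, the analogue of my shortcoming would be understanding S4-level coding rather than just S3-level.
Here is data that is closer to what fails, sorry - I needed to show how the header is set up, how the incoming file lines are formatted with single double quotes and with quotes around the whole line, and how the dates are formatted but unquoted: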
    stuff_type|stuff_date|stuff_text
""cool stuff"|01-25-2015|""the text stuff <*to test*> to find a way to extract all text that is <*included in special tags*> less than star and greater than star"""
""cool stuff"|05-13-2014|""the text stuff <*to test a second*> to find a way to extract all text that is <*included in extra special tags*> less than star and greater than star"""
""great big stuff"|12-7-2014|"the text stuff <*to test a third*> to find a way to extract all text that is <*included in very special tags*> less than star and greater than star"""
""nice stuff"|2-22-2013|""the text stuff <*to test a fourth ,*> to find a way to extract all text that is <*included in doubly special tags*> less than star and greater than star"""

stuff_type,stuff_date,stuff_text
cool stuff,1/25/2015,the text stuff <*to test*> to find a way to extract all text that is <*included in special tags*> less than star and greater than star
cool stuff,5/13/2014,the text stuff <*to test a second*> to find a way to extract all text that is <*included in extra special tags*> less than star and greater than star
great big stuff,12/7/2014,the text stuff <*to test a third*> to find a way to extract all text that is <*included in very special tags*> less than star and greater than star
nice stuff,2/22/2013,the text stuff <*to test a fourth *> to find a way to extract all text that is <*included in really special tags*> less or greater than star
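I was going to retest, but this morning's Spyder update crashed my Python console. Ugh. With vanilla Python, the test data above fails with the code below... it never reaches the write step... it cannot even print here... I may need csv.QUOTE_NONE in the dialect.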
import csv
import re

t = re.compile(r'<\*(.*?)\*>')
headers = ['stuff_type', 'stuff_date', 'stuff_text']
with open('C:/Temp/in3.csv', 'rb') as csvfile:
    with open('C:/Temp/out3.csv', 'wb') as output_file:
        # Default dialect: comma-delimited, double-quote quoting
        reader = csv.DictReader(csvfile)
        writer = csv.DictWriter(output_file, headers, extrasaction='ignore')
        writer.writeheader()
        print(headers)
        for row in reader:
            row['stuff_text'] = re.findall(t, row['stuff_text'])
            print(row['stuff_type'], row['stuff_date'], row['stuff_text'])
            writer.writerow(row)
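Error:
I could not paste the snipping tool image here... sorry.
KeyError: 'stuff_text'
OK: the problem may be in the quoting and delimiting of the columns. The data above without quotes prints without a KeyError and now writes to the file correctly. I may have to clean the quote characters out of the file before extracting text with the regular expression. Any ideas would be appreciated.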
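One likely source of the KeyError, assuming the failing file is the pipe-delimited one, is that the default dialect splits on commas, so the whole header line becomes a single field name. A minimal sketch, written for Python 3 with a made-up two-line sample held in memory:

```python
import csv
import io

# Made-up sample in the pipe-delimited layout shown in the question.
data = (
    'stuff_type|stuff_date|stuff_text\n'
    'cool stuff|01-25-2015|some <*tagged*> text\n'
)

# The default dialect splits on commas, so the header is read as one name.
comma_reader = csv.DictReader(io.StringIO(data))
comma_fields = comma_reader.fieldnames

# Passing the pipe delimiter yields the three expected keys.
pipe_reader = csv.DictReader(io.StringIO(data), delimiter='|')
pipe_fields = pipe_reader.fieldnames
first_row = next(pipe_reader)
```

Printing reader.fieldnames before the loop is a quick way to check which keys the rows will actually have.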
Good question, @Andrea Corbellini.
If I remove the quotes manually, the code above produces the following output:
stuff_type,stuff_date,stuff_text
cool stuff,1/25/2015,"['to test', 'included in special tags']"
cool stuff,5/13/2014,"['to test a second', 'included in extra special tags']"
great big stuff,12/7/2014,"['to test a third', 'included in very special tags']"
nice stuff,2/22/2013,"['to test a fourth ', 'included in really special tags']"
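That is the output I want. So thank you for the "lazy" question. I was the lazy one and should have included this second output.
Likewise, without removing the multiple sets of quotes, I get KeyError: 'stuff_type'. Sorry, I tried to insert an image from a Python screenshot of the error but have not figured out how to do that on SO. I used the image section above, but that seems to point to a file that might be uploaded to SO? Not inserted?
With @monkut's excellent input below on using ":".join, things, or literally stuff, are getting better.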
{['stuff_type', 'stuff_date', 'stuff_text']
('cool stuff', '1/25/2015', 'to test:included in special tags')
('cool stuff', '5/13/2014', 'to test a second:included in extra special tags')
('great big stuff', '12/7/2014', 'to test a third:included in very special tags')
('nice stuff', '2/22/2013', 'to test a fourth :included in really special tags')}

import csv
import re

t = re.compile(r'<\*(.*?)\*>')
headers = ['stuff_type', 'stuff_date', 'stuff_text']
# Pipe-delimited input; quote characters are kept as literal text
csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
with open('C:/Python/in3.txt', 'rb') as csvfile:
    with open('C:/Python/out5.csv', 'wb') as output_file:
        reader = csv.DictReader(csvfile, dialect='piper')
        writer = csv.DictWriter(output_file, headers, extrasaction='ignore')
        writer.writeheader()
        print(headers)
        for row in reader:
            row['stuff_text'] = ":".join(re.findall(t, row['stuff_text']))
            print(row['stuff_type'], row['stuff_date'], row['stuff_text'])
            writer.writerow(row)
The error traceback is as follows:
runfile('C:/Python/test quotes with dialect quotes none or quotes filter and special characters with findall regex.py', wdir='C:/Python')
['stuff_type', 'stuff_date', 'stuff_text']
('""cool stuff"', '01-25-2015', 'to test')
Traceback (most recent call last):

File "<ipython-input-3-832ce30e0de3>", line 1, in <module>
runfile('C:/Python/test quotes with dialect quotes none or quotes filter and special characters with findall regex.py', wdir='C:/Python')

File "C:\Users\Methody\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)

File "C:\Users\Methody\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)

File "C:/Python/test quotes with dialect quotes none or quotes filter and special characters with findall regex.py", line 20, in <module>
row['stuff_text'] = ":".join(re.findall(t, row['stuff_text']))

File "C:\Users\Methody\Anaconda\lib\re.py", line 177, in findall
return _compile(pattern, flags).findall(string)

TypeError: expected string or buffer
Before running the regex findall, I will find a more robust way to clean up and strip the quotes. Probably something like line = string.remove(quotes with spaces).
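A hedged sketch of such a clean-up step follows. The traceback does not show which value reached re.findall; one possibility is None, which DictReader assigns by default to fields missing from a short row. The sketch assumes Python 3, made-up in-memory sample data, and a hypothetical helper named clean.

```python
import csv
import io
import re

TAG = re.compile(r'<\*(.*?)\*>')


def clean(value):
    """Strip surrounding double quotes; treat a missing field (None) as ''."""
    return (value or '').strip('"')


# Made-up sample: one well-formed row and one short row with a single field.
data = (
    'stuff_type|stuff_date|stuff_text\n'
    '""cool stuff"|01-25-2015|""a <*first*> and <*second*> tag"""\n'
    'short row\n'
)
reader = csv.DictReader(io.StringIO(data), delimiter='|', quoting=csv.QUOTE_NONE)
rows = []
for row in reader:
    # Clean every field before the regex sees it.
    row = {key: clean(value) for key, value in row.items()}
    row['stuff_text'] = ':'.join(TAG.findall(row['stuff_text']))
    rows.append(row)
```

Using str.strip('"') removes any run of double quotes at either end of a field while leaving quotes inside the text alone; whether that is the right rule depends on the real data.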

Best answer

I think findall returns a list, which may be messing things up, since DictWriter wants a single string value.

row['d'] = re.findall(t, row['d'])

You can use .join to turn the result into a single string value:
row['d'] = ":".join(re.findall(t, row['d']))

Here the values are joined with ":". However, as you mentioned, you may need to clean the values up a bit more...
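To see the difference the join makes, here is a small sketch, assuming Python 3 and an in-memory buffer; the helper write_one and the sample text are made up. The csv writer converts non-string values with str(), so a list ends up in the file as its quoted repr, while the joined string is written as plain text.

```python
import csv
import io
import re

TAG = re.compile(r'<\*(.*?)\*>')
text = 'stuff <*to test*> and <*more*> here'


def write_one(value):
    """Write a single-column row to an in-memory CSV and return the text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['stuff_text'], lineterminator='\n')
    writer.writerow({'stuff_text': value})
    return buffer.getvalue()


as_list = write_one(TAG.findall(text))              # list is stringified with str()
as_joined = write_one(':'.join(TAG.findall(text)))  # plain string value
```

Here as_list holds the quoted list repr, similar to the output shown earlier in the question, and as_joined holds to test:more followed by a newline.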

You mentioned having problems using a compiled regex object.
Here is an example of how to use a compiled regex object:
import re
t = re.compile(r'<\*(.*?)\*>')
text = ('''cool stuff,1/25/2015,the text stuff <*to test*> to find a way to extract all text that'''
        ''' is <*included in special tags*> less than star and greater than star''')
# Call findall directly on the compiled pattern object
result = t.findall(text)

This should return the following in result:

['to test', 'included in special tags']

Regarding "Python 2.7: cannot write out a file with DictWriter from a DictReader after using regex re.findall", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35194444/
