Python 2.7: cannot write out a file with DictWriter from DictReader after using regex re.findall

Reposted by: 太空宇宙 · Updated: 2023-11-03 16:58:38

I have tried a number of approaches based on excellent Stack Overflow ideas:
How to write header row with csv.DictWriter?
Writing a Python list of lists to a csv file
csv.DictWriter -- TypeError: __init__() takes at least 3 arguments (4 given)
Python: tuple indices must be integers, not str when selecting from mysql table
https://docs.python.org/2/library/csv.html
python csv write only certain fieldnames, not all
Python 2.6 Text Processing, and
Why is DictWriter not Writing all rows in my Dictreader instance?
I tried mapping the reader and writer fieldnames, as well as the special header parameter.
I built a second-level test from some great multi-column SO posts.
The code is as follows:

import csv
import re
t = re.compile('<\*(.*?)\*>')
headers = ['a', 'b', 'd', 'g']
with open('in2.csv', 'rb') as csvfile:
    with open('out2.csv', 'wb') as output_file:
        reader = csv.DictReader(csvfile)
        writer = csv.DictWriter(output_file, headers, extrasaction='ignore')
        writer.writeheader()
        print(headers)
        for row in reader:
            row['d'] = re.findall(t, row['d'])
            print(row['a'], row['b'], row['d'], row['g'])
            writer.writerow(row)

The input data is:

a, b, c, d, e, f, g, h 

<* number 1 *>, <* number 2 *>, <* number 3 *>, <* number 4 *>, ...<* number 8 *> 

<* number 2 *>, <* number 3 *>, <* number 4 *>, ...<* number 8 *>, <* number 9 *>

The output data is:

['a', 'b', 'd', 'g' ] 

('<* number 1 *>', '<* number 2 *>', ' number 4 ', <* number 7 *>) 

('<* number 2 *>', '<* number 3 *>', ' number 5 ', <* number 8 *>)

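That is exactly what I want.
However, when I use a rougher dataset containing spaces, double quotes, and mixed-case words, printing works at the row level but writing does not fully work.
Overall, I have been able (I know I am in epic-fail mode here) to actually write one row of the challenging data, but a header plus multiple rows does not work. It is pretty lame that I cannot get over this hurdle with all the talented posts I have read.
All four columns fail with either a KeyError or "TypeError: tuple indices must be integers, not str".
I clearly do not understand what Python needs to make this work.
At a high level: read in a text file with seven observations/columns. Write out only four columns; run a regular expression on one column. Make sure each newly formed row is written out, not the original row.
I may need a friendlier, global-temp-table kind of structure: read a row, update the row, then write the row to a file.
Maybe I am asking too much of Python's architecture in coordinating a DictReader and a DictWriter to read the data, filter it down to four columns, update the fourth column with a regex, and then write out the file with the updated four-tuples.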
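The column-filtering part of that goal can be sketched on its own. The example below is a minimal illustration, not the asker's code: it is written for Python 3 rather than 2.7, uses an in-memory `io.StringIO` with made-up numeric data in place of `in2.csv`, and the helper name `filter_columns` is hypothetical.

```python
import csv
import io

# Hypothetical in-memory input with eight columns, standing in for in2.csv.
source = io.StringIO(
    "a,b,c,d,e,f,g,h\n"
    "1,2,3,4,5,6,7,8\n"
    "2,3,4,5,6,7,8,9\n"
)
headers = ['a', 'b', 'd', 'g']  # the four columns to keep


def filter_columns(infile, fieldnames):
    """Copy only `fieldnames` from infile and return the result as CSV text."""
    out = io.StringIO()
    reader = csv.DictReader(infile)
    # extrasaction='ignore' silently drops keys that are not in fieldnames;
    # the default, 'raise', would raise ValueError on the extra columns.
    writer = csv.DictWriter(out, fieldnames, extrasaction='ignore',
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


filtered = filter_columns(source, headers)
print(filtered)
```

Each row dict coming out of the reader can be modified in place before `writer.writerow(row)` is called, so a per-column transformation fits inside the same loop.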
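At this point I do not have time to study parsers. I would like to go into more detail eventually, since parsers seem handy across Python versions (2.7 now, 3.x later).
Apologies again for the complexity of this approach and my weak grasp of Python fundamentals. In R, the analogue of my shortcoming would be understanding S4-level coding rather than just S3.
Here is data that is closer to what fails, sorry. I need to show how the header is set up, how the incoming file lines are formatted with stray double quotes and with quotes around the whole line, and how the dates are formatted but unquoted: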

    stuff_type|stuff_date|stuff_text
""cool stuff"|01-25-2015|""the text stuff <*to test*> to find a way to extract all text that is <*included in special tags*> less than star and greater than star"""
""cool stuff"|05-13-2014|""the text stuff <*to test a second*> to find a way to extract all text that is <*included in extra special tags*> less than star and greater than star"""
""great big stuff"|12-7-2014|"the text stuff <*to test a third*> to find a way to extract all text that is <*included in very special tags*> less than star and greater than star"""
""nice stuff"|2-22-2013|""the text stuff <*to test a fourth ,*> to find a way to extract all text that is <*included in doubly special tags*> less than star and greater than star"""

stuff_type,stuff_date,stuff_text
cool stuff,1/25/2015,the text stuff <*to test*> to find a way to extract all text that is <*included in special tags*> less than star and greater than star
cool stuff,5/13/2014,the text stuff <*to test a second*> to find a way to extract all text that is <*included in extra special tags*> less than star and greater than star
great big stuff,12/7/2014,the text stuff <*to test a third*> to find a way to extract all text that is <*included in very special tags*> less than star and greater than star
nice stuff,2/22/2013,the text stuff <*to test a fourth *> to find a way to extract all text that is <*included in really special tags*> less or greater than star

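I was going to re-test, but this morning's Spyder update crashed my Python console. Ugh. With vanilla Python, the test data above fails with the code below, without even reaching the write step; it cannot even print. I may need QUOTE_NONE in the dialect.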

import csv
import re 
t = re.compile('<\*(.*?)\*>')
headers = ['stuff_type', 'stuff_date', 'stuff_text']
with open('C:/Temp/in3.csv', 'rb') as csvfile:
    with open('C:/Temp/out3.csv', 'wb') as output_file:
        reader = csv.DictReader(csvfile)
        writer = csv.DictWriter(output_file, headers, extrasaction='ignore')
        writer.writeheader()
        print(headers)
        for row in reader:
            row['stuff_text'] = re.findall(t, row['stuff_text'])
            print(row['stuff_type'], row['stuff_date'], row['stuff_text'])
            writer.writerow(row)

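Error:
I could not paste the snipping-tool image here, sorry.
KeyError: 'stuff_text'
OK: the problem is probably in the quoting and delimiting of the columns. The unquoted data above prints without a KeyError and now writes to the file correctly. I may have to clean the quote characters out of the file before extracting text with the regex. Any ideas would be much appreciated.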
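One likely contributor to the KeyError, assuming the failing file really is pipe-delimited as the first sample suggests: with the default dialect the delimiter is a comma, so the whole header line becomes a single field name and `'stuff_text'` is never a key. The sketch below is written for Python 3 and uses a simplified, quote-free version of the sample held in memory.

```python
import csv
import io

# Header and first data line of the pipe-delimited sample, simplified
# (quotes removed, text shortened) for illustration.
sample = (
    "stuff_type|stuff_date|stuff_text\n"
    "cool stuff|01-25-2015|the text stuff <*to test*>\n"
)

# Default dialect: the delimiter is ',', so the header is one long field name.
default_reader = csv.DictReader(io.StringIO(sample))
default_fields = default_reader.fieldnames

# Passing delimiter='|' yields the three expected keys.
pipe_reader = csv.DictReader(io.StringIO(sample), delimiter='|')
pipe_fields = pipe_reader.fieldnames
first_row = next(pipe_reader)

print(default_fields)
print(pipe_fields)
print(first_row['stuff_text'])
```

Printing `reader.fieldnames` before the loop is a quick way to check which keys the rows will actually have.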
Good question, @Andrea Corbellini.
If I remove the quotes by hand, the code above produces the following output:

stuff_type,stuff_date,stuff_text
cool stuff,1/25/2015,"['to test', 'included in special tags']"
cool stuff,5/13/2014,"['to test a second', 'included in extra special tags']"
great big stuff,12/7/2014,"['to test a third', 'included in very special tags']"
nice stuff,2/22/2013,"['to test a fourth ', 'included in really special tags']"

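This is the output I want. So thank you for the "lazy" question; I was the lazy one and should have included the second output.
Likewise, without removing the multiple sets of quotes, I get KeyError: 'stuff_type'. Sorry, I tried to insert a screenshot of the Python error, but I have not figured out how to do that on SO. I used the image option above, but that seems to point to a file that would be uploaded to SO rather than inserted inline?
With @monkut's excellent input below on using .join, things are getting better: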

{['stuff_type', 'stuff_date', 'stuff_text']
('cool stuff', '1/25/2015', 'to test:included in special tags')
('cool stuff', '5/13/2014', 'to test a second:included in extra special tags')
('great big stuff', '12/7/2014', 'to test a third:included in very special tags')
('nice stuff', '2/22/2013', 'to test a fourth :included in really special tags')}
    
import csv
import re 
t = re.compile('<\*(.*?)\*>')
headers = ['stuff_type', 'stuff_date', 'stuff_text']
csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
with open('C:/Python/in3.txt', 'rb') as csvfile:
    with open('C:/Python/out5.csv', 'wb') as output_file:
        reader = csv.DictReader(csvfile, dialect='piper')
        writer = csv.DictWriter(output_file, headers, extrasaction='ignore')
        writer.writeheader()
        print(headers)
        for row in reader:
            row['stuff_text'] = ":".join(re.findall(t, row['stuff_text']))
            print(row['stuff_type'], row['stuff_date'], row['stuff_text'])
            writer.writerow(row)

The error traceback is as follows:

runfile('C:/Python/test quotes with dialect quotes none or quotes filter and special characters with findall regex.py', wdir='C:/Python')
['stuff_type', 'stuff_date', 'stuff_text']
('""cool stuff"', '01-25-2015', 'to test')
Traceback (most recent call last):

  File "<ipython-input-3-832ce30e0de3>", line 1, in <module>
    runfile('C:/Python/test quotes with dialect quotes none or quotes filter and special characters with findall regex.py', wdir='C:/Python')

  File "C:\Users\Methody\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\Methody\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Python/test quotes with dialect quotes none or quotes filter and special characters with findall regex.py", line 20, in <module>
    row['stuff_text'] = ":".join(re.findall(t, row['stuff_text']))

  File "C:\Users\Methody\Anaconda\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)

TypeError: expected string or buffer

Before running the regex findall, I will find a more robust way to clean up and remove the quotes. Possibly something like row = string.remove(quotes with spaces).
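A cleanup step along those lines might look like the sketch below. It rests on an assumption about the traceback: "expected string or buffer" means `row['stuff_text']` was not a string, and one way that happens is a row with fewer fields than the header, which `DictReader` fills with `None` (its default `restval`). Whether that is what happened in this file is not confirmed by the post. The helper names `clean_field` and `extract_tags` are hypothetical, and the code targets Python 3.

```python
import re

TAG = re.compile(r'<\*(.*?)\*>')


def clean_field(value):
    """Strip surrounding whitespace and double quotes; map None to ''."""
    if value is None:  # DictReader uses restval=None for missing fields
        return ''
    return value.strip().strip('"')


def extract_tags(value, sep=':'):
    """Join the contents of every <*...*> tag in value with sep."""
    return sep.join(TAG.findall(clean_field(value)))


tagged = extract_tags(
    '""the text stuff <*to test*> that is <*included in special tags*> star"""'
)
missing = extract_tags(None)
stuff_type = clean_field('""cool stuff"')

print(tagged)
print(repr(missing))
print(stuff_type)
```

Inside the reader loop this would be used as `row['stuff_text'] = extract_tags(row['stuff_text'])`, with `clean_field` applied to the other columns as needed.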

Best answer

I think findall returns a list, which may be messing things up, since DictWriter expects a single string value.

row['d'] = re.findall(t, row['d'])

You can use .join to turn the result into a single string value:

row['d'] = ":".join(re.findall(t, row['d']))

Here the values are joined with ":". As you mentioned, though, you may need to clean the values up a bit more...
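To make the difference concrete, here is a small comparison written for Python 3, using an in-memory buffer and a shortened sample row (the helper `write_row` is a hypothetical name). When the list is passed straight through, the csv writer stores its string representation in one quoted cell; joining first produces a plain text cell.

```python
import csv
import io
import re

TAG = re.compile(r'<\*(.*?)\*>')
headers = ['stuff_type', 'stuff_date', 'stuff_text']
row = {'stuff_type': 'cool stuff',
       'stuff_date': '1/25/2015',
       'stuff_text': 'stuff <*to test*> more <*included in special tags*>'}


def write_row(row_dict):
    """Write one dict with DictWriter and return the resulting CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, headers, lineterminator='\n')
    writer.writerow(row_dict)
    return out.getvalue()


matches = TAG.findall(row['stuff_text'])

# List passed as-is: the cell holds str(list), quoted because it has a comma.
as_list = write_row(dict(row, stuff_text=matches))

# Joined first: the cell is an ordinary string.
as_joined = write_row(dict(row, stuff_text=':'.join(matches)))

print(as_list)
print(as_joined)
```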

You mentioned having trouble using the compiled regex object.
Here is an example of how to use a compiled regex object:

import re

# Raw string so the backslashes reach the regex engine unchanged
t = re.compile(r'<\*(.*?)\*>')
text = ('''cool stuff,1/25/2015,the text stuff <*to test*> to find a way to extract all text that'''
        ''' is <*included in special tags*> less than star and greater than star''')
# Call findall on the compiled pattern itself
result = t.findall(text)

This should return the following in result:

['to test', 'included in special tags']

Regarding this problem, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35194444/
