python - How do I convert this .txt into a DataFrame?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:34:53

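I'm trying to analyze a WhatsApp chat in Python, and I want to turn it into a DataFrame with columns for date, time, person and message. The exported lines look like this: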

'[8/23/17, 1:45:10 AM] Guillermina: Guten Morgen',
'[8/23/17, 1:47:05 AM] Kester Stieldorf: Good morning :) was in Düsseldorf one hour ago ;)',
'[8/23/17, 1:47:16 AM] Guillermina: Hahahaha',
'[8/23/17, 1:47:19 AM] Guillermina: What?',
'[8/23/17, 1:47:36 AM] Kester Stieldorf: Yeah had to pick something up',

The real text is much longer than that. Here is what I have tried:

import re

# file_read holds the whole exported chat as one string
pieces = [x.strip('\n') for x in file_read.split('\n')]
beg_pattern = r'\d+/\d+/\d+,\s+\d+:\d+\s+\w+\.\w+\.'
pattern = r'\d+/(\d+/\d+),\s+\d+:\d+\s+\w+\.\w+\.\s+-\s+(\w+|\w+\s+\w+|\w+\s+\w+\s+\w+|\w+\s+\w+\.\s+\w+|\w+\s+\w+-\w+|\w+\'\w+\s+\w+|\+\d+\s+\(\W+\d+\)\s+\d+-\d+\W+|\W+\+\d+\s+\d+\s+\d+\s+\d+\W+|\W+\+\d+\s+\d+\w+\W+):(.*)'

reg = re.compile(beg_pattern)
regex = re.compile(pattern)

remove_blanks = [x for x in pieces if reg.match(x)]
blanks = [x for x in pieces if not reg.match(x)]

grouped_data = []
for x in remove_blanks:
    grouped_data.extend(regex.findall(x))

grouped_data_list = [list(x) for x in grouped_data]

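But it doesn't seem to work. I'm fairly sure something is wrong with my re.compile() patterns, because when I print the results for reg and regex I get empty lists. How can I fix this?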
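A quick way to check whether re.compile() is really the culprit is to test beg_pattern against one of the sample lines. The pattern seems to expect a timestamp like `1:45 a.m.` (hours:minutes, then a dotted a.m./p.m.) at the very start of the line, while the sample lines start with `[` and carry seconds and a plain `AM`. The `other` line below is a hypothetical layout inferred from the regex alone, and `fixed` is one possible start-of-message pattern, assuming every timestamp looks exactly like the sample ones.

```python
import re

# beg_pattern copied unchanged from the question
beg_pattern = r'\d+/\d+/\d+,\s+\d+:\d+\s+\w+\.\w+\.'
reg = re.compile(beg_pattern)

line = '[8/23/17, 1:45:10 AM] Guillermina: Guten Morgen'

print(reg)               # a compiled Pattern object, so compile() itself worked
print(reg.match(line))   # None: match() anchors at the start, and the line starts with '['
print(reg.search(line))  # None: ':10' (seconds) and 'AM' (no dots) don't fit the pattern

# Hypothetical line in the layout the pattern appears to be written for
other = '8/23/17, 1:45 a.m. - Guillermina: Guten Morgen'
print(reg.match(other))  # a Match object

# Assumed start-of-message pattern for the bracketed format in the sample
fixed = re.compile(r'\[\d+/\d+/\d+,\s+\d+:\d+:\d+\s+[AP]M\]')
print(fixed.match(line))  # a Match object
```

Since every line fails reg.match(), remove_blanks stays empty, and therefore so do grouped_data and grouped_data_list.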

Best answer

First, read your file into a list of lines:

# The chat contains non-ASCII text such as 'Düsseldorf', so read it as UTF-8
with open('file.txt', encoding='utf-8') as f:
    pieces = [i.strip() for i in f.read().splitlines()]
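One caveat for the re.findall step that follows: its `(.*)` group stops at the end of a line, so if a message in the export spans several lines (the five sample lines don't show this, so it is an assumption about the full file), only the first line of that message is captured; the remaining lines don't start with `[` and are normally skipped. A minimal sketch that glues such lines onto the preceding message first is shown here. `merge_continuation_lines` and `START` are hypothetical names, and `START` assumes every message begins with a timestamp exactly like `[8/23/17, 1:45:10 AM]`; the split-up message in the example is made up from one of the sample lines.

```python
import re

# Hypothetical: a line starts a new message if it begins with "[M/D/YY, H:MM:SS AM]"
START = re.compile(r'\[\d+/\d+/\d+, \d+:\d+:\d+ [AP]M\]')

def merge_continuation_lines(lines):
    """Append lines that don't start a new message to the previous message."""
    messages = []
    for line in lines:
        if not line:
            continue  # skip blank lines
        if START.match(line) or not messages:
            messages.append(line)  # a new message
        else:
            messages[-1] += ' ' + line  # continuation: join onto the previous message
    return messages

# Made-up example: one sample message split over two lines
sample_lines = [
    '[8/23/17, 1:47:05 AM] Kester Stieldorf: Good morning :)',
    'was in Düsseldorf one hour ago ;)',
    '',
    '[8/23/17, 1:47:16 AM] Guillermina: Hahahaha',
]
print(merge_continuation_lines(sample_lines))
```

With this in place, the findall call would take `'\n'.join(merge_continuation_lines(pieces))` instead of `'\n'.join(pieces)`. Joining continuation lines with a space rather than a newline keeps each message on a single line, which is what the `(.*)` group expects.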

Then extract the fields with re.findall and pass them to pandas:

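import re
import pandas as pd

# Groups: the [timestamp], the sender up to the first ':', the rest of the line as text
pd.DataFrame(
    re.findall(r'\[(.*?)\]\s*([^:]+):\s*(.*)', '\n'.join(pieces)),
    columns=['Time', 'Name', 'Text']
)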

                  Time              Name                                               Text
0  8/23/17, 1:45:10 AM       Guillermina                                       Guten Morgen
1  8/23/17, 1:47:05 AM  Kester Stieldorf  Good morning :) was in Düsseldorf one hour ago ;)
2  8/23/17, 1:47:16 AM       Guillermina                                           Hahahaha
3  8/23/17, 1:47:19 AM       Guillermina                                              What?
4  8/23/17, 1:47:36 AM  Kester Stieldorf                      Yeah had to pick something up
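The question asks for separate date, time, person and message columns, while the answer keeps date and time together in a single Time column. One way to get four columns, sketched here rather than taken from the answer, is to split the bracketed part at `, ` inside the regex itself; `re.VERBOSE` lets each group carry a comment. The optional Timestamp column assumes the month/day/year, 12-hour layout seen in the sample, which depends on the phone's locale settings. The sample text is the five lines from the question.

```python
import re
import pandas as pd

# The five sample lines from the question, as they appear in the exported .txt
chat = """[8/23/17, 1:45:10 AM] Guillermina: Guten Morgen
[8/23/17, 1:47:05 AM] Kester Stieldorf: Good morning :) was in Düsseldorf one hour ago ;)
[8/23/17, 1:47:16 AM] Guillermina: Hahahaha
[8/23/17, 1:47:19 AM] Guillermina: What?
[8/23/17, 1:47:36 AM] Kester Stieldorf: Yeah had to pick something up"""

# Same idea as the answer's regex, with the timestamp split into date and time
pattern = re.compile(r"""
    \[(.*?),\s(.*?)\]   # [date, time]
    \s*([^:]+):         # sender, up to the first colon
    \s*(.*)             # message text, to the end of the line
""", re.VERBOSE)

df = pd.DataFrame(pattern.findall(chat), columns=['Date', 'Time', 'Person', 'Message'])

# Optional: a real datetime column, assuming month/day/year and a 12-hour clock
df['Timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%m/%d/%y %I:%M:%S %p')
print(df)
```

Keeping the split inside the regex avoids a second pass over the DataFrame; splitting the answer's Time column afterwards would work just as well.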

Source: the Stack Overflow question "python - How do I convert this .txt into a DataFrame?": https://stackoverflow.com/questions/51956961/
