
python - How do I make a dictionary from diary entries?

Reposted. Author: 行者123. Updated: 2023-12-02 03:47:12

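I'm trying to match diary entry dates with a regular expression and, when a line matches, use the date as the key and the entry that follows it as the value.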

My first plan was to split the text into a list and use every odd index as a key and the other indices as values.

Source: https://archive.org/stream/AnneFrankTheDiaryOfAYoungGirl_201606/Anne-Frank-The-Diary-Of-A-Young-Girl_djvu.txt

with open(r"C:\Users\mmcgown\Desktop\School\MSDS452\FinalProject\TheDiaryOfAYoungGirl.txt", "r") as file:
    s = file.read()

import re
r = r'(SUNDAY|MONDAY|TUESDAY|WEDNESDAY|THURSDAY|FRIDAY|SATURDAY), (JANUARY|FEBRUARY|MARCH|APRIL|MAY|JUNE|JULY|AUGUST|SEPTEMBER|OCTOBER|NOVEMBER|DECEMBER) \d{1,2}, 19\d{2}\s*\n'
l = re.split(r,s)

l

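However, this only splits the text before and after each regex match, so split doesn't seem to be the right approach, because for some reason it also puts the weekday and the month into the list: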

'',
'SUNDAY',
'JUNE',
'I\'ll begin from the ...
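The weekday and month show up because `re.split` also returns the text of every capturing group in the pattern as part of the resulting list. A minimal sketch of the odd-index plan, assuming each date sits on its own line as in the source text: make the weekday and month alternations non-capturing `(?:...)` and wrap the whole date in a single group, so the list alternates date, entry, date, entry. `DATE`, `split_entries` and the sample string are hypothetical names and data for illustration.

```python
import re

# Inner alternations are non-capturing; the whole date is ONE capturing group,
# so re.split returns ['<text before first date>', date1, body1, date2, body2, ...].
DATE = (
    r'((?:SUNDAY|MONDAY|TUESDAY|WEDNESDAY|THURSDAY|FRIDAY|SATURDAY), '
    r'(?:JANUARY|FEBRUARY|MARCH|APRIL|MAY|JUNE|JULY|AUGUST|SEPTEMBER|OCTOBER|NOVEMBER|DECEMBER) '
    r'\d{1,2}, 19\d{2})\s*\n'
)


def split_entries(text):
    parts = re.split(DATE, text)
    # parts[0] is whatever precedes the first date (e.g. a preface) and is dropped.
    # Odd indices hold dates; each following even index holds that entry's body.
    return {date: body.strip() for date, body in zip(parts[1::2], parts[2::2])}


sample = (
    "Preface text\n"
    "SUNDAY, JUNE 14, 1942\n"
    "I'll begin from the ...\n"
    "MONDAY, JUNE 15, 1942\n"
    "I had my birthday ...\n"
)
entries = split_entries(sample)
```

Any trailing text after the last date (for example an afterword) ends up in the last entry's value.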

What is the simplest way to split these diary entries into something like this?

{ 'SUNDAY, JUNE 14, 1942' : "I'll begin from the ..." },
{ 'MONDAY, JUNE 15, 1942' : 'I had my birthday ...' },
etc.

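P.S. I also tried a `for line in file` approach, but it kept getting uglier, so I figured I should ask for input on the proper solution. Here is roughly where that (unfinished) attempt was heading: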

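import re

dia = {}
current_date = None  # key of the entry being built; None until the first date line
# r is the date pattern defined above
with open(r"C:\Users\mmcgown\Desktop\School\MSDS452\FinalProject\TheDiaryOfAYoungGirl.txt", "r") as file:
    for line in file:
        if re.match(r, line):
            # A date header starts a new entry keyed by the date
            current_date = line.rstrip()
            dia[current_date] = ''
        elif current_date is not None:
            # Any other line belongs to the most recent entry
            # (text before the first date is skipped)
            dia[current_date] += line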

Best answer

You can use this example (I use an OrderedDict to keep the dates in order in the dictionary; sample.txt is the text file from your question):

import re
from collections import OrderedDict

with open('sample.txt', 'r') as f_in:
    data = f_in.read()

data = re.findall(r'^([A-Z]+, [A-Z]+ \d+, \d+)(.*?)(?=(?:[A-Z]+, [A-Z]+ \d+, \d+)|(?:ANNE\'S DIARY ENDS HERE\.))', data, flags=re.M|re.DOTALL)

d = OrderedDict( data )

from pprint import pprint
pprint(d)

Prints:

OrderedDict([('SUNDAY, JUNE 14, 1942',
'\n'
'\n'
'\n'
"I'll begin from the moment I got you, the moment I saw you "
'lying on the table among\n'

...till

"what I'd like to be and what I could be if ... if only there "
'were no other people in\n'
'the world.\n'
'\n'
'Yours, Anne M. Frank\n'
'\n'
'\n')])
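`re.findall` with two groups returns a list of `(date, body)` tuples, which is why it can be passed straight to `OrderedDict`; on Python 3.7+ a plain `dict` also preserves insertion order. A hedged variant, assuming date headers always start a line: the lookahead also stops at the end of the text (`\Z`) instead of relying on the "ANNE'S DIARY ENDS HERE." marker, and each body is stripped of surrounding blank lines. `ENTRY` and `diary_to_dict` are hypothetical names.

```python
import re

# Like the answer's pattern, but the next date must start a line (^ with re.M),
# and the last entry runs to the end of the text (\Z) instead of a fixed marker.
ENTRY = re.compile(
    r'^([A-Z]+, [A-Z]+ \d+, \d+)(.*?)(?=^[A-Z]+, [A-Z]+ \d+, \d+|\Z)',
    flags=re.M | re.DOTALL,
)


def diary_to_dict(text):
    # findall yields (date, body) tuples; a plain dict keeps them in file order.
    return {date: body.strip() for date, body in ENTRY.findall(text)}


sample = (
    "Preface\n"
    "SUNDAY, JUNE 14, 1942\n"
    "\n"
    "I'll begin from the ...\n"
    "MONDAY, JUNE 15, 1942\n"
    "I had my birthday ...\n"
)
d = diary_to_dict(sample)
```

Unlike the answer's version, anything after the last date (including text past the end marker) stays in the last entry.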

Regarding "python - How do I make a dictionary from diary entries?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59229057/
