
Python parsing - multiple emails in one text file

Reposted. Author: 行者123. Updated: 2023-12-01 06:07:50

I receive similar emails from multiple senders and use the regexes m and n below to extract the strings I need. That part works correctly.

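Regex o, however, has me stumped. The text file I am reading is a combination of 9 emails saved into a single text file and opened in Python as a string. The original sender (regex o) appears at the start of each new message in the file (9 times).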

I want to write the same original sender next to every CUSIP and name found, until a different original sender is matched.
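One way to picture the goal is to split the combined string at each sender header, so that everything up to the next header belongs to that sender. The following is only a minimal sketch, assuming every header has the form `--- Original Sender: NAME, FIRM ---` as in the sample further down. The names `HEADER` and `split_messages` and the shortened `sample` string are illustrative and do not come from the original post.

```python
import re

# Header line that opens each message, e.g.
# "--- Original Sender: TOM MADEUPNAME, SOME BANK, N. ---"
HEADER = re.compile(r"^--- Original Sender: (.+?) ---[ \t]*$", re.MULTILINE)

def split_messages(text):
    """Return a list of (sender, body) pairs, one per message in text."""
    matches = list(HEADER.finditer(text))
    messages = []
    for i, match in enumerate(matches):
        # The body runs from the end of this header to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        messages.append((match.group(1), text[match.end():end].strip()))
    return messages

# Shortened, illustrative version of the combined file.
sample = """--- Original Sender: TOM MADEUPNAME, SOME BANK, N. ---
From: TOM MADEUPNAME (SOME BANK, N.)
05531UAB6 BCAP 2009-RR5 1A2 18,745 18,745 Snr Sup Fxd 45.000

--- Original Sender: JOE MADEUPNAME, EUROPEAN BANK SECURI ---
From: JOE MADEUPNAME (EUROPEAN BANK SECURI)
CSMC 06-9 7A1 25.00 11.97 L+45 728 26 578 35.21 FLT,AS,0.0% 50-00
"""

for sender, body in split_messages(sample):
    print(sender, "->", len(body.splitlines()), "lines")
```

Each body can then be scanned for CUSIPs and names on its own, with the sender already known for the whole chunk.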

I am using xlwt3 and win32com.

Here is a sample from the text file of combined emails; the layout is fairly standard:

--- Original Sender: TOM MADEUPNAME, SOME BANK, N. ---
----- Original Message -----
From: TOM MADEUPNAME (SOME BANK, N.)
To: BOB THISISMYEMAIL (XYZ INVESTMENTS, INC)
At: 8/31 8:53:25
**Offerings**

Mezz ReRemics
Cusip Description Original Current Cashflow Collat Offering
05531UAB6 BCAP 2009-RR5 1A2 18,745 18,745 Snr Sup Fxd 45.000

Prime/Alt-A Fixed
Cusip Description Original Current Cashflow Collat Offering
059487AE8 BOAA 2006-6 CB5 25,940 14,350 Seq Fxd 83.000
12544XAX3 CWHL 2007-9 A13 10,190 10,190 Ssnr Nas Fxd 92.500
17312XAJ3 CMSI 2007-4 1A9 2,871 2,741 Spr Snr Fxd 86.000

--- Original Sender: JOE MADEUPNAME, EUROPEAN BANK SECURI ---
----- Original Message -----
From: JOE MADEUPNAME (EUROPEAN BANK SECURI)
To: BOB THISISMYEMAIL (XYZ INVESTMENTS, INC)
At: 8/31 8:20:16

8-31-2011

Alt-A Fixed
Bond O/F C/F Cpn FICO CAL WALB 60+ Notes Offer
CSMC 06-9 7A1 25.00 11.97 L+45 728 26 578 35.21 FLT,AS,0.0% 50-00
LXS 07-10H 2A1 68.26 34.01 L+16 744 6 125 33.98 SS,9.57% 42-00
CSMC 06-7 9A1 15.00 7.81 L+30 688 5 198 46.46 SS,0.0% 29-16

Prime Hybrid
Bond O/F C/F Cpn FICO CAL WALB 60+ Notes Offer
SARM 05-18 6A1 14.56 6.01 2.58 730 46 432 15.87 SEA,SS,5/1,12.3% 78-00

Alt-A Hybrid
Bond O/F C/F Cpn FICO CAL WALB 60+ Notes Offer
ARMT 05-12 2A1 23.78 10.71 3.07 712 48 556 35.32 SS,5/1,4.9% *SOLD

Option Arm
Bond O/F C/F Cpn FICO CAL WALB 60+ Notes Offer
DBALT 07-OA4 1A1B 10.00 7.25 L+13 716 63 562 47.17 SS,OC,42.2% 64-16
--------------------------------------------------------------------------------------
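To see what the two extraction patterns pick out of lines like these, here is a small sketch that applies them to one offering line from each sample message. The patterns are copied from the code in this post (m for the CUSIP, n for the bond name); the helper name `extract` is made up for illustration.

```python
import re

# Patterns copied from the code in this post: m finds a CUSIP, n finds a bond name.
CUSIP = re.compile(r"[0-9]{3}[a-zA-Z0-9]{6}")
NAME = re.compile(r"[A-Z]{3,5}\s[0-9]{1,4}\D{1,3}\S{1,3}\s{1,2}\w+")

def extract(line):
    """Return (cusip, name) for one line; either item is None if not found."""
    m = CUSIP.search(line)
    n = NAME.search(line)
    return (m.group(0) if m else None, n.group(0) if n else None)

# One offering line from each of the two sample messages.
print(extract("05531UAB6 BCAP 2009-RR5 1A2 18,745 18,745 Snr Sup Fxd 45.000"))
# ('05531UAB6', 'BCAP 2009-RR5 1A2')
print(extract("CSMC 06-9 7A1 25.00 11.97 L+45 728 26 578 35.21 FLT,AS,0.0% 50-00"))
# (None, 'CSMC 06-9 7A1')
```

The second sample message has no CUSIP column, so only the name pattern matches on its lines.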

Updated - working

import re

# `lines` (the lines of the text file) and `ws` (the xlwt3 worksheet)
# are assumed to be created earlier in the script.
count_cusip = 0
count_name = 0
count_sender = 0
cur_sender = ''
for line in lines:

    # A new "Original Sender" header: remember it for the rows that follow.
    o = re.search(r"Original Sender:\s\b\w+\s\w+", line)
    if o:
        count_sender += 1
        cur_sender = o.group(0)

    # CUSIP goes in column 0, with the current sender in column 2.
    m = re.search(r'[0-9]{3}[a-zA-Z0-9]{6}', line)
    if m:
        count_cusip += 1
        ws.write(count_cusip, 0, m.group(0))
        ws.write(count_cusip, 2, cur_sender)

    # Bond name goes in column 1, with the current sender in column 2.
    n = re.search(r'[A-Z]{3,5}\s[0-9]{1,4}\D{1,3}\S{1,3}\s{1,2}\w+', line)
    if n:
        count_name += 1
        ws.write(count_name, 1, n.group(0))
        ws.write(count_name, 2, cur_sender)

Updated output - as desired.

CUSIP   Bond Name           Original Sender
00442PAD2 ACE 2006-OP1 A2B Original Sender: Nick Madeupname
12557YAE7 ARMT 05-12 2A1 Original Sender: Bobby Madeupname
39153VAT1 CSMC 06-9 7A1 Original Sender: Bobby Madeupname
05377RAE4 LXS 07-10H 2A1 Original Sender: Jane Madeupname
02005HAF0 CSMC 06-7 9A1 Original Sender: Jane Madeupname
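For checking this three-column layout without Excel, the same rows can be written to CSV text. This is a sketch only: it uses the standard-library `csv` module as a stand-in for xlwt3, which is not what the post itself uses, and the function name `to_csv` is hypothetical. The two rows are copied from the output above.

```python
import csv
import io

# Rows copied from the output shown above: (CUSIP, bond name, sender).
rows = [
    ("00442PAD2", "ACE 2006-OP1 A2B", "Original Sender: Nick Madeupname"),
    ("12557YAE7", "ARMT 05-12 2A1", "Original Sender: Bobby Madeupname"),
]

def to_csv(rows):
    """Serialize rows to CSV text, preceded by a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["CUSIP", "Bond Name", "Original Sender"])
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(rows))
```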

Best answer

Your question is not entirely clear, since you did not show a sample of the output, but here is an educated guess:

import re

count_cusip = 0
count_name = 0
count_sender = 0
cur_sender = ''
for line in lines:

    # CUSIP in column 0, tagged with the sender seen most recently.
    m = re.search(r'[0-9]{3}[a-zA-Z0-9]{6}', line)
    if m:
        count_cusip += 1
        ws.write(count_cusip, 0, m.group(0))
        ws.write(count_cusip, 2, cur_sender)

    # Bond name in column 1, tagged with the same sender.
    n = re.search(r'[A-Z]{3,5}\s[0-9]{1,4}\D{1,3}\S{1,3}\s{1,2}\w+', line)
    if n:
        count_name += 1
        ws.write(count_name, 1, n.group(0))
        ws.write(count_name, 2, cur_sender)

    # A sender header only updates the saved value; nothing is written here.
    o = re.search(r"Original Sender:\s\b\w+\s\w+", line)
    if o:
        count_sender += 1
        cur_sender = o.group(0)

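Instead of writing the original sender at the moment you encounter it, you need to save it and write the current value alongside each CUSIP and name.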
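The save-then-reuse idea can be tried without a spreadsheet. Below is a minimal sketch that keeps the three patterns from the post but yields tuples instead of calling `ws.write`. The generator name `rows_with_sender` and the four test lines (taken from the sample emails in the question) are illustrative assumptions, not part of the original answer.

```python
import re

SENDER = re.compile(r"Original Sender:\s\b\w+\s\w+")
CUSIP = re.compile(r"[0-9]{3}[a-zA-Z0-9]{6}")
NAME = re.compile(r"[A-Z]{3,5}\s[0-9]{1,4}\D{1,3}\S{1,3}\s{1,2}\w+")

def rows_with_sender(lines):
    """Yield (cusip, name, sender) for each line with a CUSIP or a name."""
    cur_sender = ""
    for line in lines:
        o = SENDER.search(line)
        if o:
            cur_sender = o.group(0)  # save it; nothing is emitted for headers
            continue
        m = CUSIP.search(line)
        n = NAME.search(line)
        if m or n:
            yield (m.group(0) if m else None,
                   n.group(0) if n else None,
                   cur_sender)

lines = [
    "--- Original Sender: TOM MADEUPNAME, SOME BANK, N. ---",
    "05531UAB6 BCAP 2009-RR5 1A2 18,745 18,745 Snr Sup Fxd 45.000",
    "--- Original Sender: JOE MADEUPNAME, EUROPEAN BANK SECURI ---",
    "CSMC 06-9 7A1 25.00 11.97 L+45 728 26 578 35.21 FLT,AS,0.0% 50-00",
]
for row in rows_with_sender(lines):
    print(row)
# ('05531UAB6', 'BCAP 2009-RR5 1A2', 'Original Sender: TOM MADEUPNAME')
# (None, 'CSMC 06-9 7A1', 'Original Sender: JOE MADEUPNAME')
```

Unlike the answer's separate `count_cusip` and `count_name` counters, this sketch emits one tuple per line, so a CUSIP and a name found on the same line stay on the same row. That is a design choice of the sketch, not something the answer prescribes.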

A similar question about Python parsing - multiple emails in one text file can be found on Stack Overflow: https://stackoverflow.com/questions/7263042/
