
Python email quoted-printable encoding problem

Reposted | Author: 太空狗 | Updated: 2023-10-30 02:23:26

I'm using the following method to fetch emails from Gmail:

import email
import imaplib
import sys

# username, password and subject are assumed to be defined elsewhere
def getMsgs():
    try:
        conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    except OSError:
        print('Failed to connect')
        print('Is your internet connection working?')
        sys.exit(1)
    try:
        conn.login(username, password)
    except imaplib.IMAP4.error:
        print('Failed to login')
        print('Is the username and password correct?')
        sys.exit(1)

    conn.select('Inbox')
    # typ, data = conn.search(None, '(UNSEEN SUBJECT "%s")' % subject)
    typ, data = conn.search(None, '(SUBJECT "%s")' % subject)
    for num in data[0].split():
        typ, msg_data = conn.fetch(num, '(RFC822)')
        # imaplib returns the raw message as bytes in Python 3
        msg = email.message_from_bytes(msg_data[0][1])
        yield walkMsg(msg)

def walkMsg(msg):
    # Return the payload of the first text/plain part, exactly as stored
    # in the message (i.e. still in its transfer encoding)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        return part.get_payload()

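However, in some of the emails I receive it is almost impossible to extract the dates (with regular expressions), because encoding-related characters such as "=" appear at random in the middle of various text fields. Here is an example that occurs inside the date range I want to extract: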

Name: KIRSTI Email: kirsti@blah.blah Phone #: + 999 99995192 Total in party: 4 total, 0 children Arrival/Departure: Oct 9= , 2010 - Oct 13, 2010 - Oct 13, 2010

Is there a way to get rid of these encoding characters?
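In quoted-printable encoding (RFC 2045), an "=" at the very end of a line is a soft line break that the sender inserts to keep lines short, and "=XX" encodes a single byte by its hex value. The stray "=" after "Oct 9" is most likely such a soft break. A minimal sketch using the standard-library quopri module, on a hypothetical raw body modelled on the sample above (not the actual message):

```python
import quopri

# Hypothetical raw quoted-printable body, modelled on the question's sample:
# the "=" that ends the first line is a soft line break, not real text.
raw_body = (
    b"Arrival/Departure: Oct 9=\r\n"
    b", 2010 - Oct 13, 2010\r\n"
)

# decodestring() removes soft line breaks and turns =XX escapes into bytes.
decoded = quopri.decodestring(raw_body)
print(decoded.decode("ascii"))

# A literal "=" in the original text is transmitted as "=3D".
print(quopri.decodestring(b"1+1=3D2"))  # b'1+1=2'
```

Decoding at this level works on a bare body, but in practice the email package can do it for you based on the part's headers.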

Best answer

You can (and should) use the email.parser module to decode mail messages, for example (quick and dirty example!):

from email.parser import FeedParser

f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()

# Now you can access the message and its submessages (if it's multipart)
print(rootMessage.is_multipart())

# Or check for errors
print(rootMessage.defects)

# If it's a multipart message, you can get the first submessage and then its payload
# (i.e. content) like so:
if rootMessage.is_multipart():
    print(rootMessage.get_payload(0).get_payload(decode=True))
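A self-contained variant of the quick-and-dirty example, with the placeholder replaced by a hypothetical two-part message (addresses, boundary and body text are all made up). Its text/plain part is quoted-printable and contains a soft line break like the one in the question, so the raw and decoded payloads can be compared side by side:

```python
from email.parser import FeedParser

# Hypothetical multipart message; the text/plain part is quoted-printable.
RAW_MESSAGE = (
    "From: sender@example.com\r\n"
    "To: recipient@example.com\r\n"
    "Subject: Booking\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    "\r\n"
    "--XYZ\r\n"
    'Content-Type: text/plain; charset="us-ascii"\r\n'
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "Arrival/Departure: Oct 9=\r\n"
    ", 2010 - Oct 13, 2010\r\n"
    "--XYZ--\r\n"
)

parser = FeedParser()
parser.feed(RAW_MESSAGE)
root = parser.close()

print(root.is_multipart())   # True for this sample
print(root.defects)          # list of any parsing problems that were found

first_part = root.get_payload(0)
raw_text = first_part.get_payload()            # str, still quoted-printable
decoded = first_part.get_payload(decode=True)  # bytes, transfer encoding removed
print(repr(raw_text))
print(decoded.decode(first_part.get_content_charset() or "ascii"))
```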

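When you pass decode=True to Message.get_payload, the module automatically decodes the content according to the part's Content-Transfer-Encoding header (for example, the quoted-printable encoding you ran into in your question) and returns the result as bytes.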
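Applied to the question's code, a minimal sketch of a drop-in walkMsg that decodes the transfer encoding and then turns the bytes into text. It assumes msg is an email.message.Message (for example from email.message_from_bytes, as in the getMsgs loop); the UTF-8 fallback and errors="replace" are assumptions, not something the original answer specifies:

```python
import email

def walkMsg(msg):
    """Return the first text/plain part as str, with quoted-printable or
    base64 transfer encoding already removed, or None if there is none."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes (None for multipart)
        if payload is None:
            continue
        # Assumed fallback when the part declares no charset
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None

# Hypothetical single-part message used to exercise the function
sample = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Arrival/Departure: Oct 9=\n"
    ", 2010 - Oct 13, 2010\n"
)
print(walkMsg(email.message_from_string(sample)))
```

Because the returned text no longer contains soft line breaks or =XX escapes, a date regex should be able to run on it directly.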

A similar question about the Python email quoted-printable encoding problem can be found on Stack Overflow: https://stackoverflow.com/questions/4040074/
