email - Extracting the body of an email from an mbox file, decoding it to plain text regardless of charset and Content-Transfer-Encoding

Reposted by: 行者123  Updated: 2023-12-03 23:25:05


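I am trying to use Python 3 to extract the body of emails from a Thunderbird mbox file. It is an IMAP account.

I would like the text part of the email body to be available for processing as a unicode string. It should "look like" the email does in Thunderbird, and should not contain escaped characters such as \r\n =20 etc.

I think it is the Content-Transfer-Encoding that I don't know how to decode or remove.
I receive emails with a variety of different Content-Types and different Content-Transfer-Encodings.
This is my current attempt: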

import mailbox
import quopri,base64

def myconvert(encoded,ContentTransferEncoding):
    if ContentTransferEncoding == 'quoted-printable':
        result = quopri.decodestring(encoded)
    elif ContentTransferEncoding == 'base64':
        result = base64.b64decode(encoded)

mboxfile = 'C:/Users/Username/Documents/Thunderbird/Data/profile/ImapMail/server.name/INBOX'

for msg in mailbox.mbox(mboxfile):
    if msg.is_multipart():    #Walk through the parts of the email to find the text body.
        for part in msg.walk():
            if part.is_multipart(): # If part is multipart, walk through the subparts.
                for subpart in part.walk():
                    if subpart.get_content_type() == 'text/plain':
                        body = subpart.get_payload() # Get the subpart payload (i.e the message body)
                    for k,v in subpart.items():
                            if k == 'Content-Transfer-Encoding':
                                cte = v             # Keep the Content Transfer Encoding
            elif subpart.get_content_type() == 'text/plain':
                body = part.get_payload()           # part isn't multipart Get the payload
                for k,v in part.items():
                    if k == 'Content-Transfer-Encoding':
                        cte = v                      # Keep the Content Transfer Encoding

print(body)
print('Body is of type:',type(body))
body = myconvert(body,cte)
print(body)

But this fails:

Body is of type: <class 'str'>
Traceback (most recent call last):
File "C:/Users/David/Documents/Python/test2.py", line 31, in <module>
  body = myconvert(body,cte)
File "C:/Users/David/Documents/Python/test2.py", line 6, in myconvert
  result = quopri.decodestring(encoded)
File "C:\Python32\lib\quopri.py", line 164, in decodestring
  return a2b_qp(s, header=header)
TypeError: 'str' does not support the buffer interface
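The traceback looks like a type mismatch: `get_payload()` without `decode=True` returns the still-encoded body as a `str`, while on the Python 3.2 shown above `quopri.decodestring` accepted only bytes. A minimal sketch of the difference, using a made-up sample message rather than anything from the mbox in the question:

```python
import email
import quopri

# Hypothetical sample message, invented for illustration.
raw = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Caf=C3=A9 =\n"
    "au lait\n"
)
msg = email.message_from_string(raw)

encoded = msg.get_payload()             # str, still quoted-printable
decoded = msg.get_payload(decode=True)  # bytes, transfer encoding removed

# Handing quopri bytes instead of str avoids the TypeError on Python 3.2.
via_quopri = quopri.decodestring(encoded.encode("ascii"))

# The charset is a separate step: bytes -> str using the declared charset.
text = decoded.decode(msg.get_content_charset() or "ascii")
print(repr(text))
```

Removing the transfer encoding and decoding the charset are two separate steps, which is why `=20`-style escapes and wrong characters tend to show up as distinct symptoms.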

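Best answer

Here is some code that does the job. It prints errors instead of crashing on the messages where decoding fails. I hope it may be useful. Note that if there is a bug in Python 3 and it gets fixed, the .get_payload(decode=True) lines may then return a str object instead of a bytes object. I ran this code today on 2.7.2 and on Python 3.2.1.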

import mailbox

def getcharsets(msg):
    # Collect every charset declared in the message, skipping parts that declare none.
    return {c for c in msg.get_charsets() if c is not None}

def handleerror(errmsg, emailmsg,cs):
    print()
    print(errmsg)
    print("This error occurred while decoding with ",cs," charset.")
    print("These charsets were found in the one email.",getcharsets(emailmsg))
    print("This is the subject:",emailmsg['subject'])
    print("This is the sender:",emailmsg['From'])

def getbodyfromemail(msg):
    body = None
    #Walk through the parts of the email to find the text body.    
    if msg.is_multipart():    
        for part in msg.walk():

            # If part is multipart, walk through the subparts.            
            if part.is_multipart(): 

                for subpart in part.walk():
                    if subpart.get_content_type() == 'text/plain':
                        # Get the subpart payload (i.e the message body)
                        body = subpart.get_payload(decode=True) 
                        #charset = subpart.get_charset()

            # Part isn't multipart so get the email body
            elif part.get_content_type() == 'text/plain':
                body = part.get_payload(decode=True)
                #charset = part.get_charset()

    # If this isn't a multi-part message then get the payload (i.e the message body)
    elif msg.get_content_type() == 'text/plain':
        body = msg.get_payload(decode=True) 

    # No checking done to match the charset with the correct part.
    if body is None:
        # No text/plain part was found, so there is nothing to decode.
        return None
    for charset in getcharsets(msg):
        try:
            # Stop at the first charset that decodes the body cleanly.
            return body.decode(charset)
        except UnicodeDecodeError:
            handleerror("UnicodeDecodeError: encountered.",msg,charset)
        except AttributeError:
            handleerror("AttributeError: encountered" ,msg,charset)
    return body


mboxfile = 'C:/Users/Username/Documents/Thunderbird/Data/profile/ImapMail/server.name/INBOX'
print(mboxfile)
for thisemail in mailbox.mbox(mboxfile):
    body = getbodyfromemail(thisemail)
    if body is not None:    # Skip messages that have no text/plain part.
        print(body[0:1000])
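
The code above notes that it does not match each charset to the part it belongs to. One possible way to tighten that, offered as a rough sketch rather than a drop-in replacement, is to decode every text/plain part with its own declared charset via `get_content_charset()`. The helper name `get_text_body`, the `latin-1` fallback, and the sample message are all assumptions made for illustration.

```python
import base64
import email

def get_text_body(msg, fallback="latin-1"):
    """Return all text/plain parts of msg joined as one str, or None if there are none.

    Hypothetical helper; `fallback` is an assumed default for parts whose
    charset is missing, unknown, or wrong.
    """
    chunks = []
    for part in msg.walk():  # walk() already recurses into nested multiparts
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, base64/quoted-printable undone
        if payload is None:
            continue
        charset = part.get_content_charset() or fallback
        try:
            chunks.append(payload.decode(charset))
        except (UnicodeDecodeError, LookupError):
            # Mislabelled or unknown charset: fall back and replace bad bytes.
            chunks.append(payload.decode(fallback, errors="replace"))
    return "\n".join(chunks) if chunks else None

# Made-up multipart message with a base64-encoded UTF-8 text part.
b64_text = base64.b64encode("Grüße".encode("utf-8")).decode("ascii")
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + b64_text + "\n"
    "--XX\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>ignored</p>\n"
    "--XX--\n"
)
sample_body = get_text_body(email.message_from_string(raw))
print(sample_body)
```

Unlike the answer's code, which keeps the last text/plain part it finds, this sketch joins all of them, so text attachments end up in the result too. The same helper can be called on each message yielded by `mailbox.mbox(mboxfile)`.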

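Regarding email - Extracting the body of an email from an mbox file, decoding it to plain text regardless of charset and Content-Transfer-Encoding, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7166922/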
