python - UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 65534-65535: unexpected end of data

Reposted by: 太空狗 | Updated: 2023-10-29 17:47:31

I want to encrypt files with simple AES encryption. Here is my Python 3 source code.

import os, random, struct
from Crypto.Cipher import AES

def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    if not out_filename:
        out_filename = in_filename + '.enc'
    iv = os.urandom(16)
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)
    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)
            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                elif len(chunk) % 16 != 0:
                    chunk += ' ' * (16 - len(chunk) % 16)
                outfile.write(encryptor.encrypt(chunk.decode('UTF-8','strict')))

It works fine for some files but fails on others with the error message shown below:

encrypt_file("qwertyqwertyqwer",'/tmp/test1' , out_filename=None, chunksize=64*1024)

No error message; it works fine.

encrypt_file("qwertyqwertyqwer",'/tmp/test2' , out_filename=None, chunksize=64*1024)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 17, in encrypt_file
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 65534-65535: unexpected end of data

How can I fix my encrypt_file function?

Following t.m.adam's suggestion, I changed

outfile.write(encryptor.encrypt(chunk.decode('UTF-8','strict')))

to

outfile.write(encryptor.encrypt(chunk))

and tried it on some files.

encrypt_file("qwertyqwertyqwer",'/tmp/test' , out_filename=None, chunksize=64*1024)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 16, in encrypt_file
TypeError: can't concat bytes to str

Best answer

The main problem with your code is that you are using strings. AES works on binary data, and if you were using PyCryptodome, this code would raise a TypeError:

Object type <class 'str'> cannot be passed to C code

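PyCrypto accepts strings but encodes them to bytes internally, so decoding your bytes to a string makes no sense because the string will just be encoded back to bytes. Moreover, it encodes with ASCII (tested with PyCrypto v2.6.1 on Python v2.7), so, for example, this code: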

encryptor.encrypt(u'ψ' * 16)

raises a UnicodeEncodeError:

File "C:\Python27\lib\site-packages\Crypto\Cipher\blockalgo.py", line 244, in encrypt
    return self._cipher.encrypt(plaintext)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-15

You should always use bytes when encrypting or decrypting data. Afterwards you can decode the plaintext to a string, if it is text.
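
The message in the question (positions 65534-65535, "unexpected end of data") is consistent with a 64 KiB read boundary falling in the middle of a multi-byte UTF-8 character. The sketch below reproduces that situation without any encryption; the sample text and the CHUNKSIZE name are assumptions made for illustration, not taken from the asker's file.

```python
# Build text whose UTF-8 form has a 3-byte character starting at offset 65534,
# so the first 64 KiB chunk ends after only 2 of its 3 bytes.
CHUNKSIZE = 64 * 1024                       # 65536, the default in the question
data = ("a" * 65534 + "€").encode("utf-8")  # 65534 + 3 = 65537 bytes

chunk = data[:CHUNKSIZE]  # what the first infile.read(chunksize) would return

error = None
try:
    chunk.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# The raw chunks still concatenate back to the original bytes, so passing
# them to the cipher undecoded loses nothing.
assert chunk + data[CHUNKSIZE:] == data
```

Files that happen to have no multi-byte character straddling a chunk boundary decode without error, which would explain why only some inputs fail.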

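The next problem is your padding method. It produces a string, so you get a TypeError when you try to append it to the plaintext, which should be bytes. You can solve this by padding with bytes,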

chunk += b' ' * (16 - len(chunk) % 16)

but it would be better to use PKCS7 padding (currently you are using zero padding, but with spaces instead of zero bytes).
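
As a small illustration of both points, the sketch below shows the str padding failing on a bytes chunk, the bytes padding succeeding, and why space padding on its own is ambiguous. The sample values are made up for the example.

```python
chunk = b"hello"                 # a file opened in 'rb' mode yields bytes
missing = 16 - len(chunk) % 16   # 11 bytes needed to reach a full block

try:
    chunk + " " * missing        # str padding, as in the question
    str_padding_failed = False
except TypeError:
    str_padding_failed = True

padded = chunk + b" " * missing  # bytes padding
print(str_padding_failed, len(padded))

# Space padding cannot be told apart from trailing spaces in the data:
# two different plaintexts produce the same padded block.
same_block = (b"hi" + b" " * 14) == (b"hi " + b" " * 13)
```

This ambiguity is why the question's format has to store the original file size next to the ciphertext, whereas PKCS7 padding records its own length.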

PyCryptodome provides padding functions, but it seems you are using PyCrypto. In that case you could implement PKCS7 padding yourself or, better yet, copy PyCryptodome's padding functions.

try:
    from Crypto.Util.Padding import pad, unpad
except ImportError:
    from Crypto.Util.py3compat import bchr, bord

    def pad(data_to_pad, block_size):
        padding_len = block_size-len(data_to_pad)%block_size
        padding = bchr(padding_len)*padding_len
        return data_to_pad + padding

    def unpad(padded_data, block_size):
        pdata_len = len(padded_data)
        if pdata_len % block_size:
            raise ValueError("Input data is not padded")
        padding_len = bord(padded_data[-1])
        if padding_len<1 or padding_len>min(block_size, pdata_len):
            raise ValueError("Padding is incorrect.")
        if padded_data[-padding_len:]!=bchr(padding_len)*padding_len:
            raise ValueError("PKCS#7 padding is incorrect.")
        return padded_data[:-padding_len]

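The pad and unpad functions were copied from Crypto.Util.Padding and modified to use only PKCS7 padding. Note that with PKCS7 padding it is important to pad the last block even when its size is a multiple of the block size; otherwise you won't be able to unpad correctly.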
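
To see why the extra block matters, here is a minimal Python 3 sketch. The pkcs7_pad and pkcs7_unpad helpers are hypothetical stand-ins written for this example (they are not the library functions), and the sample data is chosen so that it already fills a block and happens to end in a byte that looks like padding.

```python
BLOCK = 16

def pkcs7_pad(data, block_size=BLOCK):
    # n is always in 1..block_size, so aligned input gains a whole extra block
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n

def pkcs7_unpad(padded, block_size=BLOCK):
    if not padded or len(padded) % block_size:
        raise ValueError("Input data is not padded")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("Padding is incorrect.")
    return padded[:-n]

aligned = b"A" * 15 + b"\x01"   # exactly one block, last byte is 0x01

padded = pkcs7_pad(aligned)     # 32 bytes: the data plus 16 bytes of 0x10
restored = pkcs7_unpad(padded)  # round-trips to the original data

# If the aligned block had been left unpadded, unpad would misread the
# final 0x01 as padding and silently drop a real data byte.
truncated = pkcs7_unpad(aligned)
print(len(padded), restored == aligned, len(truncated))
```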

Applying these changes to the encrypt_file function,

def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    if not out_filename:
        out_filename = in_filename + '.enc'
    iv = os.urandom(16)
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)
    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)
            pos = 0
            while pos < filesize:
                chunk = infile.read(chunksize)
                pos += len(chunk)
                if pos == filesize:
                    chunk = pad(chunk, AES.block_size)
                outfile.write(encryptor.encrypt(chunk))

and the matching decrypt_file function,

def decrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    if not out_filename:
        out_filename = in_filename + '.dec'
    with open(in_filename, 'rb') as infile:
        filesize = struct.unpack('<Q', infile.read(8))[0]
        iv = infile.read(16)
        encryptor = AES.new(key, AES.MODE_CBC, iv)
        with open(out_filename, 'wb') as outfile:
            encrypted_filesize = os.path.getsize(in_filename)
            pos = 8 + 16 # the filesize and IV.
            while pos < encrypted_filesize:
                chunk = infile.read(chunksize)
                pos += len(chunk)
                chunk = encryptor.decrypt(chunk)
                if pos == encrypted_filesize:
                    chunk = unpad(chunk, AES.block_size)
                outfile.write(chunk)

This code is compatible with Python 2 and Python 3, and it should work with either PyCryptodome or PyCrypto.
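
Both functions rely on the same 24-byte header: an 8-byte little-endian file size followed by the 16-byte IV. The sketch below isolates that layout using an in-memory buffer; write_header, read_header and the constant names are hypothetical helpers introduced for illustration only.

```python
import io
import os
import struct

HEADER_FORMAT = "<Q"                          # little-endian unsigned 64-bit
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes
IV_SIZE = 16                                  # AES block size in bytes

def write_header(stream, filesize, iv):
    # Same order as encrypt_file: size first, then the IV
    stream.write(struct.pack(HEADER_FORMAT, filesize))
    stream.write(iv)

def read_header(stream):
    # Same order as decrypt_file: 8 bytes of size, then 16 bytes of IV
    (filesize,) = struct.unpack(HEADER_FORMAT, stream.read(HEADER_SIZE))
    iv = stream.read(IV_SIZE)
    return filesize, iv

buf = io.BytesIO()
iv = os.urandom(IV_SIZE)
write_header(buf, 123456, iv)

buf.seek(0)
size, iv_back = read_header(buf)
offset = buf.tell()   # where the ciphertext would begin
print(size, offset)
```

In the decrypt_file above, the stored size is read but not used afterwards, because unpad already determines where the plaintext ends.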

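However, if you are using PyCrypto, I recommend switching to PyCryptodome. PyCryptodome is a fork of PyCrypto that exposes the same API (so you won't have to change your code much) plus some extra features: padding functions, authenticated encryption algorithms, KDFs, etc. PyCrypto, on the other hand, is no longer maintained, and some versions suffer from a heap-based buffer overflow vulnerability: CVE-2013-7459.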

Regarding this UnicodeDecodeError, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53531307/
