python - How do I extract Unicode character sequences from an MZ executable?

Reposted by: 行者123  Updated: 2023-11-28 16:48:49


I want to extract Unicode strings from a binary (".exe") file.

When I use code like this:

    `unicode_str = re.compile( u'[\u0020-\u007e]{1,}',re.UNICODE )`

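it works, but it only returns isolated single characters. So I tried raising the minimum length in the quantifier to 3:

Python: unicode_str = re.compile( u'[\u0020-\u007e]{3,}',re.UNICODE )

Perl: my @a = ( $file =~ /[\x{0020}-\x{007e}]{3,}/gs );

Now I get only ASCII strings; all the Unicode strings have disappeared.

Where did I go wrong, or what am I misunderstanding about Unicode?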

Code from the comments:

Python:

File = open( sys.argv[1], "rb" )
FileData = File.read()
File.close()
unicode_str = re.compile( u'[\u0020-\u007e]{3,}',re.UNICODE )
myList = unicode_str.findall(FileData)
for p in myList:
    print p

Perl:

$/ = "newline separator";
my $input = shift;
open( File, $input );
my $file = <File>;
close( File );
my @a = ( $file =~ /[\x{0020}-\x{007e}]{3,}/gs );
foreach ( @a ) { print "$_\n"; }

Best answer

Someone has already written a utility that does what you want:

http://technet.microsoft.com/en-us/sysinternals/bb897439.aspx

usage: strings [-a] [-f offset] [-b bytes] [-n length] [-o] [-q] [-s] [-u] <file or directory>

Strings takes wild-card expressions for file names, and additional command line parameters are defined as follows:

-a  Ascii-only search (Unicode and Ascii is default)
-b  Bytes of file to scan
-f  File offset at which to start scanning.
-o  Print offset in file string was located
-n  Minimum string length (default is 3)
-q  Quiet (no banner)
-s  Recurse subdirectories
-u  Unicode-only search (Unicode and Ascii is default)  

To search one or more files for the presence of a particular string using strings use a command like this:

strings * | findstr /i TextToSearchFor

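Edit:

If you want to implement this in Python, try the code below. You have to decide which range of Unicode characters you are looking for and search for them encoded as UTF-16LE. Keep in mind that many arbitrary byte pairs look like valid printable Unicode, so some matches will be noise. I don't know what algorithm strings uses.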

import re
data = open('c:/users/metolone/util/windiff.exe','rb').read()

# Search for printable ASCII characters encoded as UTF-16LE.
pat = re.compile(ur'(?:[\x20-\x7E][\x00]){3,}')
words = [w.decode('utf-16le') for w in pat.findall(data)]
for w in words:
    print w
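
The snippet above is Python 2 (`ur'...'` literals and the `print` statement). Below is a minimal Python 3 sketch of the same idea. It also shows why the pattern from the question stops matching once the minimum length is 3: in UTF-16LE every ASCII character is followed by a 0x00 byte, so the file never contains three printable bytes in a row. The names `PLAIN_ASCII`, `UTF16LE_ASCII`, `UTF16LE_ASCII_CYRILLIC` and `extract_utf16le_strings`, the sample byte strings, and the choice of Cyrillic as the extra range are all assumptions made for illustration; they are not taken from the answer or from strings.

```python
import re

# The question's pattern, written as a bytes pattern: runs of printable bytes.
PLAIN_ASCII = re.compile(rb'[\x20-\x7E]{3,}')

# Printable ASCII stored as UTF-16LE: the ASCII byte, then a 0x00 high byte.
UTF16LE_ASCII = re.compile(rb'(?:[\x20-\x7E]\x00){3,}')

# Illustrative extension (an assumption, not from the answer): also accept
# Cyrillic U+0410..U+044F, i.e. low byte 0x10-0x4F followed by high byte 0x04.
UTF16LE_ASCII_CYRILLIC = re.compile(
    rb'(?:[\x20-\x7E]\x00|[\x10-\x4F]\x04){3,}'
)


def extract_utf16le_strings(data, pattern=UTF16LE_ASCII):
    """Return the decoded UTF-16LE runs that `pattern` finds in `data` (bytes)."""
    return [match.decode('utf-16le') for match in pattern.findall(data)]


# Hypothetical sample data: binary noise around one UTF-16LE string each.
sample = b'\x01\x02' + 'Hello'.encode('utf-16le') + b'\xff\xfe\x00\x90'
cyrillic_sample = b'\x00\x01' + 'Привет'.encode('utf-16le') + b'\xff\xff'

print(PLAIN_ASCII.findall(sample))        # [] : no three printable bytes in a row
print(extract_utf16le_strings(sample))    # ['Hello']
print(extract_utf16le_strings(cyrillic_sample, UTF16LE_ASCII_CYRILLIC))
```

For a real file, read it in binary mode first, for example `data = open(path, 'rb').read()`, and pass `data` to `extract_utf16le_strings`. Widening the accepted ranges finds more strings but also lets more random byte pairs through, so raising the minimum length is one way to cut down on noise.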

Regarding "python - How do I extract Unicode character sequences from an MZ executable?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10637055/


