stream - Figuring out bytes content


Reposted by: 行者123 · Updated: 2023-12-01 19:54:51


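I am working with a compound file that contains multiple streams, and I am frustrated trying to figure out what each stream contains. I don't know whether these bytes are text, MP3, or video. Is there a way to tell what kind of data bytes like the following might be?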

b'\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0bz\xcc\xc9\xc8\xc0\xc0\x00\xc2?\x82\x1e<\x0ec\xbc*8\x19\xc8i\xb3W_\x0b\x14bH\x00\xb2-\x99\x18\x18\xfe\x03\x01\x88\xcf\xc0\x01\xc4\xe1\x0c\xf9\x0cE\x0c\xd9\x0c\xc5\x0c\xa9\x0c%\x0c\x86`\xcd \x0c\x020\x1a\x00\x00\x00\xff\xff\x02\x080\x00\x96L~\x89W\x00\x00\x00\x00\x80(\\B\xefI;\x9e}p\xfe\x1a\xb2\x9b>(\x81\x86/=\xc9xH0:Pwb\xb7\xdck-\xd2F\x04\xd7co'

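Best answer

Yes, there is a way to find out what each stream contains. Besides the extension, which is unreliable, most file formats define a signature, although it can be removed or added by mistake.

So what is a signature?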

In computing, a file signature is data used to identify or verify the contents of a file. In particular, it may refer to:

File magic number: bytes within a file used to identify the format of the file; generally a short sequence of bytes (most are 2-4 bytes long) placed at the beginning of the file; see list of file signatures

File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file contents, generally against transmission errors or malicious attacks. The signature can be included at the end of the file or in a separate file.

I used the magic number. To define the term, I copied this from Wikipedia:

In computer programming, the term magic number has multiple meanings. It could refer to one or more of the following:

Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants

A constant numerical or text value used to identify a file format or protocol; for files, see List of file signatures

Distinctive unique values that are unlikely to be mistaken for other meanings (e.g., Globally Unique Identifiers)

The second meaning is the relevant one here: a specific sequence of bytes, for example

PNG (89 50 4E 47 0D 0A 1A 0A)

or

BMP (42 4D)
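
As a minimal sketch of how such a byte sequence can be checked in Python: the `SIGNATURES` table and the `identify` helper below are hypothetical names introduced for illustration, and the table lists only the two formats mentioned above. It assumes the signature sits at the very start of the data.

```python
# Hypothetical lookup table: format name -> magic number expected at offset 0.
SIGNATURES = {
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "BMP": bytes.fromhex("424D"),
}


def identify(data):
    """Return the name of the first format whose signature starts `data`, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


print(identify(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG
print(identify(b"BM" + b"\x00" * 8))                 # BMP
print(identify(b"plain text"))                       # None
```

A two-byte signature such as BMP's is easy to match by accident, so a hit on a short signature is only a hint, not proof of the format.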

So how do you find out the magic number of a file?

In the article "Investigating File Signatures Using PowerShell", the author wrote a great PowerShell function for getting the magic number. He also mentions a tool; I copied this from his article:

PowerShell V5 brings in Format-Hex, which can provide an alternative approach to reading the file and displaying the hex and ASCII value to determine the magic number.

From the Format-Hex help I copied this description:

The Format-Hex cmdlet displays a file or other input as hexadecimal values. To determine the offset of a character from the output, add the number at the leftmost of the row to the number at the top of the column for that character.

This cmdlet can help you determine the file type of a corrupted file or a file which may not have a file name extension. Run this cmdlet, and then inspect the results for file information.

This tool is also very useful for getting the magic number of a file.
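
For readers working in Python rather than PowerShell, a rough equivalent of that kind of output can be sketched as follows. `hex_dump` is a hypothetical helper written for this illustration, not part of any library, and its layout only approximates what Format-Hex prints.

```python
def hex_dump(data, width=16, limit=64):
    """Return a hex/ASCII dump of the first `limit` bytes of `data`."""
    head = data[:limit]
    hex_width = width * 3 - 1  # room for `width` two-digit values plus spaces
    lines = []
    for offset in range(0, len(head), width):
        chunk = head[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}   {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)


# The first 16 bytes of a typical PNG file: signature, then the IHDR chunk header.
sample = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
print(hex_dump(sample))
```

The first row of the dump starts with `89 50 4E 47 0D 0A 1A 0A`, which is the PNG magic number quoted earlier.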

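Another tool is an online hex editor, although at first I did not understand how to use it.

Now we have the magic number, but how do we know what type of data, file, or stream it belongs to? That is the best question. Fortunately, there are many databases of these magic numbers. Let me list some.

The first database, for example, has a search function: just enter the magic number without spaces and search.

After that you may find it. Yes, "may": quite often you will not find the file type in question directly.

I ran into this problem and solved it by testing the stream against the signature of one specific type. For example, I searched the stream for PNG: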

def GetPngStartingOffset(arr):
    # Targeted magic number for PNG (89 50 4E 47 0D 0A 1A 0A)
    pngSignature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

    # find() only matches the eight bytes when they are contiguous and in
    # order, and it returns -1 when the signature is absent.
    startingOffset = bytes(arr).find(pngSignature)

    # 0 is returned both for "found at offset 0" and for "not found".
    return startingOffset if startingOffset != -1 else 0

Once this function has found the magic number, I split the stream at that offset and open the result as a PNG image:

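    import io
    from PIL import Image  # Pillow

    arr = stream.read()  # raw bytes of one stream from the compound file
    # Drop everything before the PNG signature; bytes can be sliced directly.
    pngBytes = arr[GetPngStartingOffset(arr):]
    image = Image.open(io.BytesIO(pngBytes))
    image.show()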
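
The same idea can be extended to several signatures at once. The following is a minimal sketch, not the answer's method: `KNOWN_SIGNATURES` and `find_signatures` are hypothetical names, and the GZIP entry (1F 8B) is added here for illustration because it is not mentioned in the answer. The sample is the first 18 bytes of the stream quoted in the question.

```python
# Hypothetical table of signatures to look for; extend it as needed.
KNOWN_SIGNATURES = {
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "GZIP": bytes.fromhex("1F8B"),  # assumption: not part of the original answer
}


def find_signatures(data):
    """Return (offset, name) pairs for the first hit of each signature, sorted by offset."""
    hits = []
    for name, magic in KNOWN_SIGNATURES.items():
        offset = data.find(magic)  # -1 means the signature does not occur
        if offset != -1:
            hits.append((offset, name))
    return sorted(hits)


# First bytes of the stream quoted in the question.
sample = b"\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0b"
print(find_signatures(sample))  # [(8, 'GZIP')]
```

A match in the middle of a stream is only a candidate, especially for short signatures, so it is worth confirming by actually decoding the data from that offset, as the PNG example does with `Image.open`.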

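Finally, this is not an end-to-end solution, but a way of figuring out what a stream contains. Thanks for reading, and thanks to @Robert Columbia for his patience.

Regarding "stream - Figuring out bytes content", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53288197/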
