
stream - Figuring out the contents of bytes

Reposted. Author: 行者123. Updated: 2023-12-01 19:54:51

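I am working with a compound file that contains multiple streams, and I am frustrated trying to figure out what each stream contains. I do not know whether these bytes are text, MP3, or video. For example, is there a way to tell what type of data the following bytes might be?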

b'\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0bz\xcc\xc9\xc8\xc0\xc0\x00\xc2?\x82\x1e<\x0ec\xbc*8\x19\xc8i\xb3W_\x0b\x14bH\x00\xb2-\x99\x18\x18\xfe\x03\x01\x88\xcf\xc0\x01\xc4\xe1\x0c\xf9\x0cE\x0c\xd9\x0c\xc5\x0c\xa9\x0c%\x0c\x86`\xcd \x0c\x020\x1a\x00\x00\x00\xff\xff\x02\x080\x00\x96L~\x89W\x00\x00\x00\x00\x80(\\B\xefI;\x9e}p\xfe\x1a\xb2\x9b>(\x81\x86/=\xc9xH0:Pwb\xb7\xdck-\xd2F\x04\xd7co'

Best answer

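Yes, there is a way to find out what each stream contains. Apart from the extension, which is unreliable, most file formats define a signature, although it can be missing or wrong.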

So what is a signature?

In computing, a file signature is data used to identify or verify the contents of a file. In particular, it may refer to:

  • File magic number: bytes within a file used to identify the format of the file; generally a short sequence of bytes (most are 2-4 bytes long) placed at the beginning of the file; see list of file signatures

  • File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file contents, generally against transmission errors or malicious attacks. The signature can be included at the end of the file or in a separate file.

I used the magic number. To define the term, I copied this from Wikipedia:

In computer programming, the term magic number has multiple meanings. It could refer to one or more of the following:

  • Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
  • A constant numerical or text value used to identify a file format or protocol; for files, see List of file signatures
  • Distinctive unique values that are unlikely to be mistaken for other meanings (e.g., Globally Unique Identifiers)

In the second sense, it is a specific sequence of bytes, for example:

PNG (89 50 4E 47 0D 0A 1A 0A) 

BMP (42 4D)
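As a minimal sketch of how such signatures can be checked in code, the snippet below compares the start of a byte string against a small table. The table `SIGNATURES` and the function `guess_type_from_prefix` are hypothetical names made up for this illustration, and the table holds only the two signatures listed above.

```python
# Hypothetical lookup table: only the two signatures listed above, not a complete list.
SIGNATURES = {
    "png": bytes.fromhex("89504E470D0A1A0A"),
    "bmp": bytes.fromhex("424D"),
}

def guess_type_from_prefix(data):
    """Return the name of the first format whose signature data starts with, or None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None

print(guess_type_from_prefix(bytes.fromhex("89504E470D0A1A0A") + b"rest"))  # png
print(guess_type_from_prefix(b"plain text"))  # None
```

Short signatures such as the two-byte BMP one match many unrelated inputs, so a hit like this is a hint rather than proof.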

So how do you find out a file's magic number?

In the article "Investigating File Signatures Using PowerShell", the author wrote a handy PowerShell function for getting the magic number. He also mentions a tool; I copied this from his article:

PowerShell V5 brings in Format-Hex, which can provide an alternative approach to reading the file and displaying the hex and ASCII value to determine the magic number.

From the Format-Hex help, I copied this description:

The Format-Hex cmdlet displays a file or other input as hexadecimal values. To determine the offset of a character from the output, add the number at the leftmost of the row to the number at the top of the column for that character.

This cmdlet can help you determine the file type of a corrupted file or a file which may not have a file name extension. Run this cmdlet, and then inspect the results for file information.

This tool is also very useful for getting a file's magic number. Here is an example: [screenshot]
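When the bytes are already in memory in Python, a similar view can be produced without PowerShell. The following is a rough sketch, not a reimplementation of Format-Hex: `hex_dump` is a hypothetical helper, and its layout (offset, hex bytes, printable ASCII) is an assumption chosen for readability.

```python
def hex_dump(data, width=16):
    """Return one text line per `width` bytes: offset, hex values, printable ASCII."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as '.'
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{hex_width}}  {ascii_part}")
    return lines

# The PNG signature followed by the start of an IHDR chunk
sample = bytes.fromhex("89504E470D0A1A0A") + b"\x00\x00\x00\rIHDR"
print("\n".join(hex_dump(sample)))
# 00000000  89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52  .PNG........IHDR
```

The first few hex values of the first row are the magic number to look up.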

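Another tool is an online hex editor, but at first I did not understand how to use it.

Now we have the magic number, but how do we know what type of data, file, or stream it belongs to? That is the real question. Fortunately, there are many databases of these magic numbers. Let me list some: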

  1. File Signatures
  2. FILE SIGNATURES TABLE
  3. List of file signatures

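For example, the first database has a search function. Just enter the magic number without spaces and search.

After that you may find a match. Yes, "may": quite possibly you will not find the file type in question directly.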
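One reason a lookup can fail is that the signature does not sit at the very start of the stream. A minimal sketch of a workaround is to scan the whole byte string for a few known signatures at any offset. `KNOWN_SIGNATURES` and `find_signatures` are hypothetical names for this illustration, and the gzip entry (1F 8B followed by the deflate method byte 08) is added here as an assumption about what might be worth scanning for.

```python
# Hypothetical table of signatures to scan for; extend it from a signature database.
KNOWN_SIGNATURES = {
    "png": bytes.fromhex("89504E470D0A1A0A"),
    "gzip": bytes.fromhex("1F8B08"),
}

def find_signatures(data):
    """Return (offset, name) pairs for every known signature in data, sorted by offset."""
    hits = []
    for name, magic in KNOWN_SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)

# First 18 bytes of the stream quoted in the question
head = b"\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0b"
print(find_signatures(head))  # [(8, 'gzip')]
```

On the bytes from the question this reports a gzip-style header at offset 8, after eight zero bytes. Very short signatures produce many accidental matches when scanning at every offset, so each hit still needs to be confirmed by actually decoding the data.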

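I ran into this problem and solved it by testing the stream against the signature of one specific type. For example, this is how I searched a stream for a PNG: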

    def GetPngStartingOffset(arr):
        """Return the offset of the first PNG signature in arr, or None if there is none."""
        # Targeted magic number for PNG (89 50 4E 47 0D 0A 1A 0A)
        signature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]

        # Try every position where the full signature still fits
        for i in range(len(arr) - len(signature) + 1):
            # list() lets arr be either a bytes object or a list of ints
            if list(arr[i:i + len(signature)]) == signature:
                return i

        # Not found; None (rather than 0) keeps this distinct from a match at offset 0
        return None

Once this function has found the magic number, I slice the stream from that offset and open the PNG image:

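    import io
    from PIL import Image  # provided by the Pillow package

    arr = stream.read()  # stream: one stream read from the compound file
    offset = GetPngStartingOffset(arr)
    # Everything from the signature onward is treated as PNG data
    image = Image.open(io.BytesIO(arr[offset:]))
    image.show()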
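The same approach should carry over to other formats. The bytes quoted in the question contain 1F 8B 08 at offset 8, which is how a gzip header begins, so a sketch for that case could look like the following. This is an illustration on synthetic data, not a verified decoding of the question's stream; `extract_gzip_member` and `GZIP_MAGIC` are hypothetical names, and only the standard-library `gzip` and `zlib` modules are used.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes followed by the deflate method byte

def extract_gzip_member(data):
    """Decompress the first gzip member found in data, or return None if that fails."""
    offset = data.find(GZIP_MAGIC)
    if offset == -1:
        return None
    # wbits=31 tells zlib to expect a gzip header and trailer
    decompressor = zlib.decompressobj(wbits=31)
    try:
        # Bytes after the end of the gzip member are left in decompressor.unused_data
        return decompressor.decompress(data[offset:])
    except zlib.error:
        return None  # the bytes looked like gzip but did not decode

# Synthetic stream: padding, a gzip member, then unrelated trailing bytes
payload = gzip.compress(b"hello stream")
stream_bytes = b"\x00" * 8 + payload + b"trailing"
print(extract_gzip_member(stream_bytes))  # b'hello stream'
```

If decompression succeeds, the result can be inspected for a signature again, since the compressed payload may itself be another file format.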

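Finally, this is not an end-to-end solution, just one way to work out the contents of a stream. Thank you for reading, and thanks to @Robert Columbia for his patience.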

For "stream - Figuring out the contents of bytes", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53288197/
