How to determine unknown compression of data

Question

How do you folks figure out how some data is compressed?

I'm trying to take apart a binary file. I see the structure in it, and have found where some data segments are.

The UNIX 'file' command just says they are data. The "Signsrch signature file" by Luigi Auriemma didn't match any of the blocks.

The file extension is ".dz". The file starts with "Dr*Z" and the data headers start with "zFED". Google searches didn't turn up any information on those. The data blocks have no other structure that I can see: no patterns, no readable strings, etc.

(There is a DZ file format, but it is proprietary, from 2000-2005, for compressing Quake game files. I haven't yet been able to run "dzip.exe" on this Macbook.)

Here is the format of the data headers:

  char[4]  "zFED" or "DEFz" flipped
  int32    full_size   size of uncompressed data, little-endian
  int32    cmpr_size   number of bytes of data here, little-endian
  byte[]   data ...

There might be more fields in the header than this.
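
The layout above maps onto a 12-byte little-endian struct. Here is a minimal Python sketch, assuming the header is exactly these three fields and that the sizes are unsigned; the names `BlockHeader` and `parse_header` are made up for illustration. It decodes the header bytes of the first block as they appear in the hex dump in this post:

```python
import struct
from typing import NamedTuple


class BlockHeader(NamedTuple):
    tag: bytes
    full_size: int   # size of the uncompressed data
    cmpr_size: int   # size field as stored in the file


# char[4] tag, then two 32-bit little-endian integers (assumed unsigned).
HEADER = struct.Struct("<4sII")


def parse_header(buf: bytes, offset: int = 0) -> BlockHeader:
    """Decode one 12-byte header at the given offset (hypothetical helper)."""
    tag, full_size, cmpr_size = HEADER.unpack_from(buf, offset)
    if tag != b"zFED":
        raise ValueError(f"unexpected tag {tag!r} at offset {offset}")
    return BlockHeader(tag, full_size, cmpr_size)


# Header bytes of block 1, copied from the dump.
block1 = bytes.fromhex("7a464544 80dc0100 140c0100")
hdr = parse_header(block1)
print(hdr.full_size, hdr.cmpr_size)
```

If the header turns out to have more fields, only the format string and the tuple need to change.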
This is how each of the four data blocks starts (hex):

  1. EC 7D 79 5C 54 47 ...
  2. EC BD 7B 7C 94 C5 ...
  3. C4 BC 07 78 23 57 ...
  4. EC BD 7B 5F 13 D9 ...

So there could be some flags or format fields still there.
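
One cheap thing to try on leading bytes like these is to read them as the start of a raw deflate stream (RFC 1951). Raw deflate has no magic number, but its first three bits are BFINAL and BTYPE, and for a dynamic-Huffman block the next five bits are HLIT. The sketch below is only a plausibility check, not proof of the format; `deflate_first_byte` is an illustrative helper name.

```python
def deflate_first_byte(b: int) -> dict:
    """Interpret one byte as the first byte of a raw deflate block."""
    bfinal = b & 1            # bit 0: set on the last block of the stream
    btype = (b >> 1) & 3      # bits 1-2: 0 stored, 1 fixed, 2 dynamic, 3 invalid
    info = {"bfinal": bfinal, "btype": btype}
    if btype == 2:
        # Bits 3-7: number of literal/length codes minus 257 (valid total: 257..286).
        info["hlit"] = (b >> 3) + 257
    return info


# First data byte of each of the four blocks, from the list above.
for first in (0xEC, 0xEC, 0xC4, 0xEC):
    print(hex(first), deflate_first_byte(first))
```

All four bytes decode as a non-final dynamic-Huffman block with a literal/length code count inside the valid 257 to 286 range. That is consistent with raw deflate, though a single byte cannot confirm it.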

Here is the start of each data block:

BLOCK 1
000000d0        7a 46 45 44 80 dc 01 00 14 0c 01 00 ec 7d  ..zFED.\......l}
                tag-------- full_size-- cmpr_size-- [data ...]
000000e0  79 5c 54 47 b6 70 dd db b7 ef 6d 9a a5 1b 50 41  y\TG6p][7om.%.PA
000000f0  41 68 f7 85 08 a8 d1 68 dc 5a c3 24 0d 0a c4 24  Ahw..(Qh\ZC$..D$
00000100  6f 92 7c f3 32 71 66 92 4c 76 27 33 ef 7d 73 65  o.|s2qf.Lv'3o}se

BLOCK 2
00010ce0                    7a 46 45 44 00 1e 03 00 be 5f  u_.)..zFED....>_
                            tag-------- full_size-- cmpr_
00010cf0  01 00 ec bd 7b 7c 94 c5 d5 38 fe 3c 7b 7b 72 db  ..l={|.EU8~<{{r[
          size- [data ...
00010d00  6c 76 37 77 2e 49 08 57 23 09 57 41 f0 12 08 e0  lv7w.I.W#.WAp..`
00010d10  26 84 8b 97 da 56 5a b5 b5 6a d5 b6 78 ab ae 37  &...ZVZ55jU6x+.7
00010d20  b2 88 12 ad d6 2e 77 d4 56 df d6 b6 62 6d 5f 37  2..-V.wTV_V6bm_7

BLOCK 3
00026ca0              7a 46 45 44 46 81 01 00 e4 ec 00 00  h,..zFEDF...dl..
                      tag-------- full_size-- cmpr_size--
00026cb0  c4 bc 07 78 23 57 76 26 0a 80 24 48 80 48 44 22  D<.x#Wv&..$H.HD"
          [data ...
00026cc0  08 80 48 24 41 30 81 39 b3 99 73 0e 60 68 66 b2  ..H$A0.93.s.`hf2
00026cd0  99 9b a9 d9 cc cd 50 e4 a8 31 54 ad 68 ef d8 e3  ..)YLMPd(1T-hoXc

BLOCK 4
00035980                          7a 46 45 44 60 f8 13 00  ..<\s6k?zFED`x..
                                  tag-------- full_size--
00035990  7a 9c 00 00 ec bd 7b 5f 13 d9 b6 b6 5d 15 82 78  z...l={_.Y66]..x
          cmpr_size-- [data ...
000359a0  06 14 05 3c c6 43 e3 a1 5b 14 c4 33 4a c9 49 51  ...<FCc![.D3JIIQ
000359b0  54 14 5b d1 76 b5 1d 21 2d 59 62 e2 0a a1 5b fb  T.[Qv5.!-Ybb.![{
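
The offsets and sizes in this dump agree with each other. The first tag sits at 0xd2 (210), and 210 plus block 1's cmpr_size of 68628 gives 68838 = 0x10ce6, which is exactly where the second tag appears. That suggests, without proving it, that cmpr_size counts the 12 header bytes as well as the data. Here is a sketch that walks the chain under that assumption; `walk_blocks` and the demo bytes are made up for illustration:

```python
import struct

TAG = b"zFED"


def walk_blocks(buf: bytes, first_tag: int):
    """Yield (tag_offset, full_size, cmpr_size), hopping from tag to tag."""
    pos = first_tag
    while pos + 12 <= len(buf) and buf[pos:pos + 4] == TAG:
        full_size, cmpr_size = struct.unpack_from("<II", buf, pos + 4)
        yield pos, full_size, cmpr_size
        if cmpr_size < 12:
            break  # a size this small could not cover the header; stop
        pos += cmpr_size  # assumption: the size includes the 12-byte header


# Synthetic demo: two fake blocks carrying 4 and 6 payload bytes.
demo = (b"\x00\x00"
        + TAG + struct.pack("<II", 100, 16) + b"AAAA"
        + TAG + struct.pack("<II", 200, 18) + b"BBBBBB")
print(list(walk_blocks(demo, demo.find(TAG))))
```

On the real file, a walk that lands on a "zFED" tag every time would support the assumption, and a walk that stops early would point to extra header fields or padding.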

The histograms of the data are fairly flat. One fluctuates much more, however, and it's the one I'm most interested in.

Examining the histograms

  Block     Usual span (%)   Notes
  Block 1   0.3 to 0.45      peaks to 0.5%
  Block 2   0.35 to 0.45     peaks to 0.3% and 0.5%
  Block 3   0.34 to 0.44     peaks to 0.32% and 0.49%
  Block 4   0.05 to 2        peaks just before 64, 128, 160, 208;
                             dips at 32, 48, 64, 96, 128, 160
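
For reference, figures like these can be reproduced with a plain byte-frequency count, and Shannon entropy sums up the flatness in one number. A perfectly flat histogram puts every byte value at 100/256, about 0.39%, which is where blocks 1 to 3 sit. This is a sketch with illustrative helper names, run on random bytes rather than the real blocks:

```python
import math
import os
from collections import Counter


def byte_histogram_percent(data: bytes) -> list[float]:
    """Share of each byte value 0..255, in percent of the total length."""
    if not data:
        raise ValueError("empty input")
    counts = Counter(data)
    return [100.0 * counts.get(v, 0) / len(data) for v in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; 8.0 means a perfectly flat histogram."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


sample = os.urandom(1 << 16)  # stand-in for one data block
hist = byte_histogram_percent(sample)
print(f"min {min(hist):.2f}%  max {max(hist):.2f}%  "
      f"entropy {shannon_entropy(sample):.3f} bits/byte")
```

Well-compressed or encrypted data usually lands close to 8 bits per byte, so a block that sits clearly lower, as block 4's spread suggests, is worth a closer look.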

The file I'm looking at is "Kurzweil-SP-Updater.dz" inside this file:
https://kurzweil.com/wp-content/uploads/2022/08/SP7G_UpdateE1.06L1.1.2.zip

The question is: What should I try next?
Thank you!

Answer 1

Score: 1

The majority of the file (99.9%) consists of four complete deflate streams:

<pre>
offset 222, length 68616, decompressed 121984
offset 68850, length 90034, decompressed 204288
offset 158896, length 60632, decompressed 98630
offset 219540, length 40045, decompressed 1308768
</pre>
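
One way to test for streams like these is to hand the bytes at a candidate offset to zlib in raw mode (negative wbits, so no zlib or gzip wrapper is expected) and check whether it reaches a clean end of stream. The sketch below uses Python's zlib module. `try_raw_deflate` is an illustrative name, this is not necessarily how the figures above were produced, and the demo runs on synthetic data, not the real file.

```python
import zlib


def try_raw_deflate(buf: bytes, offset: int):
    """Return (compressed_len, decompressed_len) if a complete raw deflate
    stream starts at offset, otherwise None."""
    d = zlib.decompressobj(-15)  # -15: raw deflate, 32 KiB window
    try:
        out = d.decompress(buf[offset:])
    except zlib.error:
        return None              # not valid deflate data at this offset
    if not d.eof:
        return None              # input ran out before the final block ended
    consumed = len(buf) - offset - len(d.unused_data)
    return consumed, len(out)


# Synthetic demo: a raw deflate stream behind a 12-byte fake header.
payload = b"hello deflate " * 100
c = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = c.compress(payload) + c.flush()
blob = b"\x00" * 12 + raw + b"trailing bytes"
print(try_raw_deflate(blob, 12))
```

For the file in the question, the candidate offsets would be each "zFED" tag offset plus 12, for example 210 + 12 = 222 for the first block. The returned decompressed length can then be compared with the header's full_size.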

Posted on March 21, 2023 at 02:53:07.