How to determine unknown compression of data

Question

How do you folks figure out how some data is compressed?

I'm trying to take apart a binary file. I see the structure in it, and have found where some data segments are.

The UNIX 'file' command just says they are data. The "Signsrch signature file" by Luigi Auriemma didn't match any of the blocks.
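A quick first pass, roughly what `file` does through its magic database, is to compare the first bytes against well-known compression signatures. Below is a minimal Python sketch. The signature list is an illustrative subset, and `sniff` is a hypothetical helper name. Raw deflate has no signature at all, so a miss here does not rule it out.

```python
from typing import Optional

# Illustrative subset of well-known magic numbers (not exhaustive).
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\x04\x22\x4d\x18", "lz4 frame"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"PK\x03\x04", "zip"),
]


def sniff(data: bytes) -> Optional[str]:
    """Return a guessed format name, or None if nothing matches."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # zlib has no fixed magic, but its 2-byte header is constrained:
    # method 8 (deflate) in the low nibble of CMF, window size field <= 7,
    # and (CMF * 256 + FLG) must be a multiple of 31.
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        if cmf & 0x0F == 8 and cmf >> 4 <= 7 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    return None
```

Run on the first data bytes of the blocks (EC 7D ..., C4 BC ...), it returns None, which agrees with what `file` reported.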

The file extension is ".dz". The file starts with "Dr*Z" and the data headers start with "zFED". Google searches didn't turn up any information on those. The data blocks have no other structure that I can see: no patterns, readable strings, etc.

(There is a DZ file format, but it is proprietary, from 2000-2005, for compressing Quake game files. I haven't yet been able to run "dzip.exe" on this MacBook.)

Here is the format of the data headers:

    char[4] "zFED"        or "DEFz" flipped
    int32   full_size     size of uncompressed data, little-endian
    int32   cmpr_size     number of bytes of data here, L.E.
    byte[]  data           ...

There might be more fields in the header than this.
This is how the data in each of the four blocks starts (hex), right after cmpr_size:

EC 7D 79 5C 54 47 ...
EC BD 7B 7C 94 C5 ...
C4 BC 07 78 23 57 ...
EC BD 7B 5F 13 D9 ...

So there could be some flags or format fields still there.
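One way to test the "flags or format fields" idea: if the data were raw deflate (RFC 1951, with no zlib or gzip wrapper), the low three bits of the first data byte would be a deflate block header, packed least-significant bit first. A small Python check follows; `deflate_block_header` is a hypothetical helper name.

```python
from typing import Tuple

# BTYPE values defined by RFC 1951.
BTYPES = {0: "stored", 1: "fixed Huffman", 2: "dynamic Huffman", 3: "reserved (invalid)"}


def deflate_block_header(first_byte: int) -> Tuple[int, str]:
    """Read the 3-bit deflate block header from the low bits of a byte.

    Bit 0 is BFINAL (1 = last block); bits 1-2 are BTYPE.
    """
    bfinal = first_byte & 1
    btype = (first_byte >> 1) & 3
    return bfinal, BTYPES[btype]


# First data byte of blocks 1-4, taken from the list above.
for b in (0xEC, 0xEC, 0xC4, 0xEC):
    print(hex(b), deflate_block_header(b))
```

All four bytes decode as BFINAL=0, BTYPE=2 (dynamic Huffman). That is consistent with raw deflate but is not proof by itself. A BTYPE of 3 would have ruled it out.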

Here is the start of each data block:

BLOCK 1
000000d0        7a 46  45 44 80 dc  01 00 14 0c  01 00 ec 7d  ..zFED.\......l}
                tag--------- full_size--- cmpr_size--- [data ...]
000000e0  79 5c 54 47  b6 70 dd db  b7 ef 6d 9a  a5 1b 50 41  y\TG6p][7om.%.PA
000000f0  41 68 f7 85  08 a8 d1 68  dc 5a c3 24  0d 0a c4 24  Ahw..(Qh\ZC$..D$
00000100  6f 92 7c f3  32 71 66 92  4c 76 27 33  ef 7d 73 65  o.|s2qf.Lv'3o}se


BLOCK 2
00010ce0                     7a 46  45 44 00 1e  03 00 be 5f  u_.)..zFED....>_
                             tag--------- full_size--- cmpr_size-
00010cf0  01 00 ec bd  7b 7c 94 c5  d5 38 fe 3c  7b 7b 72 db  ..l={|.EU8~<{{r[
         --size [data ...
00010d00  6c 76 37 77  2e 49 08 57  23 09 57 41  f0 12 08 e0  lv7w.I.W#.WAp..`
00010d10  26 84 8b 97  da 56 5a b5  b5 6a d5 b6  78 ab ae 37  &...ZVZ55jU6x+.7
00010d20  b2 88 12 ad  d6 2e 77 d4  56 df d6 b6  62 6d 5f 37  2..-V.wTV_V6bm_7


BLOCK 3
00026ca0               7a 46 45 44  46 81 01 00  e4 ec 00 00  h,..zFEDF...dl..
                       tag--------  full_size--  cmpr_size--
00026cb0  c4 bc 07 78  23 57 76 26  0a 80 24 48  80 48 44 22  D<.x#Wv&..$H.HD"
          [data ...
00026cc0  08 80 48 24  41 30 81 39  b3 99 73 0e  60 68 66 b2  ..H$A0.93.s.`hf2
00026cd0  99 9b a9 d9  cc cd 50 e4  a8 31 54 ad  68 ef d8 e3  ..)YLMPd(1T-hoXc


BLOCK 4
00035980                            7a 46 45 44  60 f8 13 00  ..<\s6k?zFED`x..
                                    tag--------  full_size--
00035990  7a 9c 00 00  ec bd 7b 5f  13 d9 b6 b6  5d 15 82 78  z...l={_.Y66]..x
          cmpr_size--  [data ...
000359a0  06 14 05 3c  c6 43 e3 a1  5b 14 c4 33  4a c9 49 51  ...<FCc![.D3JIIQ
000359b0  54 14 5b d1  76 b5 1d 21  2d 59 62 e2  0a a1 5b fb  T.[Qv5.!-Ybb.![{
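The header values in these dumps chain together. Block 1's tag is at 0xd2, and 0xd2 + 0x10c14 (its cmpr_size) = 0x10ce6, which is exactly where block 2's tag sits. Likewise, 0x10ce6 + 0x15fbe = 0x26ca4 (block 3) and 0x26ca4 + 0xece4 = 0x35988 (block 4). That suggests cmpr_size counts the 12-byte header as well as the data. Here is a minimal Python sketch that walks the blocks under that assumption; `walk_zfed` is a hypothetical name.

```python
import struct
from typing import List, Tuple

# 4-byte tag followed by two little-endian 32-bit sizes.
HEADER = struct.Struct("<4sII")


def walk_zfed(buf: bytes, first: int) -> List[Tuple[int, int, int]]:
    """Return (tag_offset, full_size, cmpr_size) for consecutive zFED blocks.

    Assumption: cmpr_size covers the 12-byte header plus the data, so the
    next header starts at tag_offset + cmpr_size. Stops at the first
    position that does not hold a plausible header.
    """
    blocks = []
    pos = first
    while pos + HEADER.size <= len(buf):
        tag, full_size, cmpr_size = HEADER.unpack_from(buf, pos)
        if tag != b"zFED" or cmpr_size < HEADER.size:
            break
        blocks.append((pos, full_size, cmpr_size))
        pos += cmpr_size
    return blocks
```

If the assumption holds, calling it with `first=0xd2` on the file's bytes should list all four blocks.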

The histograms of the data are fairly flat. One fluctuates much more, however, and it's the one I'm most interested in.

Examining the histograms

    Block     Usual span (% per byte value)   Notes
    Block 1   0.30 to 0.45                    peaks to 0.5%
    Block 2   0.35 to 0.45                    peaks to 0.3% and 0.5%
    Block 3   0.34 to 0.44                    peaks to 0.32% and 0.49%
    Block 4   0.05 to 2                       peaks just before byte values 64, 128, 160, 208;
                                              dips at 32, 48, 64, 96, 128, 160
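Figures like those in the table above can be computed with a short Python sketch. For reference, a perfectly flat histogram puts 100/256 ≈ 0.39% of the bytes on each value, which falls right in the spans for blocks 1 to 3. An order-0 entropy close to 8 bits per byte is typical of compressed or encrypted data. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def byte_histogram_percent(data: bytes) -> List[float]:
    """Percentage of `data` taken by each byte value 0..255."""
    if not data:
        raise ValueError("empty input")
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return [100.0 * counts[v] / len(data) for v in range(256)]


def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy; 8.0 means every byte value is equally likely."""
    if not data:
        raise ValueError("empty input")
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```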

The file I'm looking at is "Kurzweil-SP-Updater.dz" inside this file:
https://kurzweil.com/wp-content/uploads/2022/08/SP7G_UpdateE1.06L1.1.2.zip

The question is: What should I try next?
Thank you!

Answer 1

Score: 1

The majority of the file (99.9%) consists of four complete deflate streams:

<pre>
offset 222, length 68616, decompressed 121984
offset 68850, length 90034, decompressed 204288
offset 158896, length 60632, decompressed 98630
offset 219540, length 40045, decompressed 1308768
</pre>
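One way to produce a table like this without knowing the container format is to try raw inflating at every byte offset and keep the attempts that end cleanly. The following is a brute-force Python sketch of that idea. `find_deflate_streams` and the `min_out` cutoff are hypothetical choices, and this is not necessarily how the figures above were obtained. A negative `wbits` tells Python's zlib to expect raw deflate with no zlib or gzip header.

```python
import zlib
from typing import List, Tuple


def find_deflate_streams(buf: bytes, min_out: int = 1024) -> List[Tuple[int, int, int]]:
    """Return (offset, compressed_length, decompressed_length) for each
    complete raw deflate stream found by trying every byte offset.

    Most offsets fail quickly (invalid block type, bad code lengths, or a
    back-reference before the start of output). `min_out` drops tiny
    accidental matches.
    """
    view = memoryview(buf)  # zero-copy slicing for each attempt
    found = []
    pos = 0
    while pos < len(buf):
        d = zlib.decompressobj(wbits=-15)  # raw deflate: no zlib/gzip header
        try:
            out = d.decompress(view[pos:])
        except zlib.error:
            pos += 1
            continue
        if d.eof and len(out) >= min_out:
            # Bytes after the end of the stream are left in unused_data.
            length = len(buf) - pos - len(d.unused_data)
            found.append((pos, length, len(out)))
            pos += length  # resume scanning after this stream
        else:
            pos += 1
    return found
```

These figures cross-check against the question's headers. Each decompressed size equals that block's full_size (for example, 0x0001dc80 = 121984 for block 1). Each offset is 12 bytes past a zFED tag (0xd2 + 12 = 222). So the data after each 12-byte header appears to be raw deflate.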

huangapple
  • Posted on March 21, 2023 at 02:53:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75794215.html