When is it valid to call inflateSetDictionary() when trying to decompress raw deflate data that was compressed with a dictionary?

The Problem

When is it valid to call inflateSetDictionary() when trying to decompress raw deflate data that was compressed with a compression dictionary?

The zlib manual states that, for raw inflate, inflateSetDictionary() can be called "at any time". However, it is unclear to me what "at any time" actually means. If inflateSetDictionary() really can be called at any time, then I would expect it to be valid to call it after calling inflate(). However, doing so results in inflate() failing with an "invalid distance too far back" error.

My Code

I created a simple application to compress the string "hello" using raw deflate, with a compression dictionary that also consists of the byte sequence "hello":

#define BUF_SIZE 16384
#define WINDOW_BITS -15 // Negative for raw.
#define MEM_LEVEL 8

const unsigned char dictionary[] = "hello";

unsigned char uncompressed[BUF_SIZE] = "hello";
unsigned char compressed[BUF_SIZE];

z_stream deflate_stream;

deflate_stream.zalloc = Z_NULL;
deflate_stream.zfree = Z_NULL;
deflate_stream.opaque = Z_NULL;

deflateInit2(&deflate_stream,
             Z_DEFAULT_COMPRESSION,
             Z_DEFLATED,
             WINDOW_BITS,
             MEM_LEVEL,
             Z_DEFAULT_STRATEGY);

deflateSetDictionary(&deflate_stream, dictionary, sizeof(dictionary));

deflate_stream.avail_in = (uInt)strlen((const char *)uncompressed) + 1; // +1 includes the terminating NUL
deflate_stream.next_in = (Bytef *)uncompressed;

deflate_stream.avail_out = BUF_SIZE;
deflate_stream.next_out = (Bytef *)compressed;

deflate(&deflate_stream, Z_FINISH);

deflateEnd(&deflate_stream);

This produces 4 bytes of raw deflate data in the compressed buffer:

uLong compressed_size = deflate_stream.total_out;
printf("Compressed size is: %lu\n", compressed_size); // prints Compressed size is: 4

I then attempted to decompress this data back into the string "hello". The zlib manual states that raw inflate is needed to decompress raw deflate data:

unsigned char decompressed[BUF_SIZE];

z_stream inflate_stream;

inflate_stream.zalloc = Z_NULL;
inflate_stream.zfree = Z_NULL;
inflate_stream.opaque = Z_NULL;

inflateInit2(&inflate_stream, WINDOW_BITS);

inflate_stream.avail_in = (uInt)compressed_size;
inflate_stream.next_in = (Bytef *)compressed;

inflate_stream.avail_out = BUF_SIZE;
inflate_stream.next_out = (Bytef *)decompressed;

int r = inflate(&inflate_stream, Z_FINISH);

According to the zlib manual, I would expect inflate() to return Z_NEED_DICT, after which I would call inflateSetDictionary() followed by another call to inflate():

// Must be called immediately after a call of inflate, if that call returned Z_NEED_DICT.
if (r == Z_NEED_DICT) {
    inflateSetDictionary(&inflate_stream, dictionary, sizeof(dictionary));
    r = inflate(&inflate_stream, Z_FINISH);
}

if (r != Z_STREAM_END) {
    printf("inflate: %s\n", inflate_stream.msg);
    return 1;
}

inflateEnd(&inflate_stream);

printf("Decompressed size is: %zu\n", strlen((const char *)decompressed));
printf("Decompressed string is: %s\n", (const char *)decompressed);

However, inflate() does not return Z_NEED_DICT. Instead, it returns Z_DATA_ERROR, with inflate_stream.msg set to "invalid distance too far back".

Even if I adjust my code so that inflateSetDictionary() is called regardless of the return value of inflate(), the subsequent inflate() call still fails with Z_DATA_ERROR due to "invalid distance too far back".

My Question

So far, my code works correctly if I use the default zlib encoding by setting WINDOW_BITS to 15, as opposed to -15 for the raw encoding.

My code also works correctly if I move the call to inflateSetDictionary() before the call to inflate().

However, it's not clear to me why my existing code does not allow inflate() to return Z_NEED_DICT, so that I can make a subsequent call to inflateSetDictionary().

Is there a mistake in my code somewhere that is preventing inflate() from returning Z_NEED_DICT? Or can inflateSetDictionary() only be called prior to inflate() for the raw encoding, contrary to what the zlib manual states?

Answer 1

Score: 3

inflate() will only return Z_NEED_DICT for a zlib stream, where the need for a dictionary is indicated by a bit in the zlib header, followed by the Adler-32 of the dictionary that was used for compression to verify or select the dictionary. There is no such indication in a raw deflate stream. There is no way for inflate() to know from a raw deflate stream whether or not the data was compressed with a dictionary. It is up to you to know what is needed for decompression, since you made the raw deflate stream in the first place.

Since you did a deflateSetDictionary() before compressing anything, it is up to you to do an inflateSetDictionary() at the same place: after the inflateInit2() and before you decompress anything. As you have found, you need to insert:

    inflateSetDictionary(&inflate_stream, dictionary, sizeof(dictionary));

right after the inflateInit2(). Then decompression will be successful.

Yes, you can do inflateSetDictionary() at any point during a raw deflate decompression. However, it will only work if you do it at the same point at which you did the corresponding deflateSetDictionary() when compressing.

huangapple
  • Posted on 2023-02-09 03:42:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75390973.html