Java | Compression | Difference between using Deflater, DeflaterOutputStream and GZIPOutputStream

Question

Use case: I am working on a Java Spring Boot REST API project. I store each request's payload in Cosmos DB (SQL). Since the payloads are large, I want to compress them before persisting them to the DB.

Problem statement: I searched the web for different ways to compress data: Deflater, DeflaterOutputStream and GZIPOutputStream. I tried implementing each and found that they all ultimately rely on the Deflater class for compression (see the constructor code below).

DeflaterOutputStream constructor:

public DeflaterOutputStream(OutputStream out, boolean syncFlush) {
    this(out, new Deflater(), 512, syncFlush);
    usesDefaultDeflater = true;
}

GZIPOutputStream constructor:

public GZIPOutputStream(OutputStream out, int size, boolean syncFlush)
    throws IOException
{
    super(out, new Deflater(Deflater.DEFAULT_COMPRESSION, true),
          size,
          syncFlush);
    usesDefaultDeflater = true;
    writeHeader();
    crc.reset();
}
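
The delegation visible in these constructors can be illustrated with a minimal sketch: driving a Deflater by hand versus letting DeflaterOutputStream run the same feed-and-drain loop internally. The class name DeflaterDemo, its helper methods, and the 512-byte buffer are assumptions made for illustration only; they are not part of the JDK or of the original post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

// Hypothetical demo class, not taken from the original post.
final class DeflaterDemo {

    // Drive Deflater directly: hand it the input, then drain output until finished.
    static byte[] compressWithDeflater(byte[] input) {
        Deflater deflater = new Deflater(); // default level, zlib format
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[512];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Let DeflaterOutputStream manage a default Deflater and the drain loop.
    static byte[] compressWithStream(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompress zlib-format data back to the original bytes.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[512];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("{\"id\":").append(i).append(",\"status\":\"ok\"}");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] direct = compressWithDeflater(data);
        byte[] streamed = compressWithStream(data);
        System.out.println("original:             " + data.length);
        System.out.println("Deflater:             " + direct.length);
        System.out.println("DeflaterOutputStream: " + streamed.length);
        System.out.println("round trip ok:        "
                + (Arrays.equals(inflate(direct), data) && Arrays.equals(inflate(streamed), data)));
    }
}
```

The stream class mainly saves the buffer management; the compression itself is done by the same Deflater in both paths.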

Question:

  1. How do these three differ when it comes to compressing the data?
  2. How do I choose which one to use for my use case?

I could not find guidance on when to use each of the three classes.

EDIT on 12 Jun 12:01pm GMT

I have tried all three with some sample data and measured two dimensions: time taken to compress and size after compression. My observations:

  • DeflaterOutputStream
    Time (ns): ~325037 (avg of 50 iterations); 430140 (100); 356630 (1000)
    Size before compression (byte.length): 15322
    Size after compression (byte.length): 3909
  • GZIPOutputStream
    Time (ns): ~407427 (avg of 50 iterations); 411844 (100); 366735 (1000)
    Size before compression (byte.length): 15322
    Size after compression (byte.length): 3921
  • Deflater
    Time (ns): ~395136; 379532 (100); 337576 (1000)
    Size before compression (byte.length): 15322
    Size after compression (byte.length): 3909

  1. The time taken is almost the same in all three cases.
  2. DeflaterOutputStream and Deflater compress to the same size, and GZIPOutputStream produces 12 extra bytes. I tried different data, and this 12-byte difference was consistent.

Answer 1

Score: 1

With a default Deflater, both Deflater and DeflaterOutputStream produce deflate data with a zlib wrapper. If the Deflater is constructed with nowrap set to true, a raw deflate stream is produced instead. (In both cases, the word "deflator" is sadly misspelled. The Java librarians also misspelled "inflator".) GZIPOutputStream produces deflate data with a gzip wrapper.

They will all compress the same; only the small wrapper differs. The zlib wrapper is six bytes, and the gzip wrapper is 18 bytes, which accounts for the 12-byte difference you measured. The wrappers provide an important check on data integrity.
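
The wrapper sizes can be checked with a short sketch that compresses the same bytes three ways: raw deflate (nowrap = true), zlib (nowrap = false) and gzip. The class name WrapperSizes and its helpers are hypothetical names chosen for this illustration, and the sample payload is made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not taken from the original answer.
final class WrapperSizes {

    // nowrap = true gives a raw deflate stream; nowrap = false adds the zlib wrapper.
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by the stream
        }
        return out.toByteArray();
    }

    // gzip wrapper: header, raw deflate data, then CRC-32 and length trailer.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("{\"id\":").append(i).append(",\"status\":\"ok\"}");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        int raw = deflate(data, true).length;
        int zlib = deflate(data, false).length;
        int gz = gzip(data).length;

        System.out.println("raw deflate: " + raw);
        System.out.println("zlib:        " + zlib + " (raw + " + (zlib - raw) + ")");
        System.out.println("gzip:        " + gz + " (raw + " + (gz - raw) + ")");
    }
}
```

The absolute sizes depend on the input, but the differences printed in parentheses should match the wrapper sizes given above.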

As for what to choose, that's up to you.
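
For the use case in the question, one possible choice is gzip, since its CRC-32 trailer lets the reader detect corrupted data on decompression. The following is only a sketch under assumptions not stated in the question: the payload is text encoded as UTF-8, and the compressed bytes are Base64-encoded so they can sit in a string field of a JSON document. PayloadCodec and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, assuming UTF-8 text payloads stored as Base64 strings.
final class PayloadCodec {

    // Compress the payload with gzip and encode it as Base64 text.
    static String compressToBase64(String payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverse the steps: Base64-decode, gunzip, decode UTF-8.
    static String decompressFromBase64(String stored) {
        byte[] compressed = Base64.getDecoder().decode(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            // Also raised when the gzip CRC-32 check fails on corrupted data.
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("{\"id\":").append(i).append(",\"status\":\"ok\"}");
        }
        String payload = sb.toString();
        String stored = compressToBase64(payload);
        System.out.println("original chars: " + payload.length());
        System.out.println("stored chars:   " + stored.length());
        System.out.println("round trip ok:  " + payload.equals(decompressFromBase64(stored)));
    }
}
```

Base64 adds roughly one third to the compressed size, so if the store accepts binary values directly, skipping that step keeps more of the saving.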

huangapple
  • Published on June 12, 2023, 03:06:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76452092.html