Gzip header forces file download

Question

I am trying to gzip all responses.

In main.go:

mux := mux.NewRouter()
mux.Use(middlewareHeaders)
mux.Use(gzipHandler)

Then I have these middlewares:

func gzipHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		gz := gzip.NewWriter(w)
		defer gz.Close()
		gzr := gzipResponseWriter{Writer: gz, ResponseWriter: w}
		next.ServeHTTP(gzr, r)
	})
}

func middlewareHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "max-age=2592000") // 30 days
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Set("Strict-Transport-Security", "max-age=63072000; includeSubDomains; preload")
		w.Header().Set("Access-Control-Allow-Headers", "Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token")
		w.Header().Set("Access-Control-Allow-Methods", "POST")
		w.Header().Set("Access-Control-Allow-Origin", "origin")
		w.Header().Set("Access-Control-Allow-Credentials", "true")
		w.Header().Set("Access-Control-Expose-Headers", "AMP-Access-Control-Allow-Source-Origin")
		w.Header().Set("AMP-Access-Control-Allow-Source-Origin", os.Getenv("DOMAIN"))
		next.ServeHTTP(w, r)
	})
}

When I curl the site, I get:

curl -v https://example.com
*   Trying 44.234.222.27:443...
* TCP_NODELAY set
* Connected to example.com (XX.XXX.XXX.XX) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=example.com
*  start date: Mar 16 00:00:00 2021 GMT
*  expire date: Apr 16 23:59:59 2022 GMT
*  subjectAltName: host "example.com" matched cert's "example.com"
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55cadcebfe10)
> GET / HTTP/2
> Host: example.com
> user-agent: curl/7.68.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< date: Mon, 07 Jun 2021 20:13:19 GMT
< access-control-allow-credentials: true
< access-control-allow-headers: Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token
< access-control-allow-methods: POST
< access-control-allow-origin: origin
< access-control-expose-headers: AMP-Access-Control-Allow-Source-Origin
< amp-access-control-allow-source-origin: example.com
< cache-control: max-age=2592000
< content-encoding: gzip
< strict-transport-security: max-age=63072000; includeSubDomains; preload
< vary: Accept-Encoding
< 
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
* Failed writing body (0 != 3506)
* stopped the pause stream!
* Connection #0 to host example.com left intact

With both the gzip handler and the gzip header enabled, the browser downloads a file instead of rendering the page.

Can anyone spot my error?

Answer 1

Score: 3

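1. You should only gzip when the client requests it.

Your curl request never sends Accept-Encoding: gzip, but you gzip the response anyway, so curl hands the compressed bytes back to you as-is.

2. Given the browser's behavior, this sounds like double compression. Maybe you have an HTTP reverse proxy in place that already compresses responses to the browser but does not compress backend traffic. In that case you may not need any gzipping at the backend at all; try curl --compressed to confirm this.

3. You should filter Content-Length out of the response. Content-Length must be the final size of the compressed HTTP response, so a value set for the uncompressed body is no longer valid after compression.

4. You should not blindly apply compression to all URIs. Some handlers already gzip their output (e.g. Prometheus /metrics), and some content is pointless to compress (e.g. .png, .zip, .gz). At the very least, strip Accept-Encoding: gzip from the request before passing it down the handler chain, to avoid double gzipping.

5. Transparent gzipping in Go has been implemented before. A quick search turns up this gist (adjusted for point 4 above):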

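package main

import (
    "compress/gzip"
    "io"
    "net/http"
    "strings"
    "sync"
)

// gzPool reuses gzip writers across requests to avoid reallocating them.
var gzPool = sync.Pool{
    New: func() interface{} {
        // io.Discard (Go 1.16+) replaces the deprecated ioutil.Discard.
        w := gzip.NewWriter(io.Discard)
        return w
    },
}

// gzipResponseWriter sends body bytes through the gzip writer and
// everything else (headers, status) to the wrapped ResponseWriter.
type gzipResponseWriter struct {
    io.Writer
    http.ResponseWriter
}

func (w *gzipResponseWriter) WriteHeader(status int) {
    // The uncompressed length is wrong once the body is gzipped.
    w.Header().Del("Content-Length")
    w.ResponseWriter.WriteHeader(status)
}

func (w *gzipResponseWriter) Write(b []byte) (int, error) {
    // A handler may call Write without WriteHeader, so drop the length here too.
    w.Header().Del("Content-Length")
    return w.Writer.Write(b)
}

func Gzip(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Pass the response through untouched unless the client asked for gzip.
        if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            next.ServeHTTP(w, r)
            return
        }

        w.Header().Set("Content-Encoding", "gzip")

        gz := gzPool.Get().(*gzip.Writer)
        defer gzPool.Put(gz)

        gz.Reset(w)
        // Deferred calls run last-in first-out: Close flushes before Put.
        defer gz.Close()

        // Hide the header from inner handlers to avoid double gzipping.
        r.Header.Del("Accept-Encoding")
        next.ServeHTTP(&gzipResponseWriter{ResponseWriter: w, Writer: gz}, r)
    })
}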

Note: the above does not support chunked encoding or trailers, so there is still room for improvement.

huangapple
  • Published on June 8, 2021, 04:25:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/67878281.html