goland check file type for xlsx

Question

Using the Go standard library facility to check the file type of an xlsx file gives something like this:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	f, err := os.Open("file_c.xlsx")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// DetectContentType considers at most the first 512 bytes.
	buf := make([]byte, 512)
	n, err := f.Read(buf)
	if err != nil {
		log.Fatal(err)
	}
	contentType := http.DetectContentType(buf[:n])
	fmt.Println(contentType)
}

and that prints:

application/zip

Using this package instead, https://github.com/h2non/filetype:

package main

import (
	"fmt"
	"io/ioutil"

	"github.com/h2non/filetype"
)

func main() {
	// Read the entire file so filetype can scan as much of it as it needs.
	buf, _ := ioutil.ReadFile("file_c.xlsx")
	kind, _ := filetype.Match(buf)
	if kind == filetype.Unknown {
		fmt.Println("unknown")
		return
	}
	fmt.Printf("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
}

prints:

file type xlsx. MIME application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

However, when I have code like this:

// where file is of type *multipart.FileHeader
mpf, err := file.Open()
if err != nil {
	wlog.Errorf("could not open %s file", file.Filename)
	return
}
defer mpf.Close()

buf := make([]byte, 512)
_, err = mpf.Read(buf)
if err != nil {
	wlog.Error("failed to read file")
	return
}
kind, _ := filetype.Match(buf)
if kind == filetype.Unknown {
	wlog.Info("unknown file type")
} else {
	wlog.Infof("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
}

prints:

file type zip. MIME application/zip

So the information that the file is an xlsx gets lost somewhere along the way, even though I use the same external package (https://github.com/h2non/filetype).
Do you have any idea why, or what I am doing wrong?

Answer 1

Score: 4

This

buf, _ := ioutil.ReadFile("file_c.xlsx")
kind, _ := filetype.Match(buf)

works because it gets to scan the entire file. This

buf := make([]byte, 512)
// ...
kind, _ := filetype.Match(buf)

does not because it only gets to see the first 512 bytes, which is not enough to identify the file definitively as XLSX. An XLSX file is just a zip file with a certain pattern of contents, so it defaults to the more generic ZIP type (which is technically also correct).

You can view the implementation to see just how much data it scans through to detect the file type: up to several kilobytes.

huangapple
  • Posted on 2022-06-18 04:14:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/72664374.html