golang check file type for xlsx
Question
Using the Go standard library to check the file type of an xlsx file looks something like this:
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	f, err := os.Open("file_c.xlsx")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// DetectContentType considers at most the first 512 bytes.
	buf := make([]byte, 512)
	n, err := f.Read(buf)
	if err != nil {
		log.Fatal(err)
	}
	// Pass only the bytes that were actually read.
	contentType := http.DetectContentType(buf[:n])
	fmt.Println(contentType)
}
and that prints:
application/zip
With this package, https://github.com/h2non/filetype, the following code
package main

import (
	"fmt"
	"io/ioutil"
	"log"

	"github.com/h2non/filetype"
)

func main() {
	// Read the whole file into memory.
	buf, err := ioutil.ReadFile("file_c.xlsx")
	if err != nil {
		log.Fatal(err)
	}
	kind, _ := filetype.Match(buf)
	if kind == filetype.Unknown {
		fmt.Println("unknown")
		return
	}
	fmt.Printf("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
}
prints:
file type xlsx. MIME application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
However, when I have code like this:
// file is of type *multipart.FileHeader
mpf, err := file.Open()
if err != nil {
	wlog.Errorf("could not open %s file", file.Filename)
} else {
	defer mpf.Close()
	// Only the first 512 bytes are handed to the matcher.
	buf := make([]byte, 512)
	_, err = mpf.Read(buf)
	if err != nil {
		wlog.Error("failed to read file")
	} else {
		kind, _ := filetype.Match(buf)
		if kind == filetype.Unknown {
			wlog.Info("unknown file type")
		} else {
			wlog.Infof("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
		}
	}
}
prints:
file type zip. MIME application/zip
So the information that this is an xlsx file is lost somewhere along the way, even though I am using the same external package (https://github.com/h2non/filetype).
Do you have any idea why, or what I am doing wrong?
Answer 1
Score: 4
This
buf, _ := ioutil.ReadFile("file_c.xlsx")
kind, _ := filetype.Match(buf)
works because it gets to scan the entire file. This
buf := make([]byte, 512)
// ...
kind, _ := filetype.Match(buf)
does not because it only gets to see the first 512 bytes, which is not enough to identify the file definitively as XLSX. An XLSX file is just a zip file with a certain pattern of contents, so it defaults to the more generic ZIP type (which is technically also correct).
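To illustrate the point that an XLSX file is a ZIP archive with a particular set of entries, here is a minimal standard-library sketch. It is not how h2non/filetype works internally. The function names `looksLikeXLSX`, `isXLSXBytes`, and `sampleArchive` are hypothetical, and treating the presence of `xl/workbook.xml` as the XLSX marker is a simplifying assumption (it is the conventional location of the workbook part).

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

// looksLikeXLSX reports whether the ZIP archive readable through r contains
// "xl/workbook.xml", the entry this sketch assumes marks an XLSX file.
func looksLikeXLSX(r io.ReaderAt, size int64) bool {
	zr, err := zip.NewReader(r, size)
	if err != nil {
		return false // not a readable ZIP archive at all
	}
	for _, f := range zr.File {
		if f.Name == "xl/workbook.xml" {
			return true
		}
	}
	return false
}

// isXLSXBytes is a convenience wrapper for data already held in memory.
func isXLSXBytes(data []byte) bool {
	return looksLikeXLSX(bytes.NewReader(data), int64(len(data)))
}

// sampleArchive builds an in-memory ZIP holding empty entries with the given names.
func sampleArchive(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		if _, err := zw.Create(name); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := sampleArchive("[Content_Types].xml", "xl/workbook.xml")
	if err != nil {
		log.Fatal(err)
	}
	// Signature sniffing sees only the ZIP container.
	fmt.Println(http.DetectContentType(data))
	// Inspecting the archive's entries is what distinguishes XLSX from a plain ZIP.
	fmt.Println(isXLSXBytes(data))
}
```

In the upload case, `multipart.File` implements `io.ReaderAt` and `multipart.FileHeader` has a `Size` field, so `looksLikeXLSX(mpf, file.Size)` should be usable there, at the cost of reading the archive's central directory.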
You can read the library's implementation to see how much data it scans to detect the file type: up to several kilobytes.
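If reading the whole upload is undesirable, one option is to read a larger, bounded prefix and pass only the bytes actually read to `filetype.Match`. The sketch below shows just the reading part using the standard library. `readPrefix` and `prefixOfString` are hypothetical helpers, and the 8192-byte limit is an assumed value, not one taken from the library; whether it is enough for a given file depends on the library's matcher.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"strings"
)

// readPrefix reads up to max bytes from r and returns only the bytes that
// were actually read. A source shorter than max is not treated as an error.
func readPrefix(r io.Reader, max int) ([]byte, error) {
	buf := make([]byte, max)
	n, err := io.ReadFull(r, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return nil, err
	}
	return buf[:n], nil
}

// prefixOfString applies readPrefix to a string; it exists only for the demo.
func prefixOfString(s string, max int) ([]byte, error) {
	return readPrefix(strings.NewReader(s), max)
}

func main() {
	const headerSize = 8192 // assumed limit, tune as needed

	short, err := prefixOfString("PK\x03\x04 short input", headerSize)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(short)) // shorter than headerSize: everything is returned

	long, err := prefixOfString(strings.Repeat("x", 20000), headerSize)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(long)) // capped at headerSize
}
```

With an upload, the returned slice from `readPrefix(mpf, headerSize)` would take the place of the fixed 512-byte buffer in the question.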