June 18, 2022 04:14:34 · go · 95 views

Golang: check file type for xlsx

Question

For an xlsx file, the Go standard library's content-type detection gives the following:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	f, err := os.Open("file_c.xlsx")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// DetectContentType considers at most the first 512 bytes.
	buf := make([]byte, 512)
	n, err := f.Read(buf)
	if err != nil {
		log.Fatal(err)
	}
	// Only pass the bytes actually read, in case the file is shorter than 512 bytes.
	contentType := http.DetectContentType(buf[:n])
	fmt.Println(contentType)
}

and that prints:

application/zip

Using this package (https://github.com/h2non/filetype) instead:

package main

import (
	"fmt"
	"io/ioutil"

	"github.com/h2non/filetype"
)

func main() {
	buf, _ := ioutil.ReadFile("file_c.xlsx")
	kind, _ := filetype.Match(buf)
	if kind == filetype.Unknown {
		fmt.Println("unknown")
		return
	}
	fmt.Printf("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
}

prints:

file type xlsx. MIME application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

However, when I use code like this:

// where file is of type *multipart.FileHeader
mpf, err := file.Open()
if err != nil {
	wlog.Errorf("could not open %s file", file.Filename)
} else {
	defer mpf.Close()
}
buf := make([]byte, 512)
_, err = mpf.Read(buf)
if err != nil {
	wlog.Error("failed to read file")
} else {
	kind, _ := filetype.Match(buf)
	if kind == filetype.Unknown {
		wlog.Info("unknown file type")
	} else {
		wlog.Infof("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
	}
}

prints:

file type zip. MIME application/zip

So the xlsx-specific information seems to get lost somewhere along the way, even though I am using the same external package (https://github.com/h2non/filetype).
Do you have any idea why, or what I am doing wrong?

Answer 1

Score: 4

This

buf, _ := ioutil.ReadFile("file_c.xlsx")
kind, _ := filetype.Match(buf)

works because it gets to scan the entire file. This

buf := make([]byte, 512)
// ...
kind, _ := filetype.Match(buf)

does not because it only gets to see the first 512 bytes, which is not enough to identify the file definitively as XLSX. An XLSX file is just a zip file with a certain pattern of contents, so it defaults to the more generic ZIP type (which is technically also correct).

You can look at the library's implementation to see how much data it scans to detect the file type: up to several kilobytes.
