April 24, 2015 11:24:30 · go

How to detect additional mime type in Golang

Question

The net/http package has an http.DetectContentType([]byte) function, but it supports only a limited number of types. How can I add support for docx, doc, xls, xlsx, ppt, pps, odt, ods, and odp files, detecting them by content rather than by extension?
As far as I know, this causes problems, because docx/xlsx/pptx/odp/odt files have the same signature as ZIP files (50 4B 03 04).

Answer 1

Score: 7

Disclaimer: I'm the author of mimetype.

For anyone running into the same problem three years later, these are the packages currently available for content-based MIME type detection:

filetype
- pure Go, no C bindings
- can be extended to detect new MIME types
- has issues with files that match more than one MIME type (e.g., xlsx and docx also match as zip), because it stores its matching functions in a map, and Go does not guarantee map iteration order
- detects a limited number of MIME types
magicmime
- requires libmagic-dev to be installed
- detects the most MIME types of the three
- can be extended, albeit with more effort (see man magic)
- libmagic is not thread-safe
mimetype
- pure Go, no C bindings
- detects more MIME types than filetype
- thread-safe
- can be extended
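To make the ordering issue concrete, here is a minimal standard-library sketch; it is not the API of filetype, magicmime, or mimetype. Matchers live in an ordered slice, so the more specific OOXML checks always run before the generic ZIP check, whereas with a map Go leaves the iteration order unspecified. The matcher type and the detect, zipHasEntry, and makeZip helpers are hypothetical, and looking for word/document.xml or xl/workbook.xml is a simplification of proper OOXML detection.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// matcher (hypothetical) pairs a MIME type with a predicate over raw bytes.
type matcher struct {
	mime  string
	match func(data []byte) bool
}

// zipHasEntry reports whether data is a readable ZIP containing an entry
// with the given name. Simplification: real OOXML detection should follow
// the package relationships instead of hard-coded paths.
func zipHasEntry(data []byte, name string) bool {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return false
	}
	for _, f := range zr.File {
		if f.Name == name {
			return true
		}
	}
	return false
}

// matchers is a slice, not a map, so the order is fixed: specific
// ZIP-based formats are tried before the generic ZIP signature.
var matchers = []matcher{
	{"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
		func(d []byte) bool { return zipHasEntry(d, "word/document.xml") }},
	{"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
		func(d []byte) bool { return zipHasEntry(d, "xl/workbook.xml") }},
	{"application/zip",
		func(d []byte) bool { return bytes.HasPrefix(d, []byte("PK\x03\x04")) }},
}

// detect returns the first matching MIME type, or a generic fallback.
func detect(data []byte) string {
	for _, m := range matchers {
		if m.match(data) {
			return m.mime
		}
	}
	return "application/octet-stream"
}

// makeZip builds an in-memory ZIP with empty entries for testing.
func makeZip(names ...string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, n := range names {
		if _, err := zw.Create(n); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	fmt.Println(detect(makeZip("word/document.xml")))
	fmt.Println(detect(makeZip("xl/workbook.xml")))
	fmt.Println(detect(makeZip("readme.txt")))
}
```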

Answer 2

Score: 2

Files whose extensions end in x are relatively easy to detect. Just unzip the file and read _rels/.rels. It contains the path to the main part of the document, identified by the relationship type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument. Just check that part's file name: it is document.xml for docx, workbook.xml for xlsx, and presentation.xml for pptx.

More information can be found in ECMA-376.
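A minimal sketch of this approach with archive/zip and encoding/xml, assuming the package's root relationships part is stored at _rels/.rels and, as described above, that only the file name of the officeDocument target is checked. The names detectOOXML and buildPackage, and the short labels docx/xlsx/pptx, are hypothetical; buildPackage only creates a test fixture, not a complete document.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"fmt"
	"path"
)

// officeDocumentRel is the relationship type marking the main part.
const officeDocumentRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"

// relationships mirrors only the parts of _rels/.rels used here.
// Tags without a namespace match elements in any namespace.
type relationships struct {
	Items []struct {
		Type   string `xml:"Type,attr"`
		Target string `xml:"Target,attr"`
	} `xml:"Relationship"`
}

// detectOOXML (hypothetical) returns "docx", "xlsx", "pptx", or "" when
// data is not a recognizable OOXML package.
func detectOOXML(data []byte) string {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return ""
	}
	for _, f := range zr.File {
		if f.Name != "_rels/.rels" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return ""
		}
		var rels relationships
		err = xml.NewDecoder(rc).Decode(&rels)
		rc.Close()
		if err != nil {
			return ""
		}
		for _, r := range rels.Items {
			if r.Type != officeDocumentRel {
				continue
			}
			// Only the file name of the main part is checked.
			switch path.Base(r.Target) {
			case "document.xml":
				return "docx"
			case "workbook.xml":
				return "xlsx"
			case "presentation.xml":
				return "pptx"
			}
		}
	}
	return ""
}

// buildPackage (hypothetical fixture) writes a ZIP containing only
// _rels/.rels, pointing the officeDocument relationship at target.
func buildPackage(target string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("_rels/.rels")
	if err != nil {
		return nil, err
	}
	rels := `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="` + officeDocumentRel + `" Target="` + target + `"/>
</Relationships>`
	if _, err := w.Write([]byte(rels)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildPackage("word/document.xml")
	if err != nil {
		panic(err)
	}
	fmt.Println(detectOOXML(data))
}
```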

Binary formats are harder to detect. Basically, you need to read the MS-CFB (Compound File Binary) file system and check for these entries:

WordDocument for doc
Workbook or Book for xls
PowerPoint Document for ppt
EncryptedPackage, which means the file is encrypted
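Parsing MS-CFB properly means reading the header, following the FAT, and walking the directory chain. The sketch below takes a heuristic shortcut instead, so treat it as a starting point under stated assumptions rather than a real parser: after checking the 8-byte CFB signature, it scans every 128-byte-aligned slot after the 512-byte header for something that looks like a stream directory entry (UTF-16LE name in bytes 0-63, name length at offset 64, object type 0x02 at offset 66) and then applies the entry names listed above. The helper names and the returned labels are hypothetical, and fakeCFB builds only a minimal fixture, not a valid compound file.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// cfbMagic is the 8-byte signature at the start of every MS-CFB file.
var cfbMagic = []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}

// streamNames is a HEURISTIC: instead of following the directory chain
// through the FAT, it scans 128-byte-aligned slots after the header and
// keeps names of slots that look like stream entries. Look-alike bytes
// elsewhere in the file can produce false positives.
func streamNames(data []byte) []string {
	var names []string
	for off := 512; off+128 <= len(data); off += 128 {
		entry := data[off : off+128]
		// Bytes 64-65: name length in bytes, including the UTF-16 NUL.
		nameLen := int(binary.LittleEndian.Uint16(entry[64:66]))
		// Byte 66: object type; 0x02 means "stream".
		if entry[66] != 0x02 || nameLen < 4 || nameLen > 64 || nameLen%2 != 0 {
			continue
		}
		// Bytes 0-63: UTF-16LE name; drop the trailing NUL.
		u := make([]uint16, 0, nameLen/2-1)
		for i := 0; i < nameLen-2; i += 2 {
			u = append(u, binary.LittleEndian.Uint16(entry[i:i+2]))
		}
		names = append(names, string(utf16.Decode(u)))
	}
	return names
}

// detectCFB maps the stream names from the answer to hypothetical labels.
func detectCFB(data []byte) string {
	if !bytes.HasPrefix(data, cfbMagic) {
		return ""
	}
	found := map[string]bool{}
	for _, n := range streamNames(data) {
		found[n] = true
	}
	switch {
	case found["WordDocument"]:
		return "doc"
	case found["Workbook"] || found["Book"]:
		return "xls"
	case found["PowerPoint Document"]:
		return "ppt"
	case found["EncryptedPackage"]:
		return "encrypted"
	}
	return "cfb"
}

// fakeCFB builds the magic plus one directory-like entry; it is NOT a
// valid compound file and only exercises the scan above.
func fakeCFB(stream string) []byte {
	data := make([]byte, 512+128)
	copy(data, cfbMagic)
	entry := data[512:]
	u := utf16.Encode([]rune(stream))
	for i, c := range u {
		binary.LittleEndian.PutUint16(entry[2*i:], c)
	}
	binary.LittleEndian.PutUint16(entry[64:], uint16(2*(len(u)+1)))
	entry[66] = 0x02 // object type: stream
	return data
}

func main() {
	fmt.Println(detectCFB(fakeCFB("WordDocument")))
}
```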

Answer 3

Score: 1

There's currently no way to extend http.DetectContentType, because it uses a fixed, unexported slice of "sniffers": https://golang.org/src/net/http/sniff.go (sniffSignatures on line 49 at the time of writing).

Also, I took a quick look through godoc.org for a better package but didn't find one that is both extensible and content-based, as you require.

My advice would be: build your own package, guided by Go's content sniffer implementation (which follows https://mimesniff.spec.whatwg.org/).

Edit: If you're willing to use cgo and you're on *nix, you could use libmagic bindings such as https://github.com/jteeuwen/magic.

Answer 4

Score: 1

I found mimemagic, which I prefer over magicmime because it doesn't use cgo. However, magicmime is better at distinguishing application/zip from Office file types.
