golang - image file validation
Question
We have an API that accepts image files (JPEG, PNG and PDF) from clients. To avoid an Unrestricted File Upload vulnerability, we want to validate file content on the server side. Is there a canonical way or library in Go to do this? I know the net/http package has the DetectContentType function, but I'm not sure it's sufficient, since it's based on MIME sniffing and only looks at the first 512 bytes.
Answer 1
Score: 3
If you only want to accept the file types listed, you can rely on DetectContentType. Those formats have well-known markers at the start of the file. In fact, most can be recognized by looking at far fewer than 512 bytes.
It's easy to put together a CLI tool and test out some files for yourself.
And here are the first 5 bytes of a random JPEG, PDF and GIF I had lying around:
$ for i in {boo.jpg,AmStd_12-5062-01_final.pdf,tenor.gif}; \
do printf '\n%s\n' "$i"; \
head -c 5 "$i" | hexdump -C; \
done
boo.jpg
00000000 ff d8 ff e0 00 |.....|
00000005
AmStd_12-5062-01_final.pdf
00000000 25 50 44 46 2d |%PDF-|
00000005
tenor.gif
00000000 47 49 46 38 39 |GIF89|
00000005
Note that the JPEG signature defined in Go's net/http sniffing code is []byte("\xFF\xD8\xFF"), which matches the first 3 bytes of the output for boo.jpg. The other two files start with ASCII markers, so they are a little easier to spot.
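If you would rather not depend on DetectContentType's full table, you can check only the signatures you accept. That table also recognizes HTML, GIF, WebP and more. The sketch below uses the JPEG prefix quoted above, plus the standard 8-byte PNG signature and the "%PDF-" header. The names `accepted` and `matchSignature` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// signature pairs a magic-byte prefix with the MIME type it identifies.
type signature struct {
	prefix []byte
	mime   string
}

// accepted lists only the formats this (hypothetical) API allows.
var accepted = []signature{
	{[]byte("\xFF\xD8\xFF"), "image/jpeg"},     // same prefix net/http uses
	{[]byte("\x89PNG\r\n\x1a\n"), "image/png"}, // 8-byte PNG signature
	{[]byte("%PDF-"), "application/pdf"},       // PDF header
}

// matchSignature returns the MIME type whose prefix starts head,
// or "" if none of the accepted signatures match.
func matchSignature(head []byte) string {
	for _, s := range accepted {
		if bytes.HasPrefix(head, s.prefix) {
			return s.mime
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", matchSignature([]byte("\xFF\xD8\xFF\xE0\x00"))) // "image/jpeg"
	fmt.Printf("%q\n", matchSignature([]byte("%PDF-1.7")))             // "application/pdf"
	fmt.Printf("%q\n", matchSignature([]byte("GIF89a")))               // ""
}
```

This is stricter than DetectContentType but no harder to fool: it still only looks at the first few bytes.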
Obviously, a file can be crafted so that its signature matches one of the allowed types, so a bogus PDF, JPEG or GIF could still be uploaded. How you ultimately use the file determines whether you can trust an automated check at all.
EDIT
Looks like someone else has already written a CLI tool to check a file's content type. Turn the filename into a command-line flag and you have a handy testing utility for validating correctness.
https://golangcode.com/get-the-content-type-of-file/