How to prevent malformed uploads?

Question

I have some fairly simple code for uploading files to Google Cloud Storage using Go.

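import (
    "context"

    "cloud.google.com/go/storage"
)

// upload writes b to the given object and reports any write or close error.
func upload(object *storage.ObjectHandle, b []byte) error {
    w := object.NewWriter(context.Background())

    // err must be declared here with :=, it is not defined anywhere else.
    if _, err := w.Write(b); err != nil {
        return err
    }
    return w.Close()
}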

I have uploaded a great many files without any problems, but yesterday I noticed that one of them was damaged. I'm fairly certain the damage happened during the upload, because I name each file after the MD5 hash of its contents. I believe Google Cloud Storage should have returned an error when I called w.Close(), but it didn't. What's the best way to make sure the upload always fails when the transfer is interrupted or the data is damaged?
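Because the object name is derived from the contents, damage can be detected by hashing the downloaded bytes again and comparing the result with the name. The following is only a sketch, assuming the name is the lowercase hex MD5 digest (the question does not show the exact naming scheme); nameFor and matchesName are hypothetical helpers, not part of any library.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// nameFor returns the lowercase hex MD5 digest of b.
// Assumption: objects are named exactly like this.
func nameFor(b []byte) string {
	sum := md5.Sum(b)
	return hex.EncodeToString(sum[:])
}

// matchesName reports whether content still hashes to its object name.
func matchesName(name string, content []byte) bool {
	return nameFor(content) == name
}

func main() {
	original := []byte("hello")
	name := nameFor(original)

	damaged := []byte("hellp") // simulate one corrupted byte

	fmt.Println(matchesName(name, original)) // true
	fmt.Println(matchesName(name, damaged))  // false
}
```

This only tells you that an object is damaged, not when the damage happened.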

Answer 1

Score: 1

You could record the following before uploading the bytes and check them again afterwards:

  • the length of the data, len(b)
  • the SHA-256 hash of the data

Read the object back from Cloud Storage directly after the upload and verify that both values are unchanged. This costs an extra download per upload, so it will affect performance, but it ensures that what you get out of GCS is what you put in.

That isn't the only place corruption could occur, though. If the client stopped transmitting, or transmitted bad data to your server, this check wouldn't detect it. In that case, checking integrity some other way before the upload might be your best bet. If your files are of a known type, you could also validate them against that format, for example by verifying that a file really is a valid JPEG.
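For the JPEG case, one possible format check is to run the bytes through the standard library decoder before uploading. This is a sketch under the assumption that a full decode is affordable; validJPEG and sampleJPEG are hypothetical helpers, and sampleJPEG exists only to produce test input.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// validJPEG fully decodes b and returns an error if it is not a
// decodable JPEG image.
func validJPEG(b []byte) error {
	_, err := jpeg.Decode(bytes.NewReader(b))
	return err
}

// sampleJPEG builds a tiny JPEG in memory so the example has valid input.
func sampleJPEG() ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, 8, 8))
	img.Set(0, 0, color.RGBA{R: 255, A: 255})

	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	good, err := sampleJPEG()
	if err != nil {
		panic(err)
	}
	fmt.Println(validJPEG(good) == nil)                 // valid image
	fmt.Println(validJPEG([]byte("not a jpeg")) == nil) // rejected
}
```

A successful decode shows only that the bytes form a readable image, not that they are identical to what the client sent, so it complements a hash comparison and does not replace it.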

It might be best to reproduce the problem first and find out exactly where the corruption occurs. That would verify your assumption that GCS should have returned an error and instead silently corrupted the data given to it.
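One way to narrow down where the corruption occurs is to hash the data at each point in the pipeline and report the first point where the hash changes. The sketch below assumes you can capture the bytes at each stage; the snapshot type, the firstDivergence helper, and the stage names are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// snapshot is the data as observed at one named point in the pipeline.
type snapshot struct {
	stage string
	data  []byte
}

// firstDivergence returns the name of the first stage whose SHA-256
// differs from the stage before it. ok is false if all stages agree.
func firstDivergence(snaps []snapshot) (stage string, ok bool) {
	for i := 1; i < len(snaps); i++ {
		if sha256.Sum256(snaps[i].data) != sha256.Sum256(snaps[i-1].data) {
			return snaps[i].stage, true
		}
	}
	return "", false
}

func main() {
	snaps := []snapshot{
		{stage: "received from client", data: []byte("payload")},
		{stage: "before upload", data: []byte("payload")},
		{stage: "read back from GCS", data: []byte("paylo")}, // simulated damage
	}
	if stage, ok := firstDivergence(snaps); ok {
		fmt.Println("data first changed at:", stage)
	} else {
		fmt.Println("all stages match")
	}
}
```

In practice it is usually enough to log the hash at each stage instead of keeping every copy of the data in memory.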

huangapple
  • Posted on August 6, 2017 at 16:33:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45530143.html