How to return hash and bytes in one step in Go?

Question

I'm trying to understand how I can read the contents of a file, calculate its hash, and return its bytes in one go. So far, I'm doing this in two steps, e.g.:

// calculate file checksum
hasher := sha256.New()
f, err := os.Open(fname)
if err != nil {
    msg := fmt.Sprintf("Unable to open file %s, %v", fname, err)
    panic(msg)
}
defer f.Close()
if _, err := io.Copy(hasher, f); err != nil {
    panic(err)
}
cksum := hex.EncodeToString(hasher.Sum(nil))

// read again (!!!) to get data as bytes array
data, err := ioutil.ReadFile(fname)

Obviously this is not the most efficient way to do it, since the file is read twice: once by io.Copy to feed the hasher, and again by ioutil.ReadFile to return the bytes. I'm struggling to understand how to combine these steps so that I read the data once, calculate any hash, and return it along with the bytes to another layer.

Answer 1

Score: 9

If you want to read a file, without creating a copy of the entire file in memory, and at the same time calculate its hash, you can do so with a TeeReader:

hasher := sha256.New()
f, err := os.Open(fname)
if err != nil {
    panic(err)
}
defer f.Close()
data := io.TeeReader(f, hasher)
// Now read from data as usual; it is still a stream.

What happens here is that any bytes read from data (which is a Reader, just like the file object f) are written to hasher as well.

Note, however, that hasher produces the correct hash only after you have read the entire file through data, not before. So if you need the hash before deciding whether to read the file, your options are either to do it in two passes (as you are doing now), or to always read the file and discard the result if the hash check fails.

If you do read the file in two passes, you could of course buffer the entire file data in a byte buffer in memory. However, the operating system will typically cache the file you just read in RAM anyway (if possible), so the performance benefit of doing a buffered two-pass solution yourself rather than just doing two passes over the file is probably negligible.

Answer 2

Score: 0

You can write bytes directly to the hasher. For example:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io/ioutil"
)

func main() {
	hasher := sha256.New()

	data, err := ioutil.ReadFile("foo.txt")
	if err != nil {
		panic(err)
	}

	hasher.Write(data)
	cksum := hex.EncodeToString(hasher.Sum(nil))

	println(cksum)
}

This works because the Hash interface embeds io.Writer, so you can read the bytes from the file once, write them into the hasher, and then also return them.

Answer 3

Score: -1

Call data, err := ioutil.ReadFile(fname) first, which gives you the slice of bytes. Then create your hasher and call hasher.Write(data).

Answer 4

Score: -1

If you plan on hashing files, you shouldn't read the whole file into memory, because some files are too large to fit into RAM. In practice you will rarely run into such out-of-memory issues, but they are easy to prevent. The Hash interface is an io.Writer, and hash packages usually have a New function that returns a Hash. This lets you read the file in blocks and continuously feed them to the Write method of the Hash. You can also use functions like io.Copy to do this:

h := sha256.New()
data := &bytes.Buffer{}
data.Write([]byte("hi there"))
data.Write([]byte("folks"))
io.Copy(h, data)
fmt.Printf("%x", h.Sum(nil))

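io.Copy uses a 32 KiB buffer internally (unless the source implements io.WriterTo or the destination implements io.ReaderFrom, in which case it delegates to those), so using it requires around 32 KiB of memory at most.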

huangapple
  • Published on January 31, 2017 at 07:35:32
  • When reposting, please retain the link to this article: https://go.coder-hub.com/41947307.html