Can I read only n bytes of a file without creating an n-sized buffer?
Question
I'm detecting whether very large (30+ GB) files are the same. Rather than hash all 30 GB, I thought I'd hash the first megabyte, then the megabyte starting at 10% into the file, then the megabyte starting at 20% into the file, and so on. Detecting whether 10 million bytes are identical is good enough for my purposes.
In Ruby or JavaScript, I'd just create a 10 MB buffer, read 1 MB into it, seek ahead in the file, read another 1 MB into the buffer, seek ahead again, and so on, then hash the buffer.
In Go, I'm a little confused about how to do this, since the Read, ReadFull, ReadAtLeast etc. functions all seem to take a buffer as an argument and then read until they fill it. So I could allocate eleven separate buffers, fill 10 with separate 1 MB chunks, then concatenate them into the last one to hash... but that seems super inefficient and wasteful. I'm sure I'm missing something, but scouring the docs is only confusing me further. What's a suitable solution to this problem in Go? Can I simply ask to read n bytes into a pre-existing buffer?
Answer 1
Score: 5
You can slice the []byte buffer you pass to Read or ReadFull.

"Slicing" a slice points to the same backing array, so allocate the full buffer, and slice it in-place:

r.Read(buf[i : i+chunkSize])

or

io.ReadFull(r, buf[i:i+chunkSize])

https://play.golang.org/p/Uj626v-GE6
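For context, here is a minimal sketch of how the whole sampling approach from the question could be wired together on top of that idea: one 10 MB buffer, ten 1 MB reads into sub-slices of it, then a single hash over the buffer. The sparseHash name, the "big.file" path, the 1 MB chunk size, the ten sample points, the use of SHA-256, and the choice of ReadAt (rather than Seek plus io.ReadFull) are all illustrative assumptions, not part of the original answer.

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

const chunkSize = 1 << 20 // 1 MB per sample

// sparseHash hashes ten 1 MB samples taken at 0%, 10%, ..., 90% of the file,
// reading every sample into a sub-slice of one shared buffer.
func sparseHash(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return "", err
	}
	size := info.Size()

	const samples = 10
	buf := make([]byte, samples*chunkSize) // one 10 MB buffer, sliced per sample

	for i := 0; i < samples; i++ {
		offset := size * int64(i) / samples
		dst := buf[i*chunkSize : (i+1)*chunkSize]
		// ReadAt fills dst completely or returns an error; io.EOF only shows
		// up for files shorter than the sampled range, so it is tolerated here.
		if _, err := f.ReadAt(dst, offset); err != nil && err != io.EOF {
			return "", err
		}
	}

	sum := sha256.Sum256(buf)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h, err := sparseHash("big.file")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(h)
}

ReadAt is only one option; f.Seek followed by io.ReadFull(f, dst), exactly as in the answer, works the same way. The key point in either case is that every read targets a sub-slice of the same backing array, so no extra buffers or concatenation are needed.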