How can I compare two files in golang?

In Python I can do the following:

    equals = filecmp.cmp(file_old, file_new)

Is there any built-in function to do that in Go? I googled it but without success.

I could use a hash function from the hash/crc32 package, but that is more work than the above Python code.


Score: 13


To complete @captncraig's answer: if you want to know whether two paths refer to the same file, you can use the SameFile(fi1, fi2 FileInfo) function from the os package.

> SameFile reports whether fi1 and fi2 describe the same file. For example, on Unix this means that the device and inode fields of the two underlying structures are identical;

Otherwise, if you want to check the files' contents, here is a solution that compares the two files chunk by chunk, which avoids loading the entire files into memory.

First try:

EDIT: Read in byte chunks and fail fast if the files do not have the same size.

    import (
        "bytes"
        "io"
        "log"
        "os"
    )

    const chunkSize = 64000

    func deepCompare(file1, file2 string) bool {
        // Check file size ...

        f1, err := os.Open(file1)
        if err != nil {
            log.Fatal(err)
        }
        defer f1.Close()

        f2, err := os.Open(file2)
        if err != nil {
            log.Fatal(err)
        }
        defer f2.Close()

        // Allocate the buffers once and reuse them for every chunk.
        b1 := make([]byte, chunkSize)
        b2 := make([]byte, chunkSize)
        for {
            // io.ReadFull keeps reading until the buffer is full or the file
            // ends, so a short read on one side cannot misalign the chunks.
            n1, err1 := io.ReadFull(f1, b1)
            n2, err2 := io.ReadFull(f2, b2)

            // Compare only the bytes that were actually read.
            if !bytes.Equal(b1[:n1], b2[:n2]) {
                return false
            }

            if err1 != nil || err2 != nil {
                end1 := err1 == io.EOF || err1 == io.ErrUnexpectedEOF
                end2 := err2 == io.EOF || err2 == io.ErrUnexpectedEOF
                if end1 && end2 {
                    return true
                } else if end1 || end2 {
                    return false
                } else {
                    log.Fatal(err1, err2)
                }
            }
        }
    }


Score: 11








I am not sure that function does what you think it does. From the docs,

> Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

Your call is comparing only the signature of os.stat, which only includes:

  1. File mode
  2. Modified Time
  3. Size

You can learn all three of these things in Go from the os.Stat function. This really would only indicate that they are literally the same file, or symlinks to the same file, or a copy of that file.

If you want to go deeper, you can open both files and compare their contents (the Python version reads 8 KB at a time).

You could use a CRC or MD5 to hash both files, but if there are differences at the beginning of a long file, you want to stop early. I would recommend reading some number of bytes at a time from each reader and comparing the chunks with bytes.Compare.


Score: 9



How about using bytes.Equal?

    package main

    import (
        "bytes"
        "fmt"
        "log"
        "os"
    )

    func main() {
        // Per comment, it is better not to read an entire file into memory;
        // this is simply a trivial example.
        // os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
        f1, err1 := os.ReadFile("lines1.txt")
        if err1 != nil {
            log.Fatal(err1)
        }

        f2, err2 := os.ReadFile("lines2.txt")
        if err2 != nil {
            log.Fatal(err2)
        }

        fmt.Println(bytes.Equal(f1, f2)) // Per comment, this is significantly more performant.
    }


Score: 1






You can use a package like equalfile.

Main API:

    func CompareFile(path1, path2 string) (bool, error)

Example usage:

    package main

    import (
        "fmt"
        "os"

        "" // import path of the equalfile package (missing in the source)
    )

    func main() {
        if len(os.Args) != 3 {
            fmt.Printf("usage: equal file1 file2\n")
            os.Exit(2)
        }

        file1 := os.Args[1]
        file2 := os.Args[2]

        equal, err := equalfile.CompareFile(file1, file2)
        if err != nil {
            fmt.Printf("equal: error: %v\n", err)
            os.Exit(3)
        }

        if equal {
            fmt.Println("equal: files match")
            os.Exit(0)
        }

        fmt.Println("equal: files differ")
        os.Exit(1)
    }


Score: 1




After checking the existing answers, I whipped up a simple package for comparing arbitrary (finite) io.Reader values, with a convenience function for comparing files:

    package main

    import (
        "fmt"
        "log"
        "os"

        "" // import path of the readercomp package (missing in the source)
    )

    func main() {
        result, err := readercomp.FilesEqual(os.Args[1], os.Args[2])
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println(result)
    }


Score: 0


Here's an io.Reader I whipped up. You can call _, err := io.Copy(io.Discard, newCompareReader(a, b)) to get an error if the two streams don't have equal contents. This implementation is optimized for performance by limiting unnecessary data copying.

    package main

    import (
        "bytes"
        "errors"
        "fmt"
        "io"
    )

    type compareReader struct {
        a    io.Reader
        b    io.Reader
        bBuf []byte // buffer for comparing B's data with the data read from A
    }

    func newCompareReader(a, b io.Reader) io.Reader {
        return &compareReader{
            a: a,
            b: b,
        }
    }

    func (c *compareReader) Read(p []byte) (int, error) {
        if c.bBuf == nil {
            // assuming p's len() stays the same, so we can optimize for both of their buffer
            // sizes to be equal
            c.bBuf = make([]byte, len(p))
        }

        // read only as much data as we can fit in both p and bBuf
        // (the built-in min requires Go 1.21 or later)
        readA, errA := c.a.Read(p[0:min(len(p), len(c.bBuf))])
        if readA > 0 {
            // bBuf is guaranteed to have at least readA space
            if _, errB := io.ReadFull(c.b, c.bBuf[0:readA]); errB != nil { // docs: "EOF only if no bytes were read"
                if errB == io.ErrUnexpectedEOF {
                    return readA, errors.New("compareReader: A had more data than B")
                } else {
                    return readA, fmt.Errorf("compareReader: read error from B: %w", errB)
                }
            }

            if !bytes.Equal(p[0:readA], c.bBuf[0:readA]) {
                return readA, errors.New("compareReader: bytes not equal")
            }
        }

        if errA == io.EOF {
            // in the happy case we expect EOF from B as well. this might be an extraneous call
            // because we might already have seen it in the io.ReadFull call above, but it's
            // easier to check here
            readB, errB := c.b.Read(c.bBuf)
            if readB > 0 {
                return readA, errors.New("compareReader: B had more data than A")
            }

            if errB != io.EOF {
                return readA, fmt.Errorf("compareReader: got EOF from A but not from B: %w", errB)
            }
        }

        return readA, errA
    }


Score: 0






> The standard way is to stat them and use os.SameFile.
> --

os.SameFile should do roughly the same thing as Python's filecmp.cmp(f1, f2) (i.e. shallow=True, meaning it only compares the file information obtained by stat).

> func SameFile(fi1, fi2 FileInfo) bool
> SameFile reports whether fi1 and fi2 describe the same file. For example, on Unix this means that the device and inode fields of the two underlying structures are identical; on other systems the decision may be based on the path names. SameFile only applies to results returned by this package's Stat. It returns false in other cases.

But if you actually want to compare the files' contents, you'll have to do it yourself.


Score: 0






This does a piece-by-piece comparison of the two files, quitting as soon as it knows the two files are different. It only needs standard library functions.

It's an improvement on this that handles the short-read problem raised by mat007 and christopher by using io.ReadFull(). It also avoids reallocating the buffers.

    package util

    import (
        "bytes"
        "io"
        "os"
    )

    // FileCmp decides whether two files have the same contents or not.
    // chunkSize is the size of the blocks to scan by; pass 0 to get a sensible default.
    // *Follows* symlinks.
    //
    // May return an error if something else goes wrong; in this case, you should ignore the value of 'same'.
    //
    // derived from
    // under CC-BY-SA-4.0 by several contributors
    func FileCmp(file1, file2 string, chunkSize int) (same bool, err error) {
        if chunkSize == 0 {
            chunkSize = 4 * 1024
        }

        // shortcuts: check file metadata
        stat1, err := os.Stat(file1)
        if err != nil {
            return false, err
        }

        stat2, err := os.Stat(file2)
        if err != nil {
            return false, err
        }

        // are the inputs literally the same file?
        if os.SameFile(stat1, stat2) {
            return true, nil
        }

        // do the inputs at least have the same size?
        if stat1.Size() != stat2.Size() {
            return false, nil
        }

        // long way: compare contents
        f1, err := os.Open(file1)
        if err != nil {
            return false, err
        }
        defer f1.Close()

        f2, err := os.Open(file2)
        if err != nil {
            return false, err
        }
        defer f2.Close()

        b1 := make([]byte, chunkSize)
        b2 := make([]byte, chunkSize)
        for {
            n1, err1 := io.ReadFull(f1, b1)
            n2, err2 := io.ReadFull(f2, b2)

            // From the io.Reader documentation:
            // > Callers should always process the n > 0 bytes returned
            // > before considering the error err. Doing so correctly
            // > handles I/O errors that happen after reading some bytes
            // > and also both of the allowed EOF behaviors.
            if !bytes.Equal(b1[:n1], b2[:n2]) {
                return false, nil
            }

            if (err1 == io.EOF && err2 == io.EOF) || (err1 == io.ErrUnexpectedEOF && err2 == io.ErrUnexpectedEOF) {
                return true, nil
            }

            // some other error, like a dropped network connection or a bad transfer
            if err1 != nil {
                return false, err1
            }
            if err2 != nil {
                return false, err2
            }
        }
    }

It surprised me that this wasn't anywhere in the standard library.


Score: -1





Something like this should do the trick, and should be memory-efficient compared to the other answers. I looked at and it seemed a bit overkill to me. Before you call compare() here, you should do two os.Stat() calls and compare file sizes for an early out fast path.

The reason to use this implementation over the other answers is because you don't want to hold the entirety of both files in memory if you don't have to. You can read an amount from A and B, compare, and then continue reading the next amount, one buffer-load from each file at a time until you are done. You just have to be careful because you may read 50 bytes from A and then 60 bytes from B because your read may have blocked for some reason.

This implementation assumes that a Read() call will not return N > 0 (some bytes read) at the same time as an error != nil. This is how os.File behaves, but other implementations of Read, such as net.TCPConn, may behave differently.

    import (
        "bytes"
        "errors"
        "fmt"
        "io"
        "os"
    )

    var errNotSame = errors.New("file contents are different")

    func compare(p1, p2 string) error {
        var (
            buf1 [8192]byte
            buf2 [8192]byte
        )

        fh1, err := os.Open(p1)
        if err != nil {
            return err
        }
        defer fh1.Close()

        fh2, err := os.Open(p2)
        if err != nil {
            return err
        }
        defer fh2.Close()

        for {
            n1, err1 := fh1.Read(buf1[:])
            n2, err2 := fh2.Read(buf2[:])

            if err1 == io.EOF && err2 == io.EOF {
                // files are the same!
                return nil
            }
            if err1 == io.EOF || err2 == io.EOF {
                return errNotSame
            }
            if err1 != nil {
                return err1
            }
            if err2 != nil {
                return err2
            }

            // short read on n1
            for n1 < n2 {
                more, err := fh1.Read(buf1[n1:n2])
                if err == io.EOF {
                    return errNotSame
                }
                if err != nil {
                    return err
                }
                n1 += more
            }
            // short read on n2
            for n2 < n1 {
                more, err := fh2.Read(buf2[n2:n1])
                if err == io.EOF {
                    return errNotSame
                }
                if err != nil {
                    return err
                }
                n2 += more
            }
            if n1 != n2 {
                // should never happen
                return fmt.Errorf("file compare reads out of sync: %d != %d", n1, n2)
            }

            if !bytes.Equal(buf1[:n1], buf2[:n2]) {
                return errNotSame
            }
        }
    }

  • Posted on April 8, 2015, 10:52:26
