Is there a faster alternative to ioutil.ReadFile?

Question

I am trying to write a program that checks for duplicate files based on their MD5 checksums.
I'm not sure whether I'm missing something, but when this function reads the Xcode installer app (about 8 GB), it uses 16 GB of RAM.

func search() {
	unique := make(map[string]string)
	files, err := ioutil.ReadDir(".")
	if err != nil {
		log.Println(err)
	}

	for _, file := range files {
		fileName := file.Name()
		fmt.Println("CHECKING:", fileName)
		fi, err := os.Stat(fileName)
		if err != nil {
			fmt.Println(err)
			continue
		}
		if fi.Mode().IsRegular() {
			data, err := ioutil.ReadFile(fileName)
			if err != nil {
				fmt.Println(err)
				continue
			}
			sum := md5.Sum(data)
			hexDigest := hex.EncodeToString(sum[:])
			if _, ok := unique[hexDigest]; !ok {
				unique[hexDigest] = fileName
			} else {
				fmt.Println("DUPLICATE:", fileName)
			}
		}
	}
}

According to my debugging, the problem is in the file reading.
Is there a better approach?
Thanks.

Answer 1

Score: 6

The Go documentation for crypto/md5 includes an example that covers your case.

package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("file.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%x", h.Sum(nil))
}

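For your case, make sure each file is closed inside the loop rather than deferred: a defer runs only when the surrounding function returns, so deferring in the loop would keep every file open until the whole scan finishes. Alternatively, put the per-file logic into its own function.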

Answer 2

Score: 5

It sounds like your problem is the 16 GB of RAM, not speed per se.

Don't read the entire file into a variable with ReadFile. Instead, io.Copy from the Reader that os.Open gives you to the Writer that crypto/md5 provides (md5.New returns a hash.Hash, which embeds an io.Writer). That copies only a small piece at a time instead of pulling the whole file into RAM.

This trick is useful in a lot of places in Go; packages like text/template, compress/gzip, net/http, etc. work in terms of Readers and Writers. With them, you don't usually need to create huge []bytes or strings; you can hook I/O interfaces up to each other and let them pass pieces of content around for you. In a garbage-collected language, saving memory tends to save you CPU work as well.

Posted by huangapple on July 14, 2017, 17:19:31
When reposting, please keep the link to this article: https://go.coder-hub.com/45099221.html