Golang: Comparing two txt files

Question

I want to find out what changed in a file's contents.

I have a simple text file:

textOne 1,2,3,4,5,6,7,8,9,10

which I then change to:

textTwo 1,2,3,4,7,7,7,8,9,10

So 'One' and '5,6' were changed to 'Two' and '7,7'.

I can find what changed by looping over the contents, but I wonder if there is a better way to check.

Answer 1

Score: 6

Instead of looping, use an established third-party library, with a small optimisation.

One library that I've used in the past is the Go port of google-diff-match-patch [1] (the same library that @Not_a_golfer suggested in the comments).

You can optimise this by first calculating the SHA-2 hash of both files. If the hashes differ, the files have definitely changed and you run the diff; if they match, the files are (almost certainly) identical and you can skip the diff operation.

One drawback of this optimisation is that, by the pigeonhole principle, it is theoretically possible for different contents to produce the same hash value. However, the probability of that happening is very small.

EDIT (based on @elithrar's comment):
Hashing a very large file can be time-consuming. Instead, you can calculate the SHA-2 hash in chunks (the chunk size depends on the particular hash algorithm from the SHA-2 family) and compare the files chunk by chunk. This lets you bail out early at the first mismatch, which improves speed.

[1]: https://github.com/sergi/go-diff "go-diff"

Posted by huangapple on 2015-09-06 16:40:59
Source: https://go.coder-hub.com/32421649.html