Golang: comparing two text files
Question
I want to find out what changed in a file's contents.
I have a simple text file:
textOne 1,2,3,4,5,6,7,8,9,10
and I change it to:
textTwo 1,2,3,4,7,7,7,8,9,10
So 'One' and '5,6' were changed to 'Two' and '7,7'.
I can find the changes with a loop, but I wonder whether there is a better way to check.
Answer 1
Score: 6
Instead of writing the loop yourself, use an established third-party library, with one small optimisation.
One library I have used in the past is go-diff [1], a Go port of google-diff-match-patch (the same library that @Not_a_golfer suggested in the comments).
You can optimise this by first calculating a SHA-2 hash of each file. If the hashes differ, the files have changed and you run the diff; if they match, the files are almost certainly identical and you can skip the diff.
The drawback of this optimisation is that, by the pigeonhole principle, two different contents can in theory produce the same hash value. For a SHA-2 hash, however, the probability of that happening by accident is negligible.
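EDIT (based on @elithrar's comment):
Hashing a very large file can be time-consuming, so you can calculate the SHA-2 hash in chunks and compare chunk by chunk instead. The chunk size depends on the particular algorithm from the SHA-2 family; for example, SHA-256 processes 64-byte blocks and SHA-512 processes 128-byte blocks. This lets you bail out at the first mismatch, which improves speed.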
[1]: https://github.com/sergi/go-diff "go-diff"