How to get the content added to a file since its last modification

Question

I'm working on a project in Go that needs to index newly added file content (using a framework called bleve), and I'm looking for a way to get the content added to a file since its last modification. My current work-around is to record the last indexed position of each file; during later indexing passes, I only read the file content from the previously recorded position onward.

So is there any library or built-in functionality for this? (It doesn't need to be Go; any language would work.)

I'd also really appreciate any ideas that are better than my work-around!

Thanks

Answer 1

Score: 0

If you're running on a Unix-like system, you could just use tail. If you tell it to follow the file (tail -f), the process keeps waiting after reaching the end of the file. You can invoke it from your program with os/exec and pipe its stdout into your program, which can then read from it periodically or with blocking reads.

The only way I can think of to do this natively in Go is the way you described. There's also a library that tries to emulate tail in Go: https://github.com/hpcloud/tail

Answer 2

Score: 0

It depends on how the files change.

If the files are append-only, then you only need to record the last offset where you stopped indexing, and start from there.

If the changes can happen anywhere, and they mostly replace old bytes with new bytes (like changing pixels of an image), then perhaps you can compute checksums for small chunks and only index the chunks whose checksums have changed.

You can check out the crypto package in the Go standard library for computing hashes.
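A minimal sketch of the chunk-checksum idea using crypto/sha256 from the standard library. The chunkHashes and changedChunks helpers and the chunk size are hypothetical; the previous run's hashes are assumed to be stored somewhere between runs.

```go
package main

import "crypto/sha256"

// chunkHashes splits data into fixed-size chunks (size must be positive)
// and returns the SHA-256 digest of each chunk.
func chunkHashes(data []byte, size int) [][sha256.Size]byte {
	var hashes [][sha256.Size]byte
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data) // last chunk may be shorter
		}
		hashes = append(hashes, sha256.Sum256(data[start:end]))
	}
	return hashes
}

// changedChunks returns the indices of chunks in newData whose digest differs
// from the one recorded at the same position last time, or that did not exist.
func changedChunks(oldHashes [][sha256.Size]byte, newData []byte, size int) []int {
	var changed []int
	for i, h := range chunkHashes(newData, size) {
		if i >= len(oldHashes) || h != oldHashes[i] { // arrays compare by value
			changed = append(changed, i)
		}
	}
	return changed
}
```

Because the chunks have fixed boundaries, an insertion shifts every later chunk and marks them all as changed, which is why this approach fits the in-place replacement case rather than insertions.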

If the changes are line insertions/deletions in text files (like changes to source code), then a diff algorithm may help you find the differences. Something like https://github.com/octavore/delta.
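Rather than guess at octavore/delta's API, here is a self-contained sketch of the underlying idea: a plain longest-common-subsequence (LCS) line diff that reports which lines of the new version were inserted. addedLines is a hypothetical helper; it uses O(n·m) time and memory, so it suits modest files, and dedicated diff libraries typically use faster algorithms.

```go
package main

import "strings"

// addedLines returns the lines of newText that are not part of a longest
// common subsequence with oldText, i.e. the lines a line-based diff would
// report as inserted. Deleted lines are ignored.
func addedLines(oldText, newText string) []string {
	a := strings.Split(oldText, "\n")
	b := strings.Split(newText, "\n")

	// lcs[i][j] holds the LCS length of a[i:] and b[j:].
	lcs := make([][]int, len(a)+1)
	for i := range lcs {
		lcs[i] = make([]int, len(b)+1)
	}
	for i := len(a) - 1; i >= 0; i-- {
		for j := len(b) - 1; j >= 0; j-- {
			if a[i] == b[j] {
				lcs[i][j] = lcs[i+1][j+1] + 1
			} else if lcs[i+1][j] >= lcs[i][j+1] {
				lcs[i][j] = lcs[i+1][j]
			} else {
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}

	// Walk the table: unmatched lines of b are insertions.
	var added []string
	i, j := 0, 0
	for j < len(b) {
		switch {
		case i < len(a) && a[i] == b[j]:
			i++ // common line
			j++
		case i < len(a) && lcs[i+1][j] >= lcs[i][j+1]:
			i++ // a[i] was deleted
		default:
			added = append(added, b[j])
			j++
		}
	}
	return added
}
```

Only the returned lines would then need to be indexed.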

huangapple
  • Posted on 2017-03-21 11:34:44
  • Please keep this link when reposting: https://go.coder-hub.com/42917966.html