How to get the content added to a file since its last modification
Question
I'm working on a project in Go that needs to index recently added file content (using a framework called bleve), and I'm looking for a way to get the content added to a file since its last modification. My current workaround is to record the last indexed position of each file; on later indexing runs I only read the file content starting from that recorded position.
So I wonder whether there is a library or built-in functionality for this? (It doesn't need to be restricted to Go; any language would work.)
I'd also really appreciate it if anyone has a better idea than my workaround!
Thanks
Answer 1
Score: 0
If you're running on a Unix-like system, you could just use `tail`. If you tell it to follow the file, the process keeps waiting after reaching the end of the file. You can invoke it from your program with `os/exec` and pipe its Stdout to your program, which can then read from it periodically or with blocking reads.
The only way I can think of to do this natively in Go is the way you described. There's also a library that tries to emulate `tail` in Go: https://github.com/hpcloud/tail
Answer 2
Score: 0
It depends on how the files change.
If the files are append-only, then you only need to record the last offset where you stopped indexing, and start from there.
If the changes can happen anywhere and mostly replace old bytes with new bytes (like changing pixels of an image), then you could compute checksums for small chunks and only re-index the chunks whose checksums differ.
You can check out the `crypto` package in the Go standard library for computing hashes.
If the changes are line insertions or deletions in text files (such as edits to source code), then a diff algorithm can help you find the differences, for example https://github.com/octavore/delta.