Making a journal file in golang
Question
I have a small Go project that receives text lines over TCP for processing. However, to ensure robustness, I want to create some sort of journal so that nothing is lost in case of power failure (e.g. a frame of data is received by my app but is not yet processed).
I have googled for guides on how a journal file should be implemented, but the search results are heavily polluted by Oracle RDBMS documentation and the like.
My thought was something like this: immediately after receiving a line, write it to a file with a "not processed" flag. After processing, update the file so that this flag is cleared, opening the record for overwrites. At the same time as this flag is cleared, send a "processed ack" to the data sender. Perhaps it's easiest to deal with fixed-size "slots" in the journal, so that I can reuse freed slots rather than having an ever-growing file, and to maintain a "free list" of unused slots.
Is there any "best practice" for implementing such files in custom code, e.g. with regard to file structure, padding and locking? Are there any concerns about doing this in Go, given that it is cross-platform rather than using native file-system APIs?
Answer 1
Score: 5
You shouldn't rewrite a journal. Just append the operations to it so that you can recreate them, and then control the strictness level you want.
The logic should simply be:
- Receive the message.
- Write it to the journal.
- Optionally fsync the journal now, depending on your consistency requirements.
- Optionally send a "received ack", depending on your needs.
- Process the message.
- Optionally write another "processed" record to the file with the id of the original record. You don't always need that, but this is how you avoid rewriting the old record. Alternatively, you can write a separate file with the "top transaction id" you've processed, so you'll automatically know where to resume processing after a failure. This will reduce the journal size.
- Send a "processed ack" or "processing failure", again depending on what you want.
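The steps above could be sketched roughly as follows. This is only an illustration under my own assumptions: the Journal type, the "R <id> <payload>" / "P <id>" record layout, and the demo helper are made up for the example and are not a standard format. It also assumes payloads never contain a newline.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// Journal is an append-only log. The record layout is an assumption of this
// sketch: "R <id> <payload>" for a received line, "P <id>" once processed.
type Journal struct {
	f      *os.File
	nextID uint64
}

// Received appends the line and, if durable is true, fsyncs before returning.
// The payload must not contain a newline.
func (j *Journal) Received(line string, durable bool) (uint64, error) {
	id := j.nextID
	if _, err := fmt.Fprintf(j.f, "R %d %s\n", id, line); err != nil {
		return 0, err
	}
	j.nextID++
	if durable {
		return id, j.f.Sync()
	}
	return id, nil
}

// Processed appends a marker instead of rewriting the original record.
func (j *Journal) Processed(id uint64) error {
	_, err := fmt.Fprintf(j.f, "P %d\n", id)
	return err
}

// Replay scans the journal and returns the records never marked as processed.
// A torn final record is not detected here; a per-record checksum would help.
func Replay(path string) (map[uint64]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	pending := make(map[uint64]string)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		kind, rest, _ := strings.Cut(sc.Text(), " ")
		idStr, payload, _ := strings.Cut(rest, " ")
		id, err := strconv.ParseUint(idStr, 10, 64)
		if err != nil {
			continue // skip malformed records
		}
		switch kind {
		case "R":
			pending[id] = payload
		case "P":
			delete(pending, id)
		}
	}
	return pending, sc.Err()
}

// demo journals two lines, marks only the first as processed, then replays.
func demo() (map[uint64]string, error) {
	dir, err := os.MkdirTemp("", "journal")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "journal.log")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	j := &Journal{f: f, nextID: 1}
	first, err := j.Received("first line", true)
	if err == nil {
		_, err = j.Received("second line", true)
	}
	if err == nil {
		err = j.Processed(first)
	}
	if err != nil {
		return nil, err
	}
	return Replay(path) // what a restart would see
}

func main() { fmt.Println(demo()) }
```

After a real restart you would also want nextID to continue past the highest id already in the journal, which this sketch leaves out.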
Databases usually let you control the fsync behavior (on every write, every N seconds, or whenever the OS decides). It's a trade-off between speed and durability.
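To make that trade-off concrete, here is a small sketch of a configurable sync policy. The SyncPolicy constants and the Log type are names I made up for illustration. For simplicity the interval policy only checks the clock when a record is appended; a real implementation would probably also sync from a timer so that an idle journal does not sit unsynced.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// SyncPolicy is a hypothetical knob mirroring the three options above.
type SyncPolicy int

const (
	SyncAlways   SyncPolicy = iota // fsync after every append
	SyncInterval                   // fsync at most once per interval
	SyncNever                      // let the OS decide when to write back
)

// Log appends lines and fsyncs according to its policy.
type Log struct {
	f        *os.File
	policy   SyncPolicy
	interval time.Duration
	lastSync time.Time
	Syncs    int // fsync calls made, counted only for the demo
}

// Append writes one line and calls Sync if the policy says it is due.
func (l *Log) Append(line string) error {
	if _, err := l.f.WriteString(line + "\n"); err != nil {
		return err
	}
	due := l.policy == SyncAlways ||
		(l.policy == SyncInterval && time.Since(l.lastSync) >= l.interval)
	if !due {
		return nil
	}
	if err := l.f.Sync(); err != nil {
		return err
	}
	l.lastSync = time.Now()
	l.Syncs++
	return nil
}

// demo appends three lines under the given policy and reports the fsync count.
func demo(policy SyncPolicy) (int, error) {
	dir, err := os.MkdirTemp("", "fsync")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "journal.log")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	l := &Log{f: f, policy: policy, interval: time.Hour, lastSync: time.Now()}
	for i := 0; i < 3; i++ {
		if err := l.Append(fmt.Sprintf("line %d", i)); err != nil {
			return 0, err
		}
	}
	return l.Syncs, nil
}

func main() {
	for _, p := range []SyncPolicy{SyncAlways, SyncInterval, SyncNever} {
		fmt.Println(demo(p))
	}
}
```

With SyncAlways every append pays for an fsync; with the other two policies a power failure can lose whatever was written since the last sync.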
A good read on the subject might be this post on Redis persistence:
http://oldblog.antirez.com/post/redis-persistence-demystified.html
[EDIT] Another great read on the subject: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
As for the Go aspect of it: there are a few options for writing to files, from a low-level file handle (os.File) to a buffered writer (bufio.Writer). A raw file handle gives you the most control over what happens under the hood. I'm not sure how much caching a normal file writer in Go does behind the scenes, so I'd suggest reading the code if you intend to use it.
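As far as I know, writes through an *os.File go straight to the operating system without any user-space buffering, whereas a bufio.Writer holds bytes in memory until it is flushed. Neither guarantees the data is on disk until File.Sync is called. A small sketch that makes the difference visible (the fileSize and demo helpers are just for this example):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// fileSize reports the size of the file as the OS currently sees it.
func fileSize(path string) (int64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

// demo writes one short line through a bufio.Writer and returns the file
// size observed before and after flushing.
func demo() (int64, int64, error) {
	dir, err := os.MkdirTemp("", "buffered")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "journal.log")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	if _, err := w.WriteString("hello world\n"); err != nil {
		return 0, 0, err
	}
	// The line is shorter than the writer's buffer, so nothing has been
	// handed to the OS yet.
	before, err := fileSize(path)
	if err != nil {
		return 0, 0, err
	}
	// Flush hands the buffered bytes to the OS.
	if err := w.Flush(); err != nil {
		return 0, 0, err
	}
	// Sync asks the OS to push its cached data to stable storage.
	if err := f.Sync(); err != nil {
		return 0, 0, err
	}
	after, err := fileSize(path)
	return before, after, err
}

func main() { fmt.Println(demo()) }
```

For a journal this means that if you use a buffered writer, you need both Flush and Sync before sending an ack that promises durability.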