C# - Truncate beginning of large transaction log file

Question

I'm using a text file as a way to store transactions. This transaction log is essentially the persistence mechanism. When the software bootstraps, it replays the transactions to get back to the last known state. There are other features too, such as snapshots, loading only the transactions logged after the latest snapshot (to avoid replaying from the beginning of time), archiving, and purging.

These transaction logs can get really large. This is especially true when a company wants to keep a month's worth of transactions. The archive is purged of old transactions at startup and then every midnight (snapshots taken, transactions archived, and then old ones purged).

The algorithm used to purge opens two file streams: one for the current file and another for a newly created temp file. I stream one transaction at a time to the temp file, writing only the ones I want to keep. Then I delete the current file and rename the temp file to be the current file. This approach saves on RAM, but performance becomes a problem for files approaching 500 MB.

The transactions are stored oldest on the top to newest on the bottom. What I would like to do is remove one line at a time until I find a transaction that needs to stay and then stop processing. Is there a way to do that? Below is the current approach:

await _semaphore.WaitAsync().ConfigureAwait(false);
try
{
    using var reader = _fileSystem.OpenStream(originalFile);
    using var writer = _fileSystem.CreateStream(tempFile);
    string? line = null;
    while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) is not null)
    {
        var logItem = _serializer.Deserialize<TransactionLogItem>(line);
        var dateLogged = logItem.HappenedOn.ToLocalDateTime().Date;
        if (dateLogged >= oldestAllowedDate) await writer.WriteLineAsync(line).ConfigureAwait(false);
    }
}
finally
{
    _fileSystem.DeleteFile(originalFile);
    _fileSystem.Rename(tempFile, originalFile);
    _semaphore.Release();
}
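
Because the log is ordered oldest to newest, one possible way to make the rewrite above cheaper is to deserialize only until the first transaction that must stay, then copy the rest of the file as raw bytes. The sketch below is an illustration under assumptions, not the code from the question: `LogPurger`, `PurgeOrderedLog`, and the `shouldKeep` callback are hypothetical names, it uses plain `System.IO` rather than the `_fileSystem` abstraction, and it assumes UTF-8 text without a BOM and lines ending in `\n`. It still rewrites the retained part of the file; it only avoids parsing it.

```csharp
using System;
using System.IO;
using System.Text;

public static class LogPurger
{
    // shouldKeep is a hypothetical callback: it receives one line of the log
    // and returns true once a transaction is new enough to be retained.
    public static void PurgeOrderedLog(string path, Func<string, bool> shouldKeep)
    {
        string tempPath = path + ".tmp";
        using (var source = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 1 << 16))
        {
            long keepFrom = FindFirstKeptOffset(source, shouldKeep);
            if (keepFrom == 0) return; // nothing to purge, leave the file alone

            // Copy everything from the first retained line onward without parsing it.
            source.Seek(keepFrom, SeekOrigin.Begin);
            using (var target = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
            {
                source.CopyTo(target);
            }
        }
        // Replace the original only after the copy has completed.
        File.Move(tempPath, path, overwrite: true);
    }

    // Returns the byte offset of the first line for which shouldKeep is true,
    // or the file length if no line is kept.
    private static long FindFirstKeptOffset(Stream source, Func<string, bool> shouldKeep)
    {
        var lineBytes = new MemoryStream();
        long lineStart = 0;
        long position = 0;
        int b;
        while ((b = source.ReadByte()) != -1)
        {
            position++;
            if (b != '\n')
            {
                lineBytes.WriteByte((byte)b);
                continue;
            }
            if (shouldKeep(Decode(lineBytes))) return lineStart;
            lineStart = position;   // the next line starts right after this '\n'
            lineBytes.SetLength(0);
        }
        // Handle a final line that has no trailing newline.
        if (lineBytes.Length > 0 && shouldKeep(Decode(lineBytes))) return lineStart;
        return position;
    }

    private static string Decode(MemoryStream lineBytes) =>
        Encoding.UTF8.GetString(lineBytes.GetBuffer(), 0, (int)lineBytes.Length).TrimEnd('\r');
}
```

With the question's types, the callback might look like `line => _serializer.Deserialize<TransactionLogItem>(line).HappenedOn.ToLocalDateTime().Date >= oldestAllowedDate`. Note that `File.Move` with the `overwrite` parameter requires .NET Core 3.0 or later.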

Answer 1

Score: 1

> Is there a way to do that?

No; as a rule, file systems don't support truncating the beginning of a file. You might get something working with sparse files, but they only work on some file systems and are pretty coarse in which sections can be made sparse.

Your best bet is to keep doing what you're doing now, use a real database, or split the log into multiple files so you can just delete the old ones.
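
To illustrate the multiple-logs option, here is a minimal sketch that writes one log file per day, so purging becomes a matter of deleting whole files. Everything in it is an assumption made for illustration: the `DailyLog` class, the `transactions-yyyy-MM-dd.log` naming scheme, and the method names are hypothetical, and it uses plain `System.IO` calls rather than the `_fileSystem` abstraction from the question.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class DailyLog
{
    // Hypothetical naming scheme: transactions-2023-02-10.log
    private const string Prefix = "transactions-";
    private const string Extension = ".log";
    private const string DateFormat = "yyyy-MM-dd";

    public static string PathFor(string directory, DateTime day) =>
        Path.Combine(directory,
            Prefix + day.ToString(DateFormat, CultureInfo.InvariantCulture) + Extension);

    // Appends one serialized transaction to the file for the day it happened on.
    public static void Append(string directory, DateTime happenedOn, string serializedTransaction)
    {
        Directory.CreateDirectory(directory);
        File.AppendAllText(PathFor(directory, happenedOn.Date), serializedTransaction + "\n");
    }

    // Deletes every daily file dated before oldestAllowedDate; returns how many were removed.
    public static int Purge(string directory, DateTime oldestAllowedDate)
    {
        int deleted = 0;
        foreach (string file in Directory.GetFiles(directory, Prefix + "*" + Extension))
        {
            string datePart = Path.GetFileNameWithoutExtension(file).Substring(Prefix.Length);
            bool parsed = DateTime.TryParseExact(datePart, DateFormat,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime day);
            if (parsed && day < oldestAllowedDate.Date)
            {
                File.Delete(file);
                deleted++;
            }
        }
        return deleted;
    }
}
```

Because the date is written as `yyyy-MM-dd`, sorting the file names alphabetically also sorts them chronologically, so replay at startup can read the files in name order. Purging then costs one delete per expired day, regardless of how large each file is.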

huangapple
  • Posted on 2023-02-10 10:29:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75406394.html