Optimizing logging in C++ multi-threaded server app for performance and crash-proof data integrity.

huangapple go评论73阅读模式
英文:

Optimizing logging in C++ multi-threaded server app for performance and crash-proof data integrity

问题

我有一个复杂的服务器应用程序(Windows/Linux),有多个线程,我需要偶尔启用详细日志记录以追踪错误。然而,在我当前的实现中,每个日志条目都会立即刷新到磁盘,这会显著减慢应用程序的运行速度。

我想找到一种方法来摆脱立即刷新的方式,但我不确定是否有其他选项可以保证在应用程序崩溃时将所有报告的日志条目写入文件。

我正在考虑的一个选项是为日志记录运行一个单独的线程。在这种方法中,其他线程只会在等待将其日志条目插入线程安全队列时经历速度减慢。然而,我不确定是否可能确保队列中的所有条目在应用程序在主线程中崩溃时都写入磁盘。Linux信号或Windows的SetUnhandledExceptionFilter是否可以以一种允许这样的线程完成其工作的方式使用?

另一个考虑的观点是将日志条目通过UDP/TCP回环发送到另一个进程(例如,Linux上的syslog)。然而,我不确定这种方法是否会提供任何速度增益,或者是否在应用程序崩溃时始终发送套接字缓冲区。此外,实施这种解决方案肯定会增加运行应用程序所需的基础架构的复杂性,特别是在Windows上。

英文:

I have a complex server application (Windows/Linux) with several threads, and I need to enable verbose logging occasionally to trace a bug. However, the logging process dramatically slows down the application because, in my current implementation, each log entry is flushed to disk immediately.

I'd like to find a way to get rid of immediate flushing, but I'm uncertain if there are any other options that can guarantee all reported log entries are written to file if the application crashes.

One option I'm considering is running a separate thread for logging. In this approach, other threads would only experience a slowdown while waiting to insert their log entries into a thread-safe queue. However, I'm unsure if it's possible to ensure all entries in the queue are written to disk if the application crashes in the main thread. Can Linux signals or Windows' SetUnhandledExceptionFilter be used in a way that allows such a thread to complete its job?

The other considered opinion is to send log entries over UDP/TCP loopback to another process (e.g., syslog on Linux). However, I'm not sure if this approach would provide any speed gains or if the socket buffers are always sent if the application crashes. Additionally, implementing this solution would definitely increase the complexity of the infrastructure needed for running the application, especially on Windows.

答案1

得分: 2

以下是翻译好的内容:

要求

如果我正确理解您的要求,有一个应用程序,其中有多个线程记录到单个存档中(例如文件)。并且,即使程序意外终止,所有已完成的记录调用中的信息都必须成功存档。

我认为实现捕获存档中的所有信息的严格要求的唯一方法是强制同步记录。如果写入不是同步的,总会存在一段时间,其中可能会丢失一些信息。如果严格执行,这将排除使用专用线程写入存档的选项。

想法

据我所知,实现这一目标并具有可接受的性能的最佳(唯一?)技术是使用某种内存映射方案(正如@N00byEdge提出的)。我汇总了一些现有的代码来构建这个方案的原型,以进行一些测量。

另一个选项是尝试高性能记录器,比如spdlog,它非常强大和高性能。我用它来记录。然而,如果每条记录消息都要刷新,它仍然比内存映射文件慢两个数量级(100倍)。

性能结果

根据您的帖子,我构建了一个简单的性能测试,用于测量将200 Mb的日志信息写入文件的时间,使用简单地喷射信息的编写器,我认为这是最坏的情况。

以下图表显示了记录200 Mb信息所需的时间与记录线程数量的关系。我在Arm64x86上测量了三个不同的原型:

  • 使用自旋锁进行同步的MemoryFile
  • 使用互斥锁进行同步的MemoryFile
  • spdlog,但数据量少了100倍

Optimizing logging in C++ multi-threaded server app for performance and crash-proof data integrity.
Optimizing logging in C++ multi-threaded server app for performance and crash-proof data integrity.

如果有兴趣,我不介意为像这样具有严格数据保证和性能要求的情况工作,创建一个更通用的基于内存映射的"文件"的开源实现。

示例代码(MemoryFile

#include <atomic>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <filesystem>
#include <thread>
#include <vector>
#include <iostream>
#include <chrono>

// ...(这部分代码太长,请告诉我您是否需要继续翻译)...

示例代码(spdlog

#include <iostream>
#include <filesystem>
#include "spdlog/spdlog.h"
#include "spdlog/sinks/basic_file_sink.h"

// ...(这部分代码太长,请告诉我您是否需要继续翻译)...

请告诉我是否需要进一步翻译示例代码的其余部分。

英文:

Requirements

If I understand your requirements correctly, there is an application
with multiple threads logging to a single archive of some type (e.g. a
file). And, all of the information from completed calls to log must be
successfully archived even when the program terminates unexpectedly.

I believe the only way to achieve the strict requirement of capturing
all of the informaiton in an achive is to enforce synchonous
logging. If the writes are not synchronous, there will always be a
window in which some information can be lost. If strictly enforced,
this excludes options such as using a dedicated thread for writing to
the archive.

Ideas

As far as I know, the best (only?) technique for achieving this goal
with acceptable performance is to use some type of memory-mapping
scheme (like proposed by @N00byEdge). I pulled together some existing
code to construct enough of a prototype of this scheme to do some
measurements.

Another option is to try a high-performance logger like
spdlog which is very capable and
performant. I use it for my logging. If you flush on each log message,
however, it will still be two orders of magnitude (100x) slower than a
memory-mapped file.

Performance Results

Based on your post, I constructed a simple performance test that
measures the time to write 200 Mb of log information to a file using
writers that simply spew information, which I think is the worst case
scenario.

The following plots show the elapsed time to log 200 Mb of information
versus the number of logging threads. I measured three different
prototypes across Arm64 and x86:

  • MemoryFile with a spin-lock for synchronization
  • MemoryFile with a mutex for synchronization
  • spdlog but with 100x less data

Optimizing logging in C++ multi-threaded server app for performance and crash-proof data integrity.
Optimizing logging in C++ multi-threaded server app for performance and crash-proof data integrity.

If there is interest, I wouldn't mind working on an open source
implementation of a more generic memory-mapped based "file" for use
in situations like this that have strict data guaruntees and
performance requirements.

Sample Code (MemoryFile)

#include <atomic>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <filesystem>
#include <thread>
#include <vector>
#include <iostream>
#include <chrono>

class SpinLock {
public:
    void lock() {
        while (locked_.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }

    void unlock() {
        locked_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag locked_ = ATOMIC_FLAG_INIT;
};

class SpinLockGuard {
public:
    SpinLockGuard(SpinLock& lock)
        : lock_(lock) {
        lock_.lock();
    }

    ~SpinLockGuard() {
        lock_.unlock();
    }

private:
    SpinLock& lock_;
};

class MemoryFile {
public:
    enum Mode { CREATE, OPEN, CREATE_OR_OPEN };

    struct Header {
        size_t bytes_written{}, bytes_mapped{};
    };

    static constexpr size_t DataOffset = sizeof(Header);

    MemoryFile(const std::string& filename, Mode mode) {
        switch (mode) {
        case CREATE:
            create_chunk(filename);
            break;
        case OPEN:
            open_chunk(filename);
            break;
        case CREATE_OR_OPEN:
            if (std::filesystem::exists(filename)) open_chunk(filename);
            else create_chunk(filename);
            break;
        }
    }

    ~MemoryFile() {
        if (base_)
            munmap(base_, *ptr_bytes_mapped_);
        if (fd_ != -1)
            close(fd_);
    }

    void commit_message(std::string_view msg) {
        // SpinLockGuard guard(lock_);
        std::lock_guard guard(mutex_);
        ensure_room_for_message(msg.size());
        auto nbytes = *ptr_bytes_written_;
        std::copy(msg.begin(), msg.end(), base_ + nbytes);
        nbytes += msg.size();
        *ptr_bytes_written_ = nbytes;
    }

private:
    size_t grow_file(size_t new_size) {
        auto rc = lseek(fd_, new_size - 1, SEEK_SET);
        if (rc + 1 != new_size)
            throw std::runtime_error("error: lseek failed trying to grow file");

        rc = write(fd_, "", 1);
        if (rc != 1)
            throw std::runtime_error("error: write to extend file failed");

        return new_size;
    }

    void map_memory(size_t nbytes) {
        base_ = (char*)mmap(nullptr, nbytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        if (base_ == nullptr)
            throw std::runtime_error("error: failed to map memory");

        Header *header = (Header*)base_;
        ptr_bytes_written_ = &(header->bytes_written);
        ptr_bytes_mapped_ = &(header->bytes_mapped);
        *ptr_bytes_mapped_ = nbytes;
    }

    void create_chunk(const std::string& filename) {
        namespace fs = std::filesystem;
        if (fs::exists(fs::path(filename)))
            throw std::runtime_error("error: chunk already exists");

        fd_ = open(filename.c_str(), O_CREAT | O_RDWR, 0777);
        if (fd_ == -1)
            throw std::runtime_error("error: failed to create chunk");

        auto nbytes = grow_file(1024 * 1024);
        map_memory(nbytes);
        *ptr_bytes_written_ = sizeof(Header);
    }

    void open_chunk(const std::string& filename) {
        namespace fs = std::filesystem;
        if (not fs::exists(fs::path(filename)))
            throw std::runtime_error("error: chunk does not exist");

        fd_ = open(filename.c_str(), O_RDWR, 0777);
        if (fd_ == -1)
            throw std::runtime_error("error: failed to open chunk");

        auto nbytes = fs::file_size(fs::path(filename));
        map_memory(nbytes);
    }

    void ensure_room_for_message(size_t n) {
        auto nw = *ptr_bytes_written_, nm = *ptr_bytes_mapped_;
        if (nw + n >= nm) {
            munmap(base_, nm);
            base_ = nullptr;

            auto new_size = std::max(2 * nm, nw + n + 1);
            auto nbytes = grow_file(new_size);
            map_memory(nbytes);
        }
    }

    // Pointer to the number of bytes written.
    volatile size_t *ptr_bytes_written_{nullptr};

    // Pointer to the number of bytes in the mapped region.
    volatile size_t *ptr_bytes_mapped_{nullptr};

    // Pointer to the mapped region.
    char *base_{nullptr};

    // File descriptor of the mapped region.
    int fd_{-1};

    // Use simple spin-lock for thread synchronization.
    // SpinLock lock_;

    // Use a mutex for thread synchronization.
    std::mutex mutex_;
};

template<class Clock = std::chrono::high_resolution_clock>
class StopWatch
{
public:
    StopWatch()
        : start_tp_(Clock::now())
        , last_tp_(start_tp_)
    { }

    auto now() const
    {
        std::atomic_thread_fence(std::memory_order_relaxed);
        auto current_tp = Clock::now();
        std::atomic_thread_fence(std::memory_order_relaxed);
        return current_tp;
    }

    auto mark()
    {
        auto current_tp = now();
        auto elapsed = current_tp - last_tp_;
        last_tp_ = current_tp;
        return elapsed;
    }

    template<class Units = typename Clock::duration>
    auto elapsed_duration()
    {
        auto elapsed = mark();
        return std::chrono::duration_cast<Units>(elapsed);
    }

    template<class Units = typename Clock::duration>
    auto elapsed_time()
    {
        auto elapsed = mark();
        return std::chrono::duration_cast<Units>(elapsed).count();
    }

private:
    typename Clock::time_point start_tp_;
    typename Clock::time_point last_tp_;
};

int main(int argc, const char *argv[]) {
    const auto filename = "/tmp/x.out";
    size_t nth = argc >= 2 ? atoi(argv[1]) : 1;
    size_t nmsg = argc >= 3 ? atoi(argv[2]) : 1'000'000;
    size_t msg_per_thread = nmsg / nth;

    std::filesystem::remove(std::filesystem::path(filename));
    MemoryFile log(filename, MemoryFile::CREATE);

    StopWatch timer;
    timer.mark();
    std::vector<std::thread> threads;
    for (auto i = 0; i < nth; ++i)
        threads.emplace_back([&]() {
            for (auto i = 0; i < msg_per_thread; ++i)
                log.commit_message("This is a log message\n");
        });

    for (auto& thread : threads)
        thread.join();

    auto ms = timer.elapsed_duration<std::chrono::milliseconds>().count();
    std::cout << ms << " ms" << std::endl;

    return 0;
}

Sample Code (spdlog)

#include <iostream>
#include <filesystem>
#include "spdlog/spdlog.h"
#include "spdlog/sinks/basic_file_sink.h"

template<class Clock = std::chrono::high_resolution_clock>
class StopWatch
{
public:
    StopWatch()
        : start_tp_(Clock::now())
        , last_tp_(start_tp_)
    { }

    auto now() const
    {
        std::atomic_thread_fence(std::memory_order_relaxed);
        auto current_tp = Clock::now();
        std::atomic_thread_fence(std::memory_order_relaxed);
        return current_tp;
    }

    auto mark()
    {
        auto current_tp = now();
        auto elapsed = current_tp - last_tp_;
        last_tp_ = current_tp;
        return elapsed;
    }

    template<class Units = typename Clock::duration>
    auto elapsed_duration()
    {
        auto elapsed = mark();
        return std::chrono::duration_cast<Units>(elapsed);
    }

    template<class Units = typename Clock::duration>
    auto elapsed_time()
    {
        auto elapsed = mark();
        return std::chrono::duration_cast<Units>(elapsed).count();
    }

private:
    typename Clock::time_point start_tp_;
    typename Clock::time_point last_tp_;
};

using std::cout, std::endl;

int main(int argc, const char *argv[]) {
    const auto filename = "/tmp/x.out";
    size_t nth = argc >= 2 ? atoi(argv[1]) : 1;
    size_t nmsg = argc >= 3 ? atoi(argv[2]) : 1'000'000;
    size_t msg_per_thread = nmsg / nth;

    std::filesystem::remove(std::filesystem::path(filename));
    auto sink = std::make_shared<spdlog::sinks::basic_file_sink_mt>("/tmp/x.out", true);
    spdlog::logger logger("basic", sink);

    StopWatch timer;
    timer.mark();
    std::vector<std::thread> threads;
    for (auto i = 0; i < nth; ++i)
        threads.emplace_back([&]() {
            for (auto i = 0; i < msg_per_thread; ++i) {
                logger.info("This is a log message\n");
                logger.flush();
            }
        });

    for (auto& thread : threads)
        thread.join();

    auto ms = timer.elapsed_duration<std::chrono::milliseconds>().count();
    std::cout << ms << " ms" << std::endl;

    return 0;
}

答案2

得分: 1

如果您有数据量的上限,可以按以下方式操作:

创建一个文件,并将其映射到内存中,例如在Linux上使用MAP_SHARED。如果您的程序崩溃,更改将写回文件。您的所有日志记录将只是一个简单的memcpy操作。这将具有高性能,可以在崩溃时保留数据,但需要您设置文件映射的上限(并在映射之前调整大小)。您还可以将文件用作环形缓冲区,并在需要时重新开始,如果您可以接受覆盖旧的日志条目的话。

英文:

If you have an upper bound of the amount of data I would do it in the following way:

Create a file and map it into memory with for example MAP_SHARED on linux. If your program dies, the changes should be written back to file. All your logging will do to output data will just be a single memcpy. It will be highly performant, keep your data on a crash, but requires you to have an upper bound for the mapping (and resize before mapping) of the file. You could also use the file as a ring buffer and restart at the beginning if you're okay with overwriting old log entries.

答案3

得分: 0

以下是翻译好的内容:

有一种方法可以做到这一点,那就是构建一个类似于现有记录器的记录器,具有以下特点:

  • 每条消息的严重程度,因此某些消息是“错误”,某些消息是“跟踪”,等等
  • 可配置的根据严重程度刷新,比如“仅在错误时刷新”,可能与定期刷新结合使用。

如果您从多个线程记录日志,无论如何,您希望您的日志记录是线程安全的,因此线程安全队列和一个消费者线程,该线程出队并写入此数据,是一个好主意。

避免减慢刷新的另一种技巧是将日志文件写入RAM磁盘,例如大多数Linux发行版上的“/tmp”。

结果是一个根据特定严重程度刷新的记录器。此严重程度可以进行调整,因此例如在生产部署系统上,您只会在出现错误时进行刷新,但在开发环境中,您会在调试消息上进行刷新(如果速度不太慢的话)。

这种方法效果很好,特别是如果您采用了在出现意外值时记录错误的策略(例如,您可以检查预先和后置条件,如果它们失败,则记录错误)。

英文:

One way this can be done is to build a logger similar to existing loggers, which have:

  • Severity per message, so some messages are error, some are trace, and so on
  • Configurable flushing per severity, such as "only flush on error", possibly combined with a periodic flush.

If you're logging from multiple threads, you want your logging to be thread-safe anyway, so a thread safe queue and a consumer thread which dequeues and writes this data is a good idea.

Another trick to avoid slowdown of flushes is to write your log files to a ram disk, like /tmp on most Linux distros.

The result is a logger that flushes on specific severities. This severity can be adjusted, so, for example, on production-deployed systems you'd only flush on error, but in development environments, you'd flush on debug messages (if that's not too slow).

This works well, especially if you adopt the strategy of logging an error on unexpected values (e.g. you could check your pre- and post-conditions and log an error if they fail).

huangapple
  • 本文由 发表于 2023年5月11日 16:50:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/76225774.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定