Python multiprocessing, logging to different files
Question
I would like to run some code on n processes, and have the logs from each process go to a separate file.

I tried, naively, something like this:
from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        self.logger = logging.getLogger(logger_name)
        self.log_file = log_file
        self.logger.addHandler(logging.FileHandler(log_file))
        print("from init", self.logger, self.logger.handlers)

    def run(self) -> None:
        print("from run", self.logger, self.logger.handlers)


if __name__ == '__main__':
    p1 = Worker("l1", "log1")
    p1.start()
(tried in Python 3.9 and 3.11)

But for some reason, the handler is gone. This is the output:
from init <Logger l1 (WARNING)> [<FileHandler log1 (NOTSET)>]
from run <Logger l1 (WARNING)> []
Why is the FileHandler gone? Should I call addHandler within the run method -- is that the correct way?

I was trying to use this answer but couldn't really make it work.

For the moment, I solved it by defining the handlers in run, but that seems like a dirty hack to me...
UPDATE: This happens with the Python installations on my MacBook. On a Linux server, I couldn't reproduce it. Very confusing.
In either case, the question is probably:
"Is this the correct way to log to files, with several copies of one
process?"
Answer 1
Score: 2
I found the reason for the observed behavior. It has to do with pickling of objects when they are transferred between Processes.

In the standard library's implementation of Logger, a __reduce__ method is defined. This method is used in cases where an object cannot be reliably pickled. Instead of trying to pickle the object itself, the pickle protocol uses the value returned by __reduce__. In the case of Logger, __reduce__ returns a function (getLogger) and a string (the name of the Logger being pickled) to be used as its argument. During unpickling, the unpickling protocol makes that function call (logging.getLogger(name)); the result of the call becomes the unpickled Logger instance.
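You can see this contract directly (a minimal sketch, not from the original answer; the logger name "l1" just mirrors the question):

import logging

logger = logging.getLogger("l1")
print(logger.__reduce__())
# On CPython this prints something like:
#   (<function getLogger at 0x...>, ('l1',))
# i.e. "to rebuild this object, call logging.getLogger('l1')" -- the
# attached handlers and levels are not part of the pickled state at all.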
The original Logger and the unpickled Logger will have the same name, but perhaps not much else in common. The unpickled Logger will have the default configuration, whereas the original Logger will have any customization you may have performed.
In Python, Process objects do not share an address space (at least, not on Windows). When a new Process is launched, its instance variables must somehow be "transferred" from one Process to the other. This is done by pickling/unpickling. In the example code, the instance variables declared in the Worker.__init__ function do indeed appear in the new Process, as you can verify by printing them in Worker.run. But under the hood, Python has actually pickled and unpickled all of the instance variables to make it look like they have magically migrated to the new Process. In the vast majority of cases this works just fine -- but not necessarily if one of those instance variables defines a __reduce__ method.

This also likely explains the MacBook/Linux discrepancy from the question: since Python 3.8, macOS defaults to the "spawn" start method, which pickles the Process object, while Linux defaults to "fork", which copies the parent's address space (handlers and all) without any pickling.
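To make the pickling step visible (again a minimal sketch, not part of the original answer), note that round-tripping the Logger inside the parent process just hands back the existing singleton, so nothing appears to be lost there; the handlers only vanish when the unpickling happens in a fresh process:

import logging
import pickle

logger = logging.getLogger("l1")
logger.addHandler(logging.FileHandler("log1"))

# Logger.__reduce__ makes this equivalent to calling logging.getLogger("l1").
clone = pickle.loads(pickle.dumps(logger))

print(clone is logger)  # True: the parent's logging registry already has "l1"
print(clone.handlers)   # handler still there -- same object, same process

# In a spawned child process, the same getLogger("l1") call runs against a
# fresh, empty logging registry, so it builds a new Logger with no handlers.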
A logging.FileHandler cannot, I suspect, be pickled, since it uses operating system resources (an open file). This is probably the reason (or at least one of the reasons) why Logger objects can't be pickled directly.
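Given all of the above, the workaround the asker already found -- configuring the handler inside run -- is arguably the correct pattern rather than a dirty hack: keep only plain, picklable data in __init__ and create the FileHandler in the child process, so it never has to cross the process boundary. A minimal sketch along those lines, built on the question's own Worker class:

from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        # Store only plain, picklable data here; no Logger, no FileHandler.
        self.logger_name = logger_name
        self.log_file = log_file

    def run(self) -> None:
        # Executed in the child process: the logger and its FileHandler are
        # created here, after the pickling step is already over.
        logger = logging.getLogger(self.logger_name)
        logger.addHandler(logging.FileHandler(self.log_file))
        logger.warning("hello from %s", self.name)


if __name__ == '__main__':
    workers = [Worker(f"l{i}", f"log{i}") for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

Each process then owns its log file outright, which also sidesteps the interleaved-writes problems you can get when several processes share one file.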