Python multiprocessing, logging to different files
Question
I would like to run some code on n processes, and have the logs from each process go to a separate file.

I tried, naively, something like this:
from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        self.logger = logging.getLogger(logger_name)
        self.log_file = log_file
        self.logger.addHandler(logging.FileHandler(log_file))
        print("from init", self.logger, self.logger.handlers)

    def run(self) -> None:
        print("from run", self.logger, self.logger.handlers)


if __name__ == '__main__':
    p1 = Worker("l1", "log1")
    p1.start()
(tried in Python 3.9 and 3.11)

But for some reason, the handler is gone. This is the output:
from init <Logger l1 (WARNING)> [<FileHandler log1 (NOTSET)>]
from run <Logger l1 (WARNING)> []
Why is the FileHandler gone? Should I call addHandler within the run method -- is that the correct way?

I was trying to use this answer but couldn't really make it work.

For the moment, I solved it by defining the handlers in run, but that seems like a dirty hack to me...
UPDATE: This happens with the Python installations on my MacBook. On a Linux server, I couldn't reproduce it. Very confusing.
In either case, the question is probably:
"Is this the correct way to log to files, with several copies of one
process?"
Answer 1
Score: 2
I found the reason for the observed behavior. It has to do with pickling of objects when they are transferred between Processes.

In the standard library's implementation of Logger, a __reduce__ method is defined. This method is used in cases where an object cannot be reliably pickled. Instead of trying to pickle the object itself, the pickle protocol uses the value returned by __reduce__. In the case of Logger, __reduce__ returns a function (getLogger) and a string (the name of the Logger being pickled) to be used as its argument. During unpickling, the unpickling protocol makes that function call (logging.getLogger(name)); the result of the call becomes the unpickled Logger instance.
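You can see this contract directly (a minimal sketch, not from the original answer; the logger name "l1" just mirrors the question):

import logging

logger = logging.getLogger("l1")
print(logger.__reduce__())
# On CPython this prints something like:
#   (<function getLogger at 0x...>, ('l1',))
# i.e. "to rebuild this object, call logging.getLogger('l1')" -- the
# attached handlers and levels are not part of the pickled state at all.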
The original Logger and the unpickled Logger will have the same name, but perhaps not much else in common. The unpickled Logger will have the default configuration, whereas the original Logger will have any customization you may have performed.
In Python, Process objects do not share an address space (at least, not on Windows). When a new Process is launched, its instance variables must somehow be "transferred" from one Process to the other. This is done by pickling/unpickling. In the example code, the instance variables declared in the Worker.__init__ function do indeed appear in the new Process, as you can verify by printing them in Worker.run. But under the hood, Python has actually pickled and unpickled all of the instance variables to make it look like they have magically migrated to the new Process. In the vast majority of cases this works just fine -- but not necessarily if one of those instance variables defines a __reduce__ method.

This also likely explains the MacBook/Linux discrepancy from the question: since Python 3.8, macOS defaults to the "spawn" start method, which pickles the Process object, while Linux defaults to "fork", which copies the parent's address space (handlers and all) without any pickling.
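To make the pickling step visible (again a minimal sketch, not part of the original answer), note that round-tripping the Logger inside the parent process just hands back the existing singleton, so nothing appears to be lost there; the handlers only vanish when the unpickling happens in a fresh process:

import logging
import pickle

logger = logging.getLogger("l1")
logger.addHandler(logging.FileHandler("log1"))

# Logger.__reduce__ makes this equivalent to calling logging.getLogger("l1").
clone = pickle.loads(pickle.dumps(logger))

print(clone is logger)  # True: the parent's logging registry already has "l1"
print(clone.handlers)   # handler still there -- same object, same process

# In a spawned child process, the same getLogger("l1") call runs against a
# fresh, empty logging registry, so it builds a new Logger with no handlers.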
A logging.FileHandler cannot, I suspect, be pickled, since it uses operating system resources (an open file). This is probably the reason (or at least one of the reasons) why Logger objects can't be pickled directly.
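Given all of the above, the workaround the asker already found -- configuring the handler inside run -- is arguably the correct pattern rather than a dirty hack: keep only plain, picklable data in __init__ and create the FileHandler in the child process, so it never has to cross the process boundary. A minimal sketch along those lines, built on the question's own Worker class:

from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        # Store only plain, picklable data here; no Logger, no FileHandler.
        self.logger_name = logger_name
        self.log_file = log_file

    def run(self) -> None:
        # Executed in the child process: the logger and its FileHandler are
        # created here, after the pickling step is already over.
        logger = logging.getLogger(self.logger_name)
        logger.addHandler(logging.FileHandler(self.log_file))
        logger.warning("hello from %s", self.name)


if __name__ == '__main__':
    workers = [Worker(f"l{i}", f"log{i}") for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

Each process then owns its log file outright, which also sidesteps the interleaved-writes problems you can get when several processes share one file.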