Using multithreading with time.sleep and unique loggers
Question
I'm trying to make sure that several threads start as close to each other as possible, and for that I'm using time.sleep. Each thread will have its own logger, which will output to its own unique file.
There's something very strange happening though...
- Sometimes, not all logger files are created. In the example below, instead of 4 worker files, sometimes I'll get 2, other times 3. I don't see a pattern.
Here's a minimal working example:
import concurrent.futures
import logging
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path

################# Logger ########################
main_logger = logging.getLogger("main_logger")
main_logger.setLevel(logging.DEBUG)
file_handler = logging.FileHandler(
    filename="./logs/print_multi.txt", mode="w"
)
file_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s - %(threadName)s - %(name)s - %(levelname)s - %(message)s"
)
file_handler.setFormatter(formatter)
main_logger.addHandler(file_handler)


def print_multi(start_time: datetime, index):
    # cleaning the worker directory
    for path in Path("./logs/workers_print/").glob("**/*"):
        if path.is_file():
            path.unlink()
    # creating the worker logger
    worker_logger = logging.getLogger(f"print_worker_{index}")
    worker_logger.setLevel(logging.DEBUG)
    file_handler = logging.FileHandler(
        filename=f"./logs/workers_print/print_worker_{index}_ignore.txt",
        mode="w",
    )
    file_handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    file_handler.setFormatter(formatter)
    worker_logger.addHandler(file_handler)
    # logging the times
    worker_logger.debug(f"This thread will start at {start_time}")
    time_now = datetime.now(tz=timezone.utc)
    seconds_to_start = (start_time - time_now).total_seconds()
    worker_logger.debug(f"seconds to start -> {seconds_to_start}")
    time.sleep(seconds_to_start)
    worker_logger.debug(f"We're in thread {index}")
    print(f"We're in thread {index}")


def main():
    main_logger.debug("Setting ProcessPoolExecutor")
    start_time = datetime.now(tz=timezone.utc) + timedelta(seconds=10)
    main_logger.debug(f"start_time -> {start_time}")
    workers = 4  # os.cpu_count()
    main_logger.debug(f"num_workers -> {workers}")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(
            print_multi, [start_time] * workers, range(workers)
        )
        for r in results:
            pass
    main_logger.debug("Finish")


main()
Here's an example of a traceback, from a run where I got only the worker 2 and 3 files, instead of 0, 1, 2, 3:
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "multithreading_MWE.py", line 72, in print_multi
    path.unlink()
  File "/usr/lib/python3.10/pathlib.py", line 1206, in unlink
    self._accessor.unlink(self)
FileNotFoundError: [Errno 2] No such file or directory: 'logs/workers_print/print_worker_1_ignore.txt'
Answer 1
Score: 0
The issue is that the print_multi function cleans the worker directory before creating the worker logger and its log file. The workers race with each other: one worker can delete a log file that a sibling worker has just created, and unlink itself raises FileNotFoundError when a file returned by glob has already been removed by another process in the meantime.
To avoid this, move the cleanup of the worker directory into the main function, before the executor starts. The directory is then cleaned exactly once, so no worker can delete another worker's log file, and all log files are created successfully.
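A minimal sketch of that fix, with the cleanup moved out of the worker (paths and the worker count mirror the question's example; the start delay is shortened to 1 second here just to keep the demo quick):

```python
import concurrent.futures
import logging
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path


def print_multi(start_time: datetime, index: int) -> None:
    # Each worker only creates its own logger and file; no cleanup here.
    worker_logger = logging.getLogger(f"print_worker_{index}")
    worker_logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(
        filename=f"./logs/workers_print/print_worker_{index}_ignore.txt",
        mode="w",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    worker_logger.addHandler(handler)
    seconds_to_start = (start_time - datetime.now(tz=timezone.utc)).total_seconds()
    time.sleep(max(seconds_to_start, 0))  # guard against a negative delay
    worker_logger.debug(f"We're in worker {index}")


def main() -> None:
    # Clean the worker directory ONCE, before any worker starts.
    worker_dir = Path("./logs/workers_print/")
    worker_dir.mkdir(parents=True, exist_ok=True)
    for path in worker_dir.glob("**/*"):
        if path.is_file():
            path.unlink()
    start_time = datetime.now(tz=timezone.utc) + timedelta(seconds=1)
    workers = 4
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Drain the iterator so worker exceptions are re-raised here.
        for _ in executor.map(print_multi, [start_time] * workers, range(workers)):
            pass


if __name__ == "__main__":
    main()
```

If you do need to keep the cleanup inside the workers for some reason, Python 3.8+ also offers path.unlink(missing_ok=True), which suppresses the FileNotFoundError when a sibling process has already deleted the file, though workers would still clobber each other's freshly created log files.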