Using multithreading with time.sleep and unique loggers

Question

I'm trying to make sure that several threads start as close to each other as possible, and for that I'm using time.sleep. Each thread will have its own logger, which will output to its unique file.

There's something very strange happening though...

  • Sometimes, not all logger files are created. In the example below, instead of 4 worker files, sometimes I'll get 2, other times 3. I don't see a pattern.

Here's a minimal working example:

import concurrent.futures
import logging
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path

# make sure the log directories exist before any handler opens a file
Path("./logs/workers_print").mkdir(parents=True, exist_ok=True)

################# Logger ########################
main_logger = logging.getLogger("main_logger")
main_logger.setLevel(logging.DEBUG)
file_handler = logging.FileHandler(
    filename="./logs/print_multi.txt", mode="w"
)
file_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s - %(threadName)s - %(name)s - %(levelname)s - %(message)s"
)
file_handler.setFormatter(formatter)
main_logger.addHandler(file_handler)

def print_multi(start_time: datetime, index):
    # cleaning the worker directory
    for path in Path("./logs/workers_print/").glob("**/*"):
        if path.is_file():
            path.unlink()
    # creating logging
    worker_logger = logging.getLogger(f"print_worker_{index}")
    worker_logger.setLevel(logging.DEBUG)
    file_handler = logging.FileHandler(
        filename=f"./logs/workers_print/print_worker_{index}_ignore.txt",
        mode="w",
    )
    file_handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    file_handler.setFormatter(formatter)
    worker_logger.addHandler(file_handler)

    # logging the times
    worker_logger.debug(f"This thread will start at {start_time}")
    time_now = datetime.now(tz=timezone.utc)
    seconds_to_start = (start_time - time_now).total_seconds()
    worker_logger.debug(f"seconds to start -> {seconds_to_start}")
    time.sleep(seconds_to_start)
    worker_logger.debug("We're in thread {index}")
    print("We're in thread {index}")

def main():
    main_logger.debug("Setting ThreadPoolExecutor")
    start_time = datetime.now(tz=timezone.utc) + timedelta(seconds=10)
    main_logger.debug(f"start_time -> {start_time}")
    workers = 4  # os.cpu_count()
    main_logger.debug(f"num_workers -> {workers}")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(
            print_multi, [start_time] * workers, range(workers)
        )
        for r in results:
            pass
    main_logger.debug("Finish")

if __name__ == "__main__":
    main()

Here's an example traceback from a run where I got only the worker 2 and 3 files, instead of 0, 1, 2, 3:

Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "multithreading_MWE.py", line 72, in print_multi
    path.unlink()
  File "/usr/lib/python3.10/pathlib.py", line 1206, in unlink
    self._accessor.unlink(self)
FileNotFoundError: [Errno 2] No such file or directory: 'logs/workers_print/print_worker_1_ignore.txt'

Answer 1

Score: 0

The issue is that every print_multi worker cleans the whole worker directory before creating its own logger and log file. Because the workers run in separate processes, one worker can delete a log file that another worker has just created (which is why worker files go missing), and two workers can race to unlink the same file, in which case the slower one raises the FileNotFoundError shown in the traceback.

To avoid this, move the cleanup of the worker directory into the main function, before the executor starts. That way the directory is cleaned exactly once, before any worker creates its log file, so all log files are created and kept successfully.
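Here is a minimal sketch of that fix, rearranging the question's own code: the deletion loop is lifted out of print_multi into a helper that runs once in the parent process (clean_worker_dir is a hypothetical name), and the cleanup loop at the top of print_multi is removed. It assumes the imports, main_logger, and print_multi from the question.

def clean_worker_dir():
    # delete leftover worker log files once, before any worker starts
    for path in Path("./logs/workers_print/").glob("**/*"):
        if path.is_file():
            path.unlink()

def main():
    clean_worker_dir()  # moved here from print_multi
    main_logger.debug("Setting ProcessPoolExecutor")
    start_time = datetime.now(tz=timezone.utc) + timedelta(seconds=10)
    main_logger.debug(f"start_time -> {start_time}")
    workers = 4  # os.cpu_count()
    main_logger.debug(f"num_workers -> {workers}")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(
            print_multi, [start_time] * workers, range(workers)
        )
        for r in results:
            pass
    main_logger.debug("Finish")

With the cleanup done once up front, no worker can delete another worker's file, so all four print_worker_*_ignore.txt files should appear on every run.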
