What's the use of having a loop block inside Thread.new in ruby?

Question

I am a beginner when it comes to multi-threading. I am trying to understand how Thread works in Ruby, and I came across this example on a website (Thread-safe Data Structures):

require 'thread'

queue = Queue.new

producer = Thread.new do
  10.times do
    queue.push(Time.now.to_i)
    sleep 1
  end
end

consumers = []

3.times do
  consumers << Thread.new do
    loop do
      unix_timestamp = queue.pop
      formatted_timestamp = unix_timestamp.to_s.reverse.
                            gsub(/(\d\d\d)/, '\1,').reverse

      puts "It's been #{formatted_timestamp} seconds since the epoch!"
    end
  end
end

producer.join

The part that I don't understand is the use of loop do inside the last Thread.new do:

consumers << Thread.new do
  loop do
    ...
  end
end

When I run the code above, I get

ruby thread_safe.rb
It's been 1,680,277,028 seconds since the epoch!
It's been 1,680,277,029 seconds since the epoch!
It's been 1,680,277,030 seconds since the epoch!
It's been 1,680,277,031 seconds since the epoch!
It's been 1,680,277,032 seconds since the epoch!
It's been 1,680,277,033 seconds since the epoch!
It's been 1,680,277,034 seconds since the epoch!
It's been 1,680,277,035 seconds since the epoch!

If I remove the loop, I only see it printed 3x:

❯ ruby thread_safe.rb
It's been 1,680,277,037 seconds since the epoch!
It's been 1,680,277,038 seconds since the epoch!
It's been 1,680,277,039 seconds since the epoch!

I understand that Ruby's Queue implements a thread-safe data structure (akin to wrapping a mutex.synchronize around an array), and that queue will wait until it has something before it pops it.

My initial expectation was to see the text "It's been X seconds since the epoch!" printed 3x, because we are spawning 3 threads and each thread pops the queue once. But somehow adding the loop pops and prints it 10x.

I have two questions:

  • A question specific to this code example: why is it that wrapping the loop block caused the queue to be popped a total of 10 times?
  • A more generic question: when dealing with multi-threaded environment, is it the convention to use loop inside Thread.new? (I've seen the loop block used inside Thread.new in a few places, such as Ruby Multithreading (TutorialsPoint), Thread loop in Ruby (Stackoverflow), and the code that I referenced in the question, Thread-safe Data Structures, so I am beginning to notice a pattern. If using loop is the convention in multi-threaded code, why is that?)

Answer 1

Score: 3

> why is it that wrapping the loop block caused the queue to be popped a total of 10 times?

Each of the consumer threads executes an infinite loop, reading messages from the queue as they arrive.

They collectively read 10 messages because 10 messages were placed in the queue. They would collectively read 20 messages if 20 messages were placed in the queue.

How many messages each consumer thread receives is unknown. It depends on how fast the thread can process them compared to the rate at which they are produced. And if multiple threads are waiting for work (as constantly happens here), then one of the threads "at random" will receive the message.
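To make this concrete, here is a small sketch (not the original site's code) in which the messages are loaded up front and each consumer loops until the queue is empty. The helper name drain_with_consumers is hypothetical, and the non-blocking queue.pop(true) is used only so the threads can stop on their own in this demo. The per-thread counts may differ from run to run, but their sum should always equal the number of messages pushed.

```ruby
# Hypothetical helper: push message_count items, then let consumer_count
# looping threads drain the queue. Returns how many items each thread popped.
def drain_with_consumers(message_count, consumer_count)
  queue = Queue.new
  message_count.times { |i| queue.push(i) }

  consumers = consumer_count.times.map do
    Thread.new do
      popped = 0
      loop do
        begin
          # Non-blocking pop: raises ThreadError when the queue is empty.
          queue.pop(true)
        rescue ThreadError
          break # nothing left, so this consumer stops looping
        end
        popped += 1
      end
      popped # becomes the thread's value
    end
  end

  # Thread#value waits for the thread to finish and returns its block result.
  consumers.map(&:value)
end

counts = drain_with_consumers(20, 3)
puts "per-consumer counts: #{counts.inspect}, total: #{counts.sum}"
```

The split between consumers is decided by the scheduler, so only the total is predictable.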


> A more generic question: when dealing with multi-threaded environment, is it the convention to use loop inside Thread.new?

This is called the worker model. It allows a single thread to be used to process multiple jobs. This avoids the overhead of creating a new thread for each job. It also naturally caps the number of threads.
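As a rough illustration of the worker model, the sketch below runs six jobs on two long-lived threads. The job (squaring a number), the variable names, and the :stop sentinel used to end the workers are all assumptions made for this example, not something taken from the code in the question.

```ruby
worker_count = 2
jobs = Queue.new
results = Queue.new

# Each worker loops, so one thread handles many jobs.
workers = worker_count.times.map do
  Thread.new do
    loop do
      job = jobs.pop               # blocks until a job is available
      break if job == :stop        # hypothetical sentinel: no more work
      results.push([Thread.current.object_id, job * job])
    end
  end
end

6.times { |n| jobs.push(n) }
worker_count.times { jobs.push(:stop) } # one sentinel per worker
workers.each(&:join)

processed = []
processed << results.pop until results.empty?

thread_ids = processed.map(&:first).uniq
squares = processed.map(&:last).sort
puts "#{processed.size} jobs handled by #{thread_ids.size} thread(s)"
```

However many jobs are pushed, no more than worker_count threads ever run them, which is the cap mentioned above.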

The code in the OP is missing one detail however. It needs to exit if no jobs are available and no jobs will ever become available again.

Right now, the program exits the moment the producer is done. This is a bug, as it can leave some jobs unprocessed or incompletely processed. To fix this, you should also join the consumers. But for that to happen, the consumers can't use an infinite loop. This is easy to solve by adding queue.close outside the loop in the producer, and handling that accordingly in the consumers.

So three changes are needed:

  • Add queue.close to the producer.
  • Handle a closed queue in the consumers.
  • Wait for the consumers to complete by joining them.

I don't know Ruby, so the implementation is left to you.
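For readers who want a starting point, one possible Ruby sketch of the three changes follows. It assumes Ruby 2.3 or later, where Queue#close exists and pop returns nil once the queue is closed and empty. The run_pipeline and format_with_commas helpers and their parameters are hypothetical names introduced here, and the regex adds a lookahead so that no stray leading comma appears.

```ruby
# Hypothetical helper: 1680277028 becomes "1,680,277,028".
def format_with_commas(number)
  number.to_s.reverse.gsub(/(\d\d\d)(?=\d)/, '\1,').reverse
end

# Hypothetical wrapper around the question's code; returns the number of
# messages the consumers handled in total.
def run_pipeline(message_count: 10, interval: 1, consumer_count: 3)
  queue = Queue.new

  producer = Thread.new do
    message_count.times do
      queue.push(Time.now.to_i)
      sleep interval
    end
    queue.close # change 1: signal that no more messages will arrive
  end

  consumers = consumer_count.times.map do
    Thread.new do
      handled = 0
      # Change 2: pop returns nil once the queue is closed and empty,
      # which ends the loop. (A nil or false message would also end it.)
      while (unix_timestamp = queue.pop)
        puts "It's been #{format_with_commas(unix_timestamp)} seconds since the epoch!"
        handled += 1
      end
      handled
    end
  end

  producer.join
  consumers.each(&:join) # change 3: wait for the consumers to finish
  consumers.sum(&:value)
end
```

Calling run_pipeline with its defaults mirrors the timing of the original example (about ten seconds); a call such as run_pipeline(message_count: 5, interval: 0.01) finishes quickly and should return 5.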

huangapple
  • Published on 2023-04-01 00:01:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75900572.html