Why is this Swift Readers-Writers code causing deadlock?
Question
I seem to have a classic solution to the Readers-Writers problem in Swift, using a concurrent DispatchQueue with a barrier for writes. However, when I run my test code, it deadlocks.
Here's my wanna-be-thread-safe data structure:
import Foundation

class Container<T> {
    private var _value: T
    private let _queue = DispatchQueue(label: "containerQueue", attributes: .concurrent)

    init(_ value: T) {
        self._value = value
    }

    var value: T {
        get {
            // Reads run synchronously on the concurrent queue.
            _queue.sync {
                _value
            }
        }
        set {
            // Writes are dispatched asynchronously with a barrier, so they run
            // exclusively with respect to the concurrent reads.
            _queue.async(flags: .barrier) {
                self._value = newValue
            }
        }
    }
}
And here's my test code:
class ContainerTest {
    let testQueue = DispatchQueue(label: "testQueue", attributes: .concurrent)
    var container = Container(0)
    let group = DispatchGroup()

    func runTest() {
        for i in 0..<1000 {
            testQueue.async(group: group) {
                // Each block performs a read followed by a write on the container.
                self.container.value = max(i, self.container.value)
            }
        }
        group.notify(queue: .main) {
            print("Finished")
        }
    }
}
The piece of code that's run repeatedly is just some random read and write operations. It's not attempting to produce anything sensible; it's just there to stress-test the data structure.
So when I run this, "Finished" is never printed. However, if I change _queue.async(flags: .barrier) to _queue.sync(flags: .barrier), then I see "Finished" printed.
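For clarity, that change is confined to Container's setter; the getter stays exactly as above:

    set {
        // Same barrier write, but dispatched synchronously: the setter does not
        // return until the write has actually completed on the queue.
        _queue.sync(flags: .barrier) {
            self._value = newValue
        }
    }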
I'm guessing that when I'm using the async write version, I'm getting a deadlock, but why? It's the textbook Readers-Writers solution that's typically used. Perhaps it's my test code that is at fault, but again, why?
Answer 1
Score: 4
Rob, in his answer, gave me some great clues about why my code was not completing: "thread pool exhaustion, no threads available, everything locks up". So I wanted to see exactly why everything locks up when pool exhaustion happens.
Having learned a bit about thread pools, I would normally expect pool exhaustion to be perhaps bad for performance, but not necessarily a reason for a deadlock, because once some blocks finish executing, they would free up their threads and allow other blocks to execute. So I didn't immediately see how this could result in a deadlock.
But alas, here's one way this could have resulted in a deadlock:
In the image above, I'm trying to depict the state of my queues at the time the deadlock happened.
The main queue is "fine", because it dispatched everything asynchronously to the "testQueue".
Now, I just learned about the GCD thread pool: it's a pool of available threads, and GCD can add more threads to it when needed, but there's a limit. The limit probably depends on the platform and device, but for the sake of example let's say it's a maximum of 4 threads.
So in this case, I dispatched 1000 concurrent blocks, and the first 4 got their own thread to execute on, immediately depleting the GCD thread pool.
The remaining 996 blocks have to wait for threads to free up before they can run. This is all fine so far.
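As an aside (my own sketch, not part of the original answer), it's easy to get a rough feel for how big that thread pool actually is on a given machine: dispatch a pile of blocks that all park their thread, then count how many distinct worker threads GCD created before it stopped growing the pool. The numbers below (200 blocks, a 2-second park) are arbitrary.

import Foundation

// Probe the GCD worker-thread limit by occupying threads and counting them.
let probeQueue = DispatchQueue(label: "probeQueue", attributes: .concurrent)
let lock = NSLock()
var workerThreads = Set<Thread>()

for _ in 0..<200 {
    probeQueue.async {
        lock.lock()
        workerThreads.insert(Thread.current)
        lock.unlock()
        Thread.sleep(forTimeInterval: 2)   // hold the thread so it can't be reused
    }
}

Thread.sleep(forTimeInterval: 1)           // give GCD a moment to spin threads up
lock.lock()
print("Distinct worker threads observed:", workerThreads.count)
lock.unlock()

Whatever number this prints is an implementation detail of the current platform; the "4" above is just for illustration.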
Next, the first block that received a thread starts executing. In its code it calls _queue.sync to read the value on the "containerQueue". To run a block on "containerQueue" it needs its own thread, but because our GCD thread pool is empty, it can't run and has to wait for threads to become available.
Because the block on "containerQueue" was dispatched synchronously, it blocks the "testQueue" block from completing. And since it can't complete, it can't release its thread back to the GCD thread pool.
So "testQueue" blocks never complete, never release their threads, and "containerQueue" blocks never start because they wait forever for threads to free up.
This is a deadlock.
Now if I pause my program execution and look at my threads, I see a whole bunch of "testQueue" threads, all waiting on a lock after trying to access Container.value.getter, which seems to support my hypothesis above.
However, this doesn't explain why the deadlock goes away when I change the setter from async to sync. Maybe it's just "luck".
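One more experiment that would support (or refute) the hypothesis, my own sketch rather than anything from the answers: throttle the test with a DispatchSemaphore so that only a handful of blocks are in flight at once, which keeps the worker-thread pool from being drained. If pool exhaustion really is the cause, the unmodified Container (async-barrier setter and all) should now reach "Finished".

class BoundedContainerTest {
    let testQueue = DispatchQueue(label: "testQueue", attributes: .concurrent)
    let container = Container(0)
    let group = DispatchGroup()
    let inFlight = DispatchSemaphore(value: 4)   // at most 4 test blocks at a time

    func runTest() {
        for i in 0..<1000 {
            // The loop now paces itself, so don't call this on the main thread of a UI app.
            inFlight.wait()
            testQueue.async(group: group) {
                defer { self.inFlight.signal() } // release the slot when the block finishes
                self.container.value = max(i, self.container.value)
            }
        }
        group.notify(queue: .main) {
            print("Finished")
        }
    }
}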
Answer 2
Score: 3
You're almost certainly getting thread-pool exhaustion. This is a pattern Apple engineers repeatedly warn against using because of that risk (I'm aware it's also documented as a common pattern in many places). Each call to .async is spawning a thread, and if there is severe contention (as you're creating), eventually there will be no threads available, and everything will lock up.
In modern Swift, the correct tool for this is an actor. In pre-concurrency Swift, you'd likely want to use an OSAllocatedUnfairLock, though when you find yourself making an atomic property, thinking you mean thread-safe, you are often on the wrong road.
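To make those suggestions concrete, here is a minimal sketch of both alternatives. The type and method names are mine, not from the answer, and the API shape changes slightly: a read-modify-write like max(i, value) has to happen inside a single protected call to stay atomic, which is exactly the "atomic property is not thread safety" caveat above.

import os

// Modern Swift: an actor serializes all access to its state, so no queues or
// barriers are needed. (Sketch only; names are illustrative.)
actor ActorContainer<T: Sendable> {
    private var value: T

    init(_ value: T) {
        self.value = value
    }

    func get() -> T { value }
    func set(_ newValue: T) { value = newValue }

    // The whole read-modify-write happens in one actor-isolated call.
    func update(_ transform: @Sendable (T) -> T) {
        value = transform(value)
    }
}

// The test from the question, rewritten against the actor (needs an async context).
func runActorTest() async {
    let container = ActorContainer(0)
    await withTaskGroup(of: Void.self) { group in
        for i in 0..<1000 {
            group.addTask { await container.update { max(i, $0) } }
        }
    }
    print("Finished, final value:", await container.get())
}

// The lock-based alternative named in the answer, using OSAllocatedUnfairLock
// (available on macOS 13 / iOS 16 and later). Again a sketch: update keeps the
// read-modify-write under the lock, unlike the original test, which read and
// wrote in two separate critical sections.
@available(macOS 13.0, iOS 16.0, *)
final class LockedContainer<T: Sendable> {
    private let state: OSAllocatedUnfairLock<T>

    init(_ value: T) {
        self.state = OSAllocatedUnfairLock(initialState: value)
    }

    var value: T {
        get { state.withLock { $0 } }
        set { state.withLock { $0 = newValue } }
    }

    func update(_ transform: @Sendable (T) -> T) {
        state.withLock { $0 = transform($0) }
    }
}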