What data types are safe to read and write to after retrieving it from a concurrently accessible structure?

Question

It's probably not clear from the title what I'm after here, so let me clarify. As an exercise in concurrency, I'm playing around with writing a cache that needs to be accessible by multiple simultaneous requests. Cache content is of type interface{}, so it can include anything, including slices, maps, and structs. When I grab something with a Get method, I RLock the cache while reading it, then return the content and finish with a deferred RUnlock.

This works fine for numbers and strings and any other values that are automatically copied on return. But I'm concerned that slices, maps, and structs are not actually copied, such that the thing returned, if read or modified as if it were a copy, would actually be altering data in the cache and doing so outside of a mutex.

Of course, that's a problem under race conditions. So I don't want to return something from Get that is not safe to alter and then pass back to a Set method to update. So here are the questions:

  1. Am I correct in assuming that these data types present problems for a scenario like this?

  2. How might one go about resolving this issue, so as to create a Get method whose values can be freely manipulated without fear of failure under race conditions?

Answer 1

Score: 3

You are correct that those kinds of data types, mainly reference types and pointers to structs, can cause problems, for the reasons I'll discuss below.

I see two problems that you are dealing with. The first is that you need to protect your cache from concurrent access so that the cache is always in a correct state. If you take a "write" lock whenever you mutate the cache, the cache keeps its integrity while it is being changed. Likewise, as long as you take a "read" lock when reading from the cache, you are guaranteed to read from it with that same integrity. As it stands, though, the locks protecting your cache only protect the cache itself. They do nothing to protect the items stored within the cache.

This is the second issue you are dealing with. Assuming your cache is protected, think about what happens if two separate goroutines each do a properly synchronized Get from your cache. They don't even have to Get the object at the same time, but if they both end up with a pointer to the same struct or a reference to the same map or slice, they can both mutate the same object with no synchronization between them. This is the second problem you are describing.
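To make the second problem concrete, here is a minimal sketch. The `Cache` type and its `Get`/`Set` methods are assumptions made up for illustration, not the asker's actual code. The locks are used correctly, yet a slice returned by `Get` still shares its backing array with the value stored in the cache:

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical minimal cache guarded by an RWMutex.
type Cache struct {
	mu    sync.RWMutex
	items map[string]interface{}
}

func NewCache() *Cache {
	return &Cache{items: make(map[string]interface{})}
}

// Set stores v under key while holding the write lock.
func (c *Cache) Set(key string, v interface{}) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = v
}

// Get returns the stored value while holding the read lock.
// The lock protects the map lookup only, not what the value points at.
func (c *Cache) Get(key string) (interface{}, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

func main() {
	c := NewCache()
	c.Set("nums", []int{1, 2, 3})

	v, _ := c.Get("nums")
	s := v.([]int)
	s[0] = 99 // written with no lock held, directly into the cached data

	again, _ := c.Get("nums")
	fmt.Println(again.([]int)[0]) // prints 99: both slices share one backing array
}
```

If two goroutines did that write at the same time, it would be a data race even though every `Get` and `Set` call was locked properly.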

So what are your options?

  1. Only store value types, which, as you have observed, can be limiting and/or expensive because everything must be copied.
  2. Only store custom types that synchronize themselves, taking the appropriate locks whenever they are mutated or read.
  3. Make your cache smarter by giving it a concept of ownership: it will gladly return an object from the cache, but it only ever allows one goroutine to "hold" that object until that goroutine is finished. Other goroutines have to wait until the previous holder releases the object. Alternatively, you could design Get to fail and return immediately if the requested item is currently unavailable. This concept is widely used to build distributed locks in a client-server architecture, where many clients may want access to an object and the distributed lock ensures that only one client can hold it at a time.
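Option 3 could be sketched roughly as follows. Everything here (`OwnedCache`, `Put`, `Acquire`, `TryAcquire`) is a hypothetical name chosen for illustration, and the sketch assumes each caller invokes its release function exactly once. A one-slot buffered channel per entry acts as the ownership token:

```go
package main

import (
	"fmt"
	"sync"
)

// entry pairs a cached value with a one-slot token channel.
// Whoever has taken the token out of the channel owns the value.
type entry struct {
	token chan struct{}
	value interface{}
}

// OwnedCache is a hypothetical cache that hands each item to one holder at a time.
type OwnedCache struct {
	mu    sync.Mutex
	items map[string]*entry
}

func NewOwnedCache() *OwnedCache {
	return &OwnedCache{items: make(map[string]*entry)}
}

// Put stores v under key and marks it as available.
func (c *OwnedCache) Put(key string, v interface{}) {
	e := &entry{token: make(chan struct{}, 1), value: v}
	e.token <- struct{}{}
	c.mu.Lock()
	c.items[key] = e
	c.mu.Unlock()
}

func (c *OwnedCache) lookup(key string) *entry {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.items[key]
}

// Acquire blocks until the item is free, then returns it with a release func.
// The release func must be called exactly once; a second call would block.
func (c *OwnedCache) Acquire(key string) (interface{}, func(), bool) {
	e := c.lookup(key)
	if e == nil {
		return nil, nil, false
	}
	<-e.token
	return e.value, func() { e.token <- struct{}{} }, true
}

// TryAcquire fails immediately if another goroutine holds the item.
func (c *OwnedCache) TryAcquire(key string) (interface{}, func(), bool) {
	e := c.lookup(key)
	if e == nil {
		return nil, nil, false
	}
	select {
	case <-e.token:
		return e.value, func() { e.token <- struct{}{} }, true
	default:
		return nil, nil, false
	}
}

func main() {
	c := NewOwnedCache()
	c.Put("hits", map[string]int{})

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, release, ok := c.Acquire("hits")
			if !ok {
				return
			}
			defer release()
			v.(map[string]int)["n"]++ // safe: only the current holder touches the map
		}()
	}
	wg.Wait()

	v, release, _ := c.Acquire("hits")
	fmt.Println(v.(map[string]int)["n"]) // prints 50
	release()
}
```

The trade-off is that every reader is serialized too, so this fits items that are mostly mutated rather than mostly read.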

Keep in mind that ownership is the important concept here. Someone might say: just use channels, that will fix everything. But you can end up in the same boat if you send a reference type or a pointer to a struct into 5 different channels. The goroutines receiving from those 5 channels could all mutate the same object they are holding onto. Uh oh, the same problem shows up again. This is why it's important that when you pass an item to a channel, you give up ownership of it and no longer mutate it.
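A small sketch of that hand-off discipline, with the function names `produce` and `consume` invented for the example. The sender builds a slice, sends it once, and never touches it again, so the receiver is free to mutate it:

```go
package main

import "fmt"

// produce builds each batch, hands it off, and never touches it again.
func produce(out chan<- []int) {
	for i := 0; i < 3; i++ {
		batch := []int{i, i + 1, i + 2}
		out <- batch // ownership passes to whoever receives it
		// By convention, batch must not be read or written here after the send.
	}
	close(out)
}

// consume owns every batch it receives, so it may mutate freely.
func consume(in <-chan []int) int {
	total := 0
	for batch := range in {
		for j := range batch {
			batch[j] *= 2
			total += batch[j]
		}
	}
	return total
}

func main() {
	ch := make(chan []int)
	go produce(ch)
	fmt.Println(consume(ch)) // prints 36
}
```

Nothing in the language enforces this; it is a convention the code has to keep. Sending the same slice on a second channel would bring the shared-mutation problem right back.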

As someone told me today, concurrent programming is hard, and there are probably additional patterns out there you could try, but I hope this gives some more insight into the problem you are dealing with. One thing to know is that there isn't really a bullet-proof answer to this; much of it will ultimately depend on how your application behaves.

Answer 2

Score: 0

> 1) Am I correct in assuming that these data types present problems for
> a scenario like this?

Yes. Maps and slices have pointers to internal data structures that aren't copied during assignment. But since you're using an interface{} you could also have structs with pointers, which could point to structs with more pointers and so on.
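A quick illustration of what assignment does and does not copy. The `Profile` type is made up for this example. Plain fields are copied, but the map and the slice field still point at the original data:

```go
package main

import "fmt"

// Profile is a hypothetical struct with one plain field and one slice field.
type Profile struct {
	Name string
	Tags []string // the slice header is copied, the backing array is shared
}

func main() {
	m1 := map[string]int{"a": 1}
	m2 := m1 // copies the map reference, not the entries
	m2["a"] = 2
	fmt.Println(m1["a"]) // prints 2

	p1 := Profile{Name: "x", Tags: []string{"go"}}
	p2 := p1 // Name is copied by value; Tags still refers to p1's array
	p2.Name = "y"
	p2.Tags[0] = "rust"
	fmt.Println(p1.Name, p1.Tags[0]) // prints: x rust
}
```

So a struct value coming out of an `interface{}` is only as independent as its deepest pointer, slice, or map field.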

> 2) How might one go about resolving this issue, so as to create a Get
> method whose values can be freely manipulated without fear of failure
> under race conditions?

The easiest solution is to only allow an object to be returned to a single client at a time, so that there's only ever one live, mutable version. You could also serialize the objects when they are stored and deserialize them when they are retrieved, so that every caller always gets its own copy.
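The serialization idea might look something like this. It is a sketch only: `CopyCache` and `ErrNotFound` are hypothetical names, and `encoding/json` is just one possible encoding (it only round-trips exported fields and JSON-representable types). The cache stores bytes, and `Get` decodes into a value the caller supplies:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is a hypothetical sentinel error for missing keys.
var ErrNotFound = errors.New("key not found")

// CopyCache is a hypothetical cache that stores encoded bytes,
// so every Get yields a fresh, independent copy.
type CopyCache struct {
	mu    sync.RWMutex
	items map[string][]byte
}

func NewCopyCache() *CopyCache {
	return &CopyCache{items: make(map[string][]byte)}
}

// Set encodes v and stores the bytes under key.
func (c *CopyCache) Set(key string, v interface{}) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.items[key] = b
	c.mu.Unlock()
	return nil
}

// Get decodes the stored bytes into out, which must be a pointer.
// The stored bytes are never modified after Set, so decoding needs no lock.
func (c *CopyCache) Get(key string, out interface{}) error {
	c.mu.RLock()
	b, ok := c.items[key]
	c.mu.RUnlock()
	if !ok {
		return ErrNotFound
	}
	return json.Unmarshal(b, out)
}

func main() {
	c := NewCopyCache()
	if err := c.Set("nums", []int{1, 2, 3}); err != nil {
		panic(err)
	}

	var a []int
	if err := c.Get("nums", &a); err != nil {
		panic(err)
	}
	a[0] = 99 // only changes the caller's copy

	var b []int
	if err := c.Get("nums", &b); err != nil {
		panic(err)
	}
	fmt.Println(b[0]) // prints 1: the cached data is untouched
}
```

The cost is an encode on every `Set` and a decode on every `Get`, and the caller has to know the concrete type to decode into.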

Anything that needs to be written and read concurrently needs to have proper synchronization. Period.

huangapple
  • Posted on November 1, 2014, 03:11:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/26681841.html