What is the best way to evict unused records from a string pool?
Question
I am implementing a cache in Go. Suppose the cache is implemented as a sync.Map with an integer key and a struct value:
type value struct {
	fileName     string
	functionName string
}
A huge number of records share the same fileName and functionName. To save memory, I want to use a string pool. Go strings are immutable, so my idea looks like this:
import "sync"

var (
	cache      sync.Map // int64 -> value
	stringPool sync.Map // string -> string (the pooled copy)
)

type value struct {
	fileName     string
	functionName string
}

// addRecord swaps both strings for their pooled copies, then stores the record.
func addRecord(key int64, val value) {
	fileName, _ := stringPool.LoadOrStore(val.fileName, val.fileName)
	val.fileName = fileName.(string)
	functionName, _ := stringPool.LoadOrStore(val.functionName, val.functionName)
	val.functionName = functionName.(string)
	cache.Store(key, val)
}
My idea is to keep every unique string (fileName and functionName) in memory only once. Will it work?
The cache implementation must be safe for concurrent use. The cache holds about 10^8 records. The string pool holds about 10^6 records.
I already have logic that removes records from the cache, so the size of the main cache is not a problem.
Could you please suggest how to manage the size of the string pool?
I am thinking about storing a reference count for every record in the string pool. Maintaining it would require additional synchronization, or possibly a global lock. I would like to keep the implementation as simple as possible. As you can see in my code snippet, I don't use any additional mutexes.
Or maybe I need a completely different approach to minimize the memory usage of my cache?
Answer 1
Score: 2
What you are trying to do with stringPool is commonly known as string interning. Libraries such as github.com/josharian/intern provide "good enough" solutions to this kind of problem and do not require you to maintain the stringPool map by hand. Note that no solution (including yours, assuming you eventually remove some elements from stringPool) can reliably deduplicate 100% of strings without incurring impractical CPU overhead.
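To make the "good enough" idea concrete, here is a minimal sketch of one way to get a pool that never needs manual eviction: keep the lookup table inside a sync.Pool, so the garbage collector is free to discard it. This is an illustration of the general technique, not a copy of any library's code; the names internPool and internString are made up for this example. The trade-off is the one described above: a discarded table means some duplicates slip through.

```go
package main

import (
	"fmt"
	"sync"
)

// internPool holds lookup tables. Because they live in a sync.Pool, the
// runtime may drop them at any time, which keeps memory bounded without
// reference counting. (Hypothetical name, for illustration only.)
var internPool = sync.Pool{
	New: func() interface{} { return make(map[string]string) },
}

// internString returns a previously seen copy of s when the table taken from
// the pool has one; otherwise it records s and returns it unchanged.
// Deduplication is best-effort: a fresh table knows nothing about old strings.
func internString(s string) string {
	m := internPool.Get().(map[string]string)
	defer internPool.Put(m)
	if c, ok := m[s]; ok {
		return c
	}
	m[s] = s
	return s
}

func main() {
	a := internString("main.go")
	// Build the second string from bytes so it starts with its own storage.
	b := internString(string([]byte("main.go")))
	// The contents always match; sharing the same storage is only likely.
	fmt.Println(a == b)
}
```

In addRecord, the two LoadOrStore calls on stringPool would become calls to internString, and no eviction logic would be needed for the pool.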
As a side note, sync.Map is not really designed for update-heavy workloads (see https://pkg.go.dev/sync#Map). Depending on the keys used, you may experience significant contention when calling cache.Store. Furthermore, since sync.Map relies on interface{} for both keys and values, it normally incurs many more allocations than a plain map. Make sure to benchmark with realistic workloads to confirm that you picked the right approach.
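If exact eviction matters more than simplicity, the reference-counting idea from the question can be sketched with a plain typed map guarded by one mutex. This is only a sketch under assumptions: refPool, acquire and release are hypothetical names, and the caller is assumed to call release once for each string whenever a record is removed from (or overwritten in) the main cache. Whether a single lock is acceptable at this scale is exactly the kind of thing to benchmark.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is one pooled string plus the number of cache records using it.
type entry struct {
	s    string
	refs int
}

// refPool is a reference-counted string pool built on a plain map.
// (Hypothetical type, for illustration only.)
type refPool struct {
	mu sync.Mutex
	m  map[string]*entry
}

func newRefPool() *refPool {
	return &refPool{m: make(map[string]*entry)}
}

// acquire returns the pooled copy of s and increments its reference count.
func (p *refPool) acquire(s string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	e, ok := p.m[s]
	if !ok {
		e = &entry{s: s}
		p.m[s] = e
	}
	e.refs++
	return e.s
}

// release decrements the count for s and deletes the entry at zero.
// Releasing a string that is not in the pool is a no-op.
func (p *refPool) release(s string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	e, ok := p.m[s]
	if !ok {
		return
	}
	e.refs--
	if e.refs <= 0 {
		delete(p.m, s)
	}
}

// size reports how many distinct strings are currently pooled.
func (p *refPool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.m)
}

func main() {
	p := newRefPool()
	p.acquire("main.go")
	p.acquire("main.go")
	p.release("main.go")
	fmt.Println(p.size()) // still referenced by one record
	p.release("main.go")
	fmt.Println(p.size()) // last reference gone, entry evicted
}
```

If the single mutex turns out to be a bottleneck, the same structure can be split into several independently locked shards chosen by a hash of the string.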