What is the best way to evict unused records from string pool?

Question

I am implementing a cache in Go. Suppose the cache is a sync.Map with an integer key and a struct as the value:

type value struct {
	fileName     string
	functionName string
}

A huge number of records have the same fileName and functionName. To save memory I want to use a string pool. Go strings are immutable, so my idea looks like this:

import "sync"

var (
	cache      sync.Map // int64 key -> value
	stringPool sync.Map // string -> pooled copy of the same string
)

type value struct {
	fileName     string
	functionName string
}

// addRecord swaps both strings for their pooled copies, then stores the record.
func addRecord(key int64, val value) {
	fileName, _ := stringPool.LoadOrStore(val.fileName, val.fileName)
	val.fileName = fileName.(string)
	functionName, _ := stringPool.LoadOrStore(val.functionName, val.functionName)
	val.functionName = functionName.(string)
	cache.Store(key, val)
}

My idea is to keep every unique string (fileName and functionName) in memory only once. Will it work?

The cache implementation must be safe for concurrent use. The number of records in the cache is about 10^8. The number of records in the string pool is about 10^6.

I have some logic that removes records from the cache. There is no problem with the main cache size.

Could you please suggest how to manage the string pool size?

I am thinking about storing a reference count for every record in the string pool. That would require additional synchronization, or probably a global lock, to maintain it. I would like the implementation to be as simple as possible. As you can see in my code snippet, I don't use any additional mutexes.

Or maybe I need to follow a completely different approach to minimize the memory usage of my cache?
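
For illustration, the reference-counting idea described above might look roughly like the sketch below. It is only a sketch under stated assumptions: the names refPool, acquire, release and size are hypothetical, and it assumes a single mutex around a plain map is acceptable, which is exactly the global-lock cost the question wants to avoid.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is one pooled string plus the number of cache records using it.
type entry struct {
	s    string
	refs int
}

// refPool is a hypothetical reference-counted string pool.
// One mutex guards the whole map (the "global lock" variant).
type refPool struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newRefPool() *refPool {
	return &refPool{entries: make(map[string]*entry)}
}

// acquire returns the pooled copy of s and increments its count.
func (p *refPool) acquire(s string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	e, ok := p.entries[s]
	if !ok {
		e = &entry{s: s}
		p.entries[s] = e
	}
	e.refs++
	return e.s
}

// release decrements the count for s and deletes the entry at zero.
func (p *refPool) release(s string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	e, ok := p.entries[s]
	if !ok {
		return
	}
	e.refs--
	if e.refs <= 0 {
		delete(p.entries, s)
	}
}

// size reports how many distinct strings are currently pooled.
func (p *refPool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.entries)
}

func main() {
	p := newRefPool()
	a := p.acquire("main.go")
	b := p.acquire("main.go")
	fmt.Println(a == b, p.size())
	p.release(a)
	p.release(b)
	fmt.Println(p.size())
}
```

In this sketch, acquire would be called for each string when a record is added, and release for each string when a record is removed from the cache, so every cache deletion also has to take the pool lock.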

Answer 1

Score: 2


What you are trying to do with stringPool is commonly known as string interning. There are libraries like github.com/josharian/intern that provide "good enough" solutions to that kind of problem, and that do not require you to manually maintain the stringPool map. Note that no solution (including yours, assuming you eventually remove some elements from stringPool) can reliably deduplicate 100% of strings without incurring impractical levels of CPU overhead.
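
As a rough illustration of what a "good enough" interner can look like, here is a minimal sketch built on the standard library's sync.Pool. It is not the code of any particular library: the names pool and internString are hypothetical. The assumption is that best-effort deduplication is acceptable, because the runtime is free to drop pooled maps during garbage collection. That is also what keeps the pool from growing without bound.

```go
package main

import (
	"fmt"
	"sync"
)

// pool holds maps of previously seen strings. The runtime may discard
// pooled maps during garbage collection, so no explicit eviction is needed.
var pool = sync.Pool{
	New: func() interface{} { return make(map[string]string) },
}

// internString returns a string equal to s, reusing an earlier copy when
// the map it happens to get from the pool already contains one.
// Deduplication is best effort only.
func internString(s string) string {
	m := pool.Get().(map[string]string)
	c, ok := m[s]
	if !ok {
		m[s] = s
		c = s
	}
	pool.Put(m)
	return c
}

func main() {
	a := internString("handler.go")
	b := internString("handler.go")
	fmt.Println(a == b)
}
```

With something like this, addRecord would call internString on fileName and functionName instead of going through stringPool. The trade-off is that some duplicates survive, since different goroutines may see different maps and any map may disappear after a collection.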

As a side note, it's worth pointing out that sync.Map is not really designed for update-heavy workloads (see https://pkg.go.dev/sync#Map). Depending on the keys used, you may actually experience significant contention when calling cache.Store. Furthermore, since sync.Map relies on interface{} for both keys and values, it normally incurs many more allocations than a plain map. Make sure to benchmark with realistic workloads to ensure that you have picked the right approach.
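
One alternative worth benchmarking against sync.Map is a plain typed map split into shards, each guarded by its own sync.RWMutex. The sketch below is only an illustration: shardedCache, shardCount and the method names are hypothetical, and the shard count of 64 is an arbitrary assumption that would need tuning with a realistic workload.

```go
package main

import (
	"fmt"
	"sync"
)

type value struct {
	fileName     string
	functionName string
}

// shardCount is a hypothetical setting; tune it by benchmarking.
const shardCount = 64

// shard is one typed map with its own lock, so writers to different
// shards do not contend with each other.
type shard struct {
	mu sync.RWMutex
	m  map[int64]value
}

type shardedCache struct {
	shards [shardCount]shard
}

func newShardedCache() *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[int64]value)
	}
	return c
}

// shardFor picks a shard from the key; negative keys wrap around safely.
func (c *shardedCache) shardFor(key int64) *shard {
	return &c.shards[uint64(key)%shardCount]
}

func (c *shardedCache) store(key int64, v value) {
	s := c.shardFor(key)
	s.mu.Lock()
	s.m[key] = v
	s.mu.Unlock()
}

func (c *shardedCache) load(key int64) (value, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	v, ok := s.m[key]
	s.mu.RUnlock()
	return v, ok
}

func main() {
	c := newShardedCache()
	c.store(42, value{"main.go", "main"})
	v, ok := c.load(42)
	fmt.Println(v.fileName, v.functionName, ok)
}
```

Because the map is typed as map[int64]value, keys and values are stored directly rather than being boxed into interface{}. Whether this actually beats sync.Map depends on the read/write mix, so it should be measured rather than assumed.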

huangapple
  • Posted on July 19, 2021 at 12:05:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68434993.html