Mysterious and excessive memory allocation in a function in a Go program
I have the following code, which uses tons of memory, far more than expected.
I used the pprof tool, and it shows that the function NewEdge is allocating more than 94% of all the memory allocated by the program.
My question is: what is wrong with this code that makes it use so much memory?
    type Vertex struct {
        Id         string            `json:"id"`         // must be unique
        Properties map[string]string `json:"properties"` // to be implemented soon
        // Edge ids, keyed by Vertex id; each pair of vertices can be connected by multiple edges.
        verticesThisIsConnectedTo map[string][]string `json:"-"`
        // Edge ids, keyed by Vertex id.
        verticesConnectedToThis map[string][]string `json:"-"`
    }

    type Edge struct {
        id         string            `json:"-"` // for internal use, unique
        Label      string            `json:"label"`
        SourceId   string            `json:"source-id"`
        TargetId   string            `json:"target-id"`
        Type       string            `json:"type"`
        Properties map[string]string `json:"properties"` // to be implemented soon
    }

    func (v *Vertex) isPartof(g *Graph) bool {
        _, b := g.Vertices[v.Id]
        return b
    }

    func (g *Graph) NewEdge(source, target *Vertex, label, edgeType string) (Edge, error) {
        if source.Id == target.Id {
            return Edge{}, ERROR_NO_EDGE_TO_SELF_ALLOWED
        }
        if !source.isPartof(g) || !target.isPartof(g) {
            return Edge{}, errors.New("InvalidEdge, source or target not in this graph")
        }
        e := Edge{id: <-nextId, Label: label, SourceId: source.Id, TargetId: target.Id, Type: edgeType}
        g.Edges[e.id] = &e
        source.verticesThisIsConnectedTo[target.Id] = append(source.verticesThisIsConnectedTo[target.Id], e.id)
        target.verticesConnectedToThis[source.Id] = append(target.verticesConnectedToThis[source.Id], e.id)
        return e, nil
    }
The allocation happens in a call like this: fakeGraph(Aragog, 2000, 1), where:
    func fakeGraph(g Graph, nodesCount, followratio int) error {
        var err error
        // create the vertices
        for i := 0; i < nodesCount; i++ {
            v := NewVertex("") //FH.RandStr(10))
            g.AddVertex(v)
        }
        // create some "follow edges"
        followcount := followratio * nodesCount / 100
        vkeys := []string{}
        for pk := range g.Vertices {
            vkeys = append(vkeys, pk)
        }
        for ki := range g.Vertices {
            pidx := rand.Perm(nodesCount)
            followcounter := followcount
            for j := 0; j < followcounter; j++ {
                // Note: ":=" declares a new err that shadows the outer one,
                // so the err returned below is always nil.
                _, err := g.NewEdge(g.Vertices[ki], g.Vertices[vkeys[pidx[j]]], <-nextId, EDGE_TYPE_FOLLOW)
                if err != nil {
                    followcounter++ // to compensate for references to self
                }
            }
        }
        return err
    }
Question / mystery:
I can create thousands of Vertex values and the memory usage is very reasonable, but calls to NewEdge are very memory intensive. I first noticed that the code was using tons of memory. I ran the program with -memprofile, then used go tool pprof and got this:
    (pprof) top10
    Total: 9.9 MB
         8.9  89.9%  89.9%      8.9  89.9% main.(*Graph).NewEdge
         0.5   5.0%  95.0%      0.5   5.0% allocg
         0.5   5.0% 100.0%      0.5   5.0% fmt.Sprintf
         0.0   0.0% 100.0%      0.5   5.0% _rt0_go
         0.0   0.0% 100.0%      8.9  89.9% main.fakeGraph
         0.0   0.0% 100.0%      0.5   5.0% main.func·003
         0.0   0.0% 100.0%      8.9  89.9% main.main
         0.0   0.0% 100.0%      0.5   5.0% mcommoninit
    (pprof)
Any help is very much appreciated.
Answer 1
Score: 1
@ali I don't think there is any mystery in this memory profile.
First of all, if you check the sizes of your structs, you will see that the Edge struct is about twice as large as the Vertex struct. (You can check the size of a struct with unsafe.Sizeof().)
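As a rough check of that size claim, here is a minimal sketch that copies the field layout of both structs from the question (struct tags are omitted because they do not affect size) and prints their sizes. The helper name structSizes is made up for this illustration. Note that unsafe.Sizeof reports only the shallow size of the struct itself: the bytes behind each string and the contents of each map are not counted. On a 64-bit platform I would expect it to report 40 bytes for Vertex and 88 bytes for Edge.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Vertex mirrors the field layout of the Vertex struct in the question.
type Vertex struct {
	Id                        string
	Properties                map[string]string
	verticesThisIsConnectedTo map[string][]string
	verticesConnectedToThis   map[string][]string
}

// Edge mirrors the field layout of the Edge struct in the question.
type Edge struct {
	id         string
	Label      string
	SourceId   string
	TargetId   string
	Type       string
	Properties map[string]string
}

// structSizes (hypothetical helper) returns the shallow sizes of
// Vertex and Edge in bytes, as reported by unsafe.Sizeof.
func structSizes() (vertex, edge uintptr) {
	return unsafe.Sizeof(Vertex{}), unsafe.Sizeof(Edge{})
}

func main() {
	v, e := structSizes()
	fmt.Printf("Vertex: %d bytes, Edge: %d bytes\n", v, e)
}
```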
So, if you call fakeGraph(Aragog, 2000, 1), Go will allocate:
- 2000 Vertex structs
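- at least 2000 * 20 = 40,000 Edge structs (followcount is 1 * 2000 / 100 = 20 edges per vertex)
As you can see, NewEdge() will allocate at least 40 times more memory than the vertex creation in fakeGraph(): 20 times as many structs, each about twice the size.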
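The arithmetic behind those counts can be reproduced with a small sketch. The function names followsPerVertex and minEdges are invented for this illustration; the formula itself is the followcount expression from the question's fakeGraph.

```go
package main

import "fmt"

// followsPerVertex mirrors the followcount expression in fakeGraph:
// followratio percent of nodesCount, using integer division.
func followsPerVertex(nodesCount, followratio int) int {
	return followratio * nodesCount / 100
}

// minEdges is the number of edges fakeGraph sets out to create:
// one batch of followsPerVertex edges for every vertex.
func minEdges(nodesCount, followratio int) int {
	return nodesCount * followsPerVertex(nodesCount, followratio)
}

func main() {
	fmt.Println(followsPerVertex(2000, 1)) // prints 20
	fmt.Println(minEdges(2000, 1))         // prints 40000
}
```

Because the edge count grows with the square of nodesCount for a fixed followratio, doubling the number of vertices roughly quadruples the number of Edge structs.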
Also, every call to NewEdge() returns an Edge value, even on the error paths, where it is an empty Edge{}.
Another factor is that you return the struct itself, not a pointer to it. In Go, structs are value types, so the entire struct is copied when NewEdge() returns, and that may also cause a new allocation.
Yes, I see that you never use the returned struct, but I'm not sure whether the Go compiler checks the caller's context and skips copying the Edge.
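To illustrate the pointer-returning alternative, here is a minimal sketch. It uses cut-down Edge and Graph types and a hypothetical method newEdgePtr that takes the id as a parameter instead of reading it from the nextId channel; none of this is the asker's actual API. The idea is that the Edge is created once, stored in the map, and the same pointer is handed back, so the caller receives no second copy. Whether this saves any memory in practice is an assumption to verify with a profile: as far as I know, a by-value return is usually a plain copy rather than a heap allocation, and the heap allocation in the original comes from storing &e in g.Edges.

```go
package main

import "fmt"

// Edge is a cut-down version of the Edge struct from the question.
type Edge struct {
	id, Label, SourceId, TargetId, Type string
}

// Graph holds only the edge map needed for this sketch.
type Graph struct {
	Edges map[string]*Edge
}

// newEdgePtr (hypothetical) allocates one Edge, stores it in the graph,
// and returns the same pointer instead of a copy of the struct.
func (g *Graph) newEdgePtr(id, label, sourceId, targetId, edgeType string) *Edge {
	e := &Edge{id: id, Label: label, SourceId: sourceId, TargetId: targetId, Type: edgeType}
	g.Edges[e.id] = e
	return e
}

func main() {
	g := &Graph{Edges: map[string]*Edge{}}
	e := g.newEdgePtr("e1", "follows", "v1", "v2", "follow")
	// The caller and the graph share a single Edge value.
	fmt.Println(e == g.Edges["e1"]) // prints true
}
```

One consequence of this design is that callers can now mutate the Edge stored in the graph through the returned pointer, which the by-value version in the question prevents.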