Improving the performance of rows.Scan() in Go

Question

I have a very simple query that returns a couple thousand rows with only two columns:

SELECT "id", "value" FROM "table" LIMIT 10000;

After issuing sql.Query(), I traverse the result set with the following code:

data := map[uint8]string{}

for rows.Next() {
	var (
		id     uint8
		value  string
	)

	if err := rows.Scan(&id, &value); err == nil {
		data[id] = value
	}
}

If I run the exact same query directly on the database, I get all results back within a couple of milliseconds, but the Go code takes far longer to complete, sometimes almost 10 seconds!

I started commenting out several parts of the code and it seems that rows.Scan() is the culprit.

> Scan copies the columns in the current row into the values pointed at
> by dest.
>
> If an argument has type *[]byte, Scan saves in that argument a copy of
> the corresponding data. The copy is owned by the caller and can be
> modified and held indefinitely. The copy can be avoided by using an
> argument of type *RawBytes instead; see the documentation for RawBytes
> for restrictions on its use. If an argument has type *interface{},
> Scan copies the value provided by the underlying driver without
> conversion. If the value is of type []byte, a copy is made and the
> caller owns the result.

Can I expect any speed improvement if I use *[]byte, *RawBytes or *interface{} instead?

Looking at the code, it looks like the convertAssign() function is doing a lot of stuff that isn't necessary for this particular query. So my question is: how can I make the Scan process faster?

I thought about overloading the function to expect predetermined types, but that isn't possible in Go...

Any ideas?

Answer 1

Score: 4

Yes, you can use RawBytes instead, and rows.Scan() will then avoid the memory allocation and copy.
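As a rough illustration of the RawBytes approach, here is a minimal sketch. So that it runs without a database, it uses a tiny in-memory driver (memDriver, memConn, memStmt, memRows) and the helpers openMem and loadRaw; all of these names are hypothetical and are not part of database/sql or of the gist linked in this answer. Only loadRaw matters when adapting it to a real driver.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"io"
	"strconv"
)

// memDriver is a hypothetical in-memory driver that exists only so this
// sketch runs without a database. It ignores the query and serves n rows.
type memDriver struct{ n int }

func (d memDriver) Open(string) (driver.Conn, error)             { return memConn(d), nil }
func (d memDriver) Connect(context.Context) (driver.Conn, error) { return memConn(d), nil }
func (d memDriver) Driver() driver.Driver                        { return d }

var errUnsupported = errors.New("memDriver: not supported")

type memConn struct{ n int }

func (c memConn) Prepare(string) (driver.Stmt, error) { return memStmt(c), nil }
func (c memConn) Close() error                        { return nil }
func (c memConn) Begin() (driver.Tx, error)           { return nil, errUnsupported }

type memStmt struct{ n int }

func (s memStmt) Close() error                               { return nil }
func (s memStmt) NumInput() int                              { return 0 }
func (s memStmt) Exec([]driver.Value) (driver.Result, error) { return nil, errUnsupported }
func (s memStmt) Query([]driver.Value) (driver.Rows, error)  { return &memRows{n: s.n}, nil }

type memRows struct{ i, n int }

func (r *memRows) Columns() []string { return []string{"id", "value"} }
func (r *memRows) Close() error      { return nil }

// Next hands both columns over as []byte, as a text-protocol driver might.
func (r *memRows) Next(dest []driver.Value) error {
	if r.i >= r.n {
		return io.EOF
	}
	dest[0] = []byte(strconv.Itoa(r.i))
	dest[1] = []byte("value-" + strconv.Itoa(r.i))
	r.i++
	return nil
}

// openMem returns a *sql.DB backed by the in-memory driver above.
func openMem(n int) *sql.DB { return sql.OpenDB(memDriver{n: n}) }

// loadRaw scans each row into reused sql.RawBytes destinations and copies
// out only the data that has to outlive the current row.
func loadRaw(db *sql.DB, query string) (map[int64]string, error) {
	rows, err := db.Query(query)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	data := make(map[int64]string)
	var id, value sql.RawBytes // valid only until the next Next, Scan, or Close
	for rows.Next() {
		if err := rows.Scan(&id, &value); err != nil {
			return nil, err
		}
		key, err := strconv.ParseInt(string(id), 10, 64)
		if err != nil {
			return nil, err
		}
		data[key] = string(value) // string() copies, so the map owns its data
	}
	return data, rows.Err()
}

func main() {
	db := openMem(3)
	defer db.Close()

	data, err := loadRaw(db, `SELECT "id", "value" FROM "table" LIMIT 10000`)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), data[2])
}
```

Because the map has to keep each value, one copy per row is still needed here. RawBytes is likely to pay off mainly when a row is processed and then discarded, as in a streaming export.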

About the convertAssign() function: yes, it is not optimal in Go 1.2, but significant improvements were made in 1.3.

I have an example of RawBytes usage: https://gist.github.com/yvasiyarov/9911956

This code reads data from a MySQL table, does some processing, and writes it to CSV files.
Last night it took 1 minute 24 seconds to generate 4 GB of CSV data (about 30 million rows).

So I'm pretty sure the problem is outside the Go code: even the worst possible usage of rows.Scan() cannot cause a 10-second delay.
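One way to check where the time actually goes is to time rows.Next() and rows.Scan() separately. In database/sql, the driver produces the next row inside Next(), and Scan() only converts values that are already in memory, so a slow query or network would normally show up under Next(). The sketch below assumes that split. The rowIter interface, the timings struct, profile, and the slowRows stand-in (which sleeps in Next to imitate a slow source) are hypothetical names used only for illustration.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// rowIter is the subset of *sql.Rows used by the loop. It exists only so the
// sketch can be exercised without a database.
type rowIter interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
}

// Compile-time check that *sql.Rows can be passed to profile.
var _ rowIter = (*sql.Rows)(nil)

// timings accumulates the time spent in each phase of the loop.
type timings struct {
	Next, Scan time.Duration
	Rows       int
}

// profile drains rows and measures Next and Scan separately.
func profile(rows rowIter) (timings, error) {
	var (
		t     timings
		id    int64
		value string
	)
	for {
		start := time.Now()
		ok := rows.Next()
		t.Next += time.Since(start)
		if !ok {
			break
		}

		start = time.Now()
		err := rows.Scan(&id, &value)
		t.Scan += time.Since(start)
		if err != nil {
			return t, err
		}
		t.Rows++
	}
	return t, rows.Err()
}

// slowRows is a hypothetical stand-in for a slow data source: every call to
// Next sleeps for delay, while Scan is a plain in-memory assignment.
type slowRows struct {
	i, n  int
	delay time.Duration
}

func newSlowRows(n, delayMs int) *slowRows {
	return &slowRows{n: n, delay: time.Duration(delayMs) * time.Millisecond}
}

func (r *slowRows) Next() bool {
	time.Sleep(r.delay)
	r.i++
	return r.i <= r.n
}

func (r *slowRows) Scan(dest ...any) error {
	*dest[0].(*int64) = int64(r.i)
	*dest[1].(*string) = "value"
	return nil
}

func (r *slowRows) Err() error { return nil }

func main() {
	t, err := profile(newSlowRows(3, 1))
	if err != nil {
		panic(err)
	}
	fmt.Printf("rows=%d next=%v scan=%v\n", t.Rows, t.Next, t.Scan)
}
```

With a real database, pass the *sql.Rows returned by db.Query() to profile. If nearly all the time lands in Next, the delay comes from the query, the driver, or the network rather than from the conversion work in Scan.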

huangapple
  • Posted on April 1, 2014 at 06:26:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/22773413.html