Improving the performance of rows.Scan() in Go
Question
I have a very simple query that returns a couple thousand rows with only two columns:
SELECT "id", "value" FROM "table" LIMIT 10000;
After issuing sql.Query(), I traverse the result set with the following code:
data := map[uint8]string{}
for rows.Next() {
	var (
		id    uint8
		value string
	)
	if err := rows.Scan(&id, &value); err == nil {
		data[id] = value
	}
}
If I run the exact same query directly on the database, I get all results back within a couple of milliseconds, but the Go code takes far longer to complete, sometimes almost 10 seconds!
I started commenting out several parts of the code and it seems that rows.Scan() is the culprit.
> Scan copies the columns in the current row into the values pointed at
> by dest.
>
> If an argument has type *[]byte, Scan saves in that argument a copy of
> the corresponding data. The copy is owned by the caller and can be
> modified and held indefinitely. The copy can be avoided by using an
> argument of type *RawBytes instead; see the documentation for RawBytes
> for restrictions on its use. If an argument has type *interface{},
> Scan copies the value provided by the underlying driver without
> conversion. If the value is of type []byte, a copy is made and the
> caller owns the result.
Can I expect any speed improvement if I use *[]byte, *RawBytes or *interface{} instead?
Looking at the code, it looks like the convertAssign() function is doing a lot of stuff that isn't necessary for this particular query. So my question is: how can I make the Scan process faster?
I thought about overloading the function to expect predetermined types, but that isn't possible in Go...
Any ideas?
Answer 1
Score: 4
Yes, you can use RawBytes instead, and rows.Scan() will avoid the memory allocation and copying.
About the convertAssign() function: yes, it's not optimal in Go 1.2, but significant improvements were made in 1.3:
- http://code.google.com/p/go/issues/detail?id=7086
- Lock-less implementation for sync.Pool
I have an example of RawBytes usage: https://gist.github.com/yvasiyarov/9911956
This code reads data from a MySQL table, does some processing and writes it to CSV files.
Last night it took 1 minute 24 seconds to generate 4 GB of CSV data (about 30 million rows),
so I'm pretty sure the problem is outside of the Go code: even the worst possible usage of rows.Scan() cannot give you a 10-second delay.