在MySQL的golang驱动中,无缓冲的结果集

huangapple go评论92阅读模式
英文:

Unbuffered result set in MySQL golang driver

问题

我有一个很大的查询,并且我想使用Go MySQL driver逐行处理结果。查询非常简单,但返回了大量的行。

我在C-API级别上看到了mysql_use_result()mysqli_store_result()。是否有一种类似的方法可以在TCP连接上执行无缓冲查询,就像Go MySQL driver所使用的那样?

英文:

I have a large query and I want to process the result row-by-row using the Go MySQL driver. The query is very simple but it returns a huge number of rows.

I have seen mysql_use_result() vs mysqli_store_result() at the C-API level. Is there an equivalent way to do an unbuffered query over a TCP connection, such as is used by the Go MySQL driver?

答案1

得分: 6

数据库客户端库中缓冲/非缓冲查询的概念有点误导人,因为实际上,缓冲可能发生在多个级别上。一般来说(不是特定于Go,也不是特定于MySQL),有不同类型的缓冲区。

  • **TCP套接字缓冲区。**内核为每个套接字关联一个通信缓冲区。默认情况下,该缓冲区的大小是动态的,并由内核参数控制。一些客户端可以更改这些默认值以获得更多控制和优化。该缓冲区的目的是调节设备队列中的流量,并最终减少网络上的数据包数量。

  • **通信缓冲区。**面向数据库的协议通常基于帧协议,这意味着帧被定义为在TCP流中分隔逻辑数据包。套接字缓冲区不能保证完整的逻辑数据包(帧)可供读取。因此,需要额外的通信缓冲区来确保在处理时帧是完整的。它们还可以帮助减少系统调用的数量。这些缓冲区由数据库客户端库的低级通信机制管理。

  • **行缓冲区。**一些数据库客户端选择将从服务器读取的所有行保存在内存中,并让应用程序代码浏览相应的数据结构。例如,PostgreSQL C客户端(libpq)就是这样做的。MySQL C客户端则由开发人员自行选择(通过调用mysql_use_result或mysql_store_result)。

无论如何,你提到的Go驱动程序不是基于MySQL C客户端(它是一个纯Go驱动程序)。它只使用前两种缓冲区(套接字和通信缓冲区)。不提供行级缓冲。

每个MySQL连接有一个通信缓冲区,其大小是4 KB的倍数。如果帧很大,它将动态增长。在MySQL协议中,每一行都作为单独的数据包(在帧中)发送,因此通信缓冲区的大小与客户端接收/发送的最大行直接相关。

结果是,你可以运行返回大量行的查询而不会使内存饱和,并且仍然具有良好的套接字性能。对于开发人员来说,使用此驱动程序,无论查询如何,缓冲都不是问题。

英文:

This concept of buffered/unbuffered queries in database client libraries is a bit misleading, because actually, buffering may occur on multiple levels. In general (i.e. not Go specific, and not MySQL specific), you have different kinds of buffers.

  • TCP socket buffers. The kernel associates a communication buffer to each socket. By default, the size of this buffer is dynamic and controlled by kernel parameters. Some clients can change those defaults to get more control and optimize. Purpose if this buffer is to regulate the traffic in the device queues and eventually, decrease the number of packets on the network.

  • Communication buffers. Database oriented protocols are generally based on a framing protocol, meaning that frames are defined to separate the logical packets in the TCP stream. Socket buffers do not guarantee that a complete logical packet (a frame) is available for reading. Extra communication buffers are therefore required to make sure the frames are complete when they are processed. They can also help to reduce the number of system calls. These buffers are managed by the low-level communication mechanism of the database client library.

  • Rows buffers. Some database clients choose to keep all the rows read from the server in memory, and let the application code browse the corresponding data structures. For instance, the PostgreSQL C client (libpq) does it. The MySQL C client leaves the choice to the developer (by calling mysql_use_result or mysql_store_result).

Anyway, the Go driver you mention is not based on the MySQL C client (it is a pure Go driver). It uses only the two first kinds of buffers (sockets, and communication buffers). Row level buffering is not provided.

There is one communication buffer per MySQL connection. Its size is a multiple of 4 KB. It will grow dynamically if the frames are large. In the MySQL protocol, each row is sent as a separate packet (in a frame), so the size of the communication buffer is directly linked to the largest rows received/sent by the client.

The consequence is you can run a query returning a huge number of rows without saturating the memory, and still having good socket performance. With this driver, buffering is never a problem for the developer, whatever the query.

huangapple
  • 本文由 发表于 2015年3月13日 10:27:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/29023902.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定