执行大型事务时减少内存消耗

huangapple go评论69阅读模式
英文:

Reducing memory consumption when executing large transactions

问题

我根据互联网上的一个示例,增加了INSERT查询的速度,使用以下参数来配置SQLite:

PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;

以下是我的代码:

tx, err := db.Begin()
if err != nil {
    log.Fatal(err)
}
pr, err := tx.Prepare("INSERT INTO Table (p1, p2, p3, p4, p5) VALUES (?, ?, ?, ?, ?)")
if err != nil {
    log.Fatal(err)
}
defer pr.Close()
for i := 0; i < maxI; i++ {
    for j := 0; j < maxJ; j++ {
        ...
        _, err = pr.Exec(param1, param2, param3, param4, param5)
        if err != nil {
            log.Fatal(err)
        }
    }
}
err = tx.Commit()
if err != nil {
    log.Fatal(err)
}

现在查询运行速度很快,但是消耗了太多的内存。因为数据存储在内存中,只在执行的最后保存到数据库文件中。

我认为可以定期将数据保存到数据库文件中,这样会稍微增加执行时间,但减少内存消耗。每次更改"i"时,事务开始,在所有"j"完成后,事务结束:

for i := 0; i < maxI; i++ {
    tx, err := db.Begin()
    if err != nil {
        log.Fatal(err)
    }
    pr, err := tx.Prepare("INSERT INTO Table (p1, p2, p3, p4, p5) VALUES (?, ?, ?, ?, ?)")
    if err != nil {
        log.Fatal(err)
    }
    defer pr.Close()
    for j := 0; j < maxJ; j++ {
        ...
        _, err = pr.Exec(param1, param2, param3, param4, param5)
        if err != nil {
            log.Fatal(err)
        }
    }
    err = tx.Commit()
    if err != nil {
        log.Fatal(err)
    }
}

我认为现在数据应该以块的形式写入文件,只有一个数据块应该在内存中。

但在执行过程中,数据没有保存到文件中,内存继续填充。也就是说,第一种和第二种代码选项之间的执行没有区别。

我认为当调用"Commit"提交事务时,数据应该保存到文件中并清空内存。请告诉我我做错了什么。

英文:

I increased the speed of INSERT queries following an example from the Internet, using the following parameters for SQLite:

PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;

Here is my code:

tx, err := db.Begin()
if err != nil {
    log.Fatal(err)
}
pr, err := tx.Prepare(&quot;INSERT INTO Table (p1, p2, p3, p4, p5) VALUES (?, ?, ?, ?, ?)&quot;)
if err != nil {
    log.Fatal(err)
}
defer pr.Close()
for i := 0; i &lt; maxI; i++ {
    for j := 0; j &lt; maxJ; j++ {
        ...
        _, err = pr.Exec(param1, param2, param3, param4, param5)
        if err != nil {
            log.Fatal(err)
        }
    }
}
err = tx.Commit()
if err != nil {
    log.Fatal(err)
}

Now the query is running fast, but too much RAM is being consumed. Because the data is stored in RAM and saved to the database file only at the very end of the execution.

I thought that it is possible to periodically save the data to the database file, this will slightly increase the execution time, but reduce the memory consumption. With each change of "i", the transaction starts and when all "j" are completed, the transaction ends:

for i := 0; i &lt; maxI; i++ {
    tx, err := db.Begin()
    if err != nil {
        log.Fatal(err)
    }
    pr, err := tx.Prepare(&quot;INSERT INTO Table (p1, p2, p3, p4, p5) VALUES (?, ?, ?, ?, ?)&quot;)
    if err != nil {
        log.Fatal(err)
    }
    defer pr.Close()
    for j := 0; j &lt; maxJ; j++ {
        ...
        _, err = pr.Exec(param1, param2, param3, param4, param5)
        if err != nil {
            log.Fatal(err)
        }
    }
    err = tx.Commit()
    if err != nil {
        log.Fatal(err)
    }
}

I thought that now data should be written to the file in blocks and only one data block should be in RAM.

But in the process of execution, the data is not saved to the file and the RAM continues to fill up. That is, there is no difference in execution between the first and second code options.

I thought that when call "Commit" for a transaction, the data should be saved to a file and the RAM should be cleared. Please tell me what am I doing wrong.

答案1

得分: 3

PRAGMA cache_size 的参数是页面的数量(通常每个页面为4k字节)。

PRAGMA cache_size = 1000000; 将最多分配4GB的内存给页面缓存。
页面缓存在需要时分配,最多分配到指定的最大值,但在连接关闭之前不会释放。

由于您正在插入大量的行,它们将分布在不同的页面上,因此直到填满缓存为止,您将在缓存中保留所有写入磁盘的页面。

如果您想减少内存消耗,只需将值减小到类似于1000的值,相当于4MB,或者完全删除它。默认缓存为2MB,如果只是插入行,则足够使用。

还要注意,当调用 COMMIT 时(甚至在提交之前,如果缓存不足),数据确实会写入磁盘。但是,为了避免从磁盘重新读取数据,Sqlite会在缓存中保留一份副本,以备以后需要。

英文:

The parameter of PRAGMA cache_size is the number of pages (normally 4k bytes per page).

PRAGMA cache_size = 1000000; will allocate a maximum of 4GB of RAM to the page cache.
Page cache is allocated when needed, up to that maximum, but is not freed until the connection is closed.

Since you are inserting a large number of rows, they will end up on different pages, so you will end up having in cache all the pages you have written to disk, until you fill the cache.

If you want to reduce memory consumption, just reduce the value to something like 1000, which corresponds to 4 MB, or remove it at all. Default cache is 2 MB and is enough if you are just inserting rows.

Also note that data is indeed written to disk when you call COMMIT (or even before commit if there is not enough cache). But Sqlite will keep a copy in cache in the event that it's needed afterwards, to avoid having to reread it from the disk.

huangapple
  • 本文由 发表于 2023年4月19日 21:44:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76055283.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定