Compressing RData to Store in a Database

Question

I would like to compress R data as much as possible to store it in a SQLite database. The smallest result I have gotten so far comes from saving the data as an RData file and then reading that file back in.

I have two questions:

  1. Is there a way to make the data smaller?
  2. Is there a way to create the same object without saving it to a file first?

    # Make a big data object
    carsList <- list(rep(cars, 100))

    # Try serialize() and collapse the result to a string to store in a SQLite database
    object.size(paste(serialize(carsList, connection = NULL), collapse = ""))
    # 168456 bytes

    # Save to a file, read the raw bytes back, and collapse them to a string to store in a SQLite database
    saveTempFile <- tempfile()
    save(carsList, file = saveTempFile)
    object.size(paste(readBin(saveTempFile, what = "raw", n = 1e7), collapse = ""))
    # 1936 bytes

Answer 1

Score: 2

Generally the default save() compression is pretty good, but you can specify compression_level = 9 to make it work harder at producing a smaller file.
On disk, I see save(carsList, file = "default.rdata") at 913 bytes and save(carsList, file = "compress.rdata", compression_level = 9) at 721 bytes. ?save points to the compression section of ?file, which says:

> For write-mode connections, compress specifies how hard the compressor works to minimize the file size, and higher values need more CPU time and more working memory (up to ca 800Mb for xzfile(compress = 9)). For xzfile negative values of compress correspond to adding the xz argument -e: this takes more time (double?) to compress but may achieve (slightly) better compression. The default (6) has good compression and modest (100Mb memory) usage: but if you are using xz compression you are probably looking for high compression.

Taking that advice, save(carsList, file = "xz_max.rdata", compress = "xz", compression_level = -1) produces a 368-byte file.
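The comparison above can be reproduced with a short script, and the second question (avoiding the temporary file) can be approached with base R's memCompress(), which compresses a raw vector in memory. This is a minimal sketch rather than the answerer's own code: savedSize() is a hypothetical helper introduced here for illustration, and the exact byte counts will vary with the R version and platform, so none are shown.

```r
# Same example object as in the question.
carsList <- list(rep(cars, 100))

# Part 1: on-disk sizes for the three save() settings discussed above.
# savedSize() is a hypothetical helper, not part of base R.
savedSize <- function(obj, ...) {
  path <- tempfile(fileext = ".rdata")
  on.exit(unlink(path))          # remove the temporary file afterwards
  save(obj, file = path, ...)    # extra arguments are passed on to save()
  file.size(path)                # size in bytes
}

sizes <- c(
  default    = savedSize(carsList),
  gzip9      = savedSize(carsList, compression_level = 9),
  xz_extreme = savedSize(carsList, compress = "xz", compression_level = -1)
)
print(sizes)

# Part 2: the same idea without touching the disk.
# serialize() returns a raw vector; memCompress() compresses it in memory.
rawBytes   <- serialize(carsList, connection = NULL)
compressed <- memCompress(rawBytes, type = "xz")
print(c(serialized = length(rawBytes), compressed = length(compressed)))

# Round trip: decompress and unserialize to recover the original object.
restored <- unserialize(memDecompress(compressed, type = "xz"))
print(identical(restored, carsList))
```

Note that paste(..., collapse = "") turns each byte into two hexadecimal characters, so the string is roughly twice the size of the raw vector. If the database driver accepts raw vectors as BLOB values (an assumption to check against the driver's documentation), storing compressed directly would avoid that overhead.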

Posted by huangapple on August 4, 2023, 02:37:43
Original link: https://go.coder-hub.com/76830813.html