Compressing RData to Store in a Database
Question
I would like to compress R data as much as possible to store it in a SQLite database. The smallest I have managed so far comes from saving the data as an RData file and then reading that file back in.
I have two questions:
- Is there a way to make the data smaller?
- Is there a way to create the same object without saving it to a file first?
# Make a big data object
carsList <- list(rep(cars, 100))
# Try serialize() and collapse to a string to store in a SQLite database
object.size(paste(serialize(carsList, connection = NULL), collapse = ""))
#> 168456 bytes
# Save, then read the same data back and collapse to a string to store in a SQLite database
saveTempFile <- tempfile()
save(carsList, file = saveTempFile)
object.size(paste(readBin(saveTempFile, what = "raw", n = 1e7), collapse = ""))
#> 1936 bytes
Answer 1
Score: 2
Generally the default save() compression is pretty good, but you can specify compression_level = 9 to have it work harder at making a smaller file.
On disk, I see save(carsList, file = "default.rdata") as 913 bytes and save(carsList, file = "compress.rdata", compression_level = 9) as 721 bytes. Following the ?save pointer to the compression section of ?file and taking their advice,
> For write-mode connections, compress specifies how hard the compressor works to minimize the file size, and higher values need more CPU time and more working memory (up to ca 800Mb for xzfile(compress = 9)). For xzfile negative values of compress correspond to adding the xz argument -e: this takes more time (double?) to compress but may achieve (slightly) better compression. The default (6) has good compression and modest (100Mb memory) usage: but if you are using xz compression you are probably looking for high compression.
save(carsList, file = "xz_max.rdata", compress = "xz", compression_level = -1)
produces a 368 byte file.
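As for the second question (avoiding the temporary file), here is a minimal sketch that assumes base R's memCompress() and memDecompress() are acceptable substitutes for save(). It serializes the object to a raw vector and compresses it in memory. This is not the same on-disk format that save() writes, and the variable names rawBytes, compressed, and restored are only illustrative.

```r
# Same object as in the question
carsList <- list(rep(cars, 100))

# Serialize to a raw vector in memory (no temporary file involved)
rawBytes <- serialize(carsList, connection = NULL)

# Compress the raw vector in memory; type can be "gzip", "bzip2", or "xz"
compressed <- memCompress(rawBytes, type = "xz")

length(rawBytes)    # uncompressed size in bytes
length(compressed)  # compressed size in bytes

# Round trip: decompress, then unserialize to recover the original object
restored <- unserialize(memDecompress(compressed, type = "xz"))
identical(restored, carsList)
```

Note that collapsing a raw vector with paste() produces two hex characters per byte, so the resulting string is roughly twice the size of the underlying data. If your SQLite interface accepts raw vectors as BLOB values, storing compressed directly should avoid that overhead.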