Retrieval after serialization to disk using gob
Question
I have been learning about databases and wanted to implement one myself, for learning purposes rather than production. I have a defined schema:
```go
type Row struct {
	ID       int32
	Username string
	Email    string
}
```
Currently, I am able to encode structs of this type to a file in an append-only manner.
```go
// Just to show I use a file for the encoding; some details are omitted.
func NewEncoder(db *DB) *gob.Encoder {
	return gob.NewEncoder(db.File)
}

func SerializeRow(r Row, encoder *gob.Encoder, db *DB) {
	err := encoder.Encode(r)
	if err != nil {
		log.Println("encode error:", err)
	}
}
```
Now, it's relatively easy to mimic a "select" statement by simply decoding the entire file with a `gob.Decoder`:
```go
func DeserializeRow(decoder *gob.Decoder, db *DB) {
	var row Row
	db.File.Seek(0, 0) // rewind to the start of the file
	for {
		err := decoder.Decode(&row)
		if err != nil {
			if err != io.EOF { // io.EOF just means we read the whole file
				log.Println("decode error:", err)
			}
			return
		}
		fmt.Printf("%d %s %s\n", row.ID, row.Username, row.Email)
	}
}
```
My current issue is that I want to retrieve specific rows by ID. I know SQLite uses 4 KB paging, in the sense that serialized rows occupy a "page" (4 KB) until the page can't hold them anymore, and then another page is created. How do I mimic such behaviour using `gob` in the simplest and most idiomatic way?
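To make the behaviour I mean concrete, here is a minimal sketch of fixed-size paging as I understand it; the `Page` type and `PageSize` constant are only illustrative, not SQLite's actual layout:

```go
import "os"

const PageSize = 4096

// Page is a fixed-size buffer that serialized rows are appended to.
type Page struct {
	buf  [PageSize]byte
	used int
}

// Append copies an already-serialized row into the page and reports
// whether it fit; when it doesn't, the caller flushes this page and
// starts a new one.
func (p *Page) Append(row []byte) bool {
	if p.used+len(row) > PageSize {
		return false
	}
	copy(p.buf[p.used:], row)
	p.used += len(row)
	return true
}

// Flush writes the full fixed-size page to disk. Because every page is
// exactly PageSize bytes, page n always starts at offset n*PageSize,
// which is what makes seeking to a given page cheap.
func (p *Page) Flush(f *os.File) error {
	_, err := f.Write(p.buf[:])
	p.used = 0
	return err
}
```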
Seen: I have seen this and this
Answer 1
Score: 2
A Gob stream may contain type definitions and decoding instructions, so you can't seek within it; you can only read it from the beginning up to the point where you find what you need.
A Gob stream is completely unsuitable for a database storage format in which you need to skip elements.
You could create a new encoder and serialize each record separately, in which case you could skip elements (by maintaining a file index storing which record starts at which position), but it would be terribly inefficient and redundant (as described in the linked answer, the speed and storage costs amortize as you write more values of the same type, and always creating new encoders loses this gain).
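For illustration, a sketch of that per-record approach; the helper names and the in-memory `map[int32]int64` index are mine, not part of the question:

```go
import (
	"encoding/gob"
	"fmt"
	"io"
	"os"
)

// writeIndexed appends one row with its own encoder and records the
// byte offset where it starts, so it can later be found by ID.
func writeIndexed(f *os.File, index map[int32]int64, r Row) error {
	off, err := f.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}
	// A fresh encoder re-emits the Row type definition for every
	// record -- exactly the redundancy described above.
	if err := gob.NewEncoder(f).Encode(r); err != nil {
		return err
	}
	index[r.ID] = off
	return nil
}

// readIndexed seeks straight to the record's offset and decodes it
// with a fresh decoder, skipping everything before it.
func readIndexed(f *os.File, index map[int32]int64, id int32) (Row, error) {
	var r Row
	off, ok := index[id]
	if !ok {
		return r, fmt.Errorf("id %d not found", id)
	}
	if _, err := f.Seek(off, io.SeekStart); err != nil {
		return r, err
	}
	err := gob.NewDecoder(f).Decode(&r)
	return r, err
}
```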
A much better approach would be to not use `encoding/gob` for this, but rather to define your own format. To efficiently support searches (`select`), you have to build some kind of index on the searchable columns / fields; otherwise you still need to perform a full table scan.
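As a minimal sketch of such a custom format, assume fixed-width fields: Username capped at 32 bytes and Email at 255 bytes (these limits are an assumption, not something from the question). Every record then occupies exactly `recordSize` bytes, so record n lives at offset n*recordSize and can be read with a single `ReadAt`; an index on ID only has to map IDs to record numbers:

```go
import (
	"bytes"
	"encoding/binary"
	"os"
)

const (
	usernameSize = 32
	emailSize    = 255
	recordSize   = 4 + usernameSize + emailSize // 4 bytes for the int32 ID
)

// encodeRow lays a Row out as a fixed-width binary record; short
// strings are zero-padded, and longer ones would have to be rejected
// or truncated by the caller.
func encodeRow(r Row) [recordSize]byte {
	var buf [recordSize]byte
	binary.LittleEndian.PutUint32(buf[0:4], uint32(r.ID))
	copy(buf[4:4+usernameSize], r.Username)
	copy(buf[4+usernameSize:], r.Email)
	return buf
}

// decodeRow reverses encodeRow, trimming the zero padding.
func decodeRow(buf [recordSize]byte) Row {
	return Row{
		ID:       int32(binary.LittleEndian.Uint32(buf[0:4])),
		Username: string(bytes.TrimRight(buf[4:4+usernameSize], "\x00")),
		Email:    string(bytes.TrimRight(buf[4+usernameSize:], "\x00")),
	}
}

// readRow fetches the n-th record directly, without a table scan.
func readRow(f *os.File, n int64) (Row, error) {
	var buf [recordSize]byte
	if _, err := f.ReadAt(buf[:], n*recordSize); err != nil {
		return Row{}, err
	}
	return decodeRow(buf), nil
}
```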