Retrieval after serialization to disk using gob

Question

I have been learning about databases and want to implement one myself, for learning purposes rather than production. I have a defined schema:

type Row struct {
	ID       int32
	Username string
	Email    string
}

Currently, I am able to encode structs of this type to a file in an append-only manner.

// Just to show that I encode to a file; some details are omitted.

func NewEncoder(db *DB) *gob.Encoder {
	return gob.NewEncoder(db.File)
}

func SerializeRow(r Row, encoder *gob.Encoder, db *DB) {
	err := encoder.Encode(r)
	if err != nil {
		log.Println("encode error:", err)
	}
}

Now, it's relatively easy to mimic a "select" statement by simply decoding the entire file with the gob decoder:

func DeserializeRow(decoder *gob.Decoder, db *DB) {
	var row Row
	db.File.Seek(0, 0)
	for {
		err := decoder.Decode(&row)
		if err != nil {
			if err != io.EOF {
				log.Println("decode error:", err)
			}
			break
		}
		fmt.Printf("%d %s %s\n", row.ID, row.Username, row.Email)
	}
}

My current issue is that I want to retrieve specific rows based on ID. I know SQLite uses 4 KB paging, in the sense that serialized rows occupy a "page" of 4 KB until a page can't hold them anymore, and then another page is created. How do I mimic such behaviour using gob in the simplest and most idiomatic way?

Seen: I have seen this and this

Answer 1

Score: 2

A Gob stream may contain type definitions and decoding instructions, so you can't seek within a Gob stream. You can only read it from the beginning up to the point where you find what you need.
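
For example, a lookup by ID over your current file can only be a forward scan. A minimal sketch, assuming the Row type from your question, an open *os.File, and that encoding/gob, io, and os are imported (findRow is a hypothetical helper, not part of your code):

func findRow(f *os.File, id int32) (Row, error) {
	// Rewind and scan the whole stream; a gob stream cannot be
	// entered at an arbitrary offset.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return Row{}, err
	}
	decoder := gob.NewDecoder(f)
	var row Row
	for {
		if err := decoder.Decode(&row); err != nil {
			return Row{}, err // io.EOF if the ID was never found
		}
		if row.ID == id {
			return row, nil
		}
	}
}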

A Gob stream is completely unsuitable for a database storage format in which you need to skip elements.

You could create a new encoder for each record and serialize each record separately, in which case you could skip elements (by maintaining a file index storing which record starts at which position), but it would be terribly inefficient and redundant (as described in the linked answer, the speed and storage cost amortize as you write more values of the same type, and always creating new encoders loses this gain).
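
A minimal sketch of that per-record approach, assuming an in-memory map from ID to byte offset (AppendRow and ReadRow are hypothetical helpers, not a recommended design):

// AppendRow writes one record with its own throwaway encoder, which
// re-emits the type definition every time (the redundancy described
// above), and records the byte offset where the record starts.
func AppendRow(f *os.File, offsets map[int32]int64, r Row) error {
	off, err := f.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(r); err != nil {
		return err
	}
	offsets[r.ID] = off
	return nil
}

// ReadRow seeks straight to a recorded offset and decodes a single
// record with a fresh decoder; no scan of earlier records is needed.
func ReadRow(f *os.File, off int64) (Row, error) {
	if _, err := f.Seek(off, io.SeekStart); err != nil {
		return Row{}, err
	}
	var r Row
	err := gob.NewDecoder(f).Decode(&r)
	return r, err
}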

A much better approach would be to not use encoding/gob for this, but rather to define your own format. To efficiently support searches (select), you have to build some kind of index on the searchable columns / fields; otherwise you still need to perform a full table scan.
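
For illustration only, a fixed-size record format makes the offset of row n computable as n*rowSize, so a single positioned read replaces the scan. This sketch invents the field widths and helper names, assumes bytes, encoding/binary, and os are imported, and silently truncates over-long strings; mapping an ID to a row number (or keeping rows sorted by ID) is the separate indexing problem mentioned above.

const (
	usernameSize = 32 // illustrative fixed widths, not a standard
	emailSize    = 64
	rowSize      = 4 + usernameSize + emailSize // int32 ID + two fields
)

// serializeRow packs a Row into a fixed-size slot so that row n always
// starts at byte offset n*rowSize.
func serializeRow(r Row) [rowSize]byte {
	var buf [rowSize]byte
	binary.LittleEndian.PutUint32(buf[0:4], uint32(r.ID))
	copy(buf[4:4+usernameSize], r.Username)
	copy(buf[4+usernameSize:], r.Email)
	return buf
}

// readRowAt fetches row number n with one positioned read; no scanning
// and no gob involved.
func readRowAt(f *os.File, n int64) (Row, error) {
	var buf [rowSize]byte
	if _, err := f.ReadAt(buf[:], n*rowSize); err != nil {
		return Row{}, err
	}
	return Row{
		ID:       int32(binary.LittleEndian.Uint32(buf[0:4])),
		Username: string(bytes.TrimRight(buf[4:4+usernameSize], "\x00")),
		Email:    string(bytes.TrimRight(buf[4+usernameSize:], "\x00")),
	}, nil
}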
