Retrieval after serialization to disk using gob

Question

I have been learning about databases and wanted to implement one myself, for learning purposes rather than production. I have a defined schema:

type Row struct {
	ID       int32
	Username string
	Email    string
}

Currently, I am able to encode structs of this type to a file in an append-only manner.

// Just to show that I use a file for the encoding; some details are omitted.

func NewEncoder(db *DB) *gob.Encoder {
	return gob.NewEncoder(db.File)
}

func SerializeRow(r Row, encoder *gob.Encoder, db *DB) {
	err := encoder.Encode(r)
	if err != nil {
		log.Println("encode error:", err)
	}
}

Now, it's relatively easy to mimic a "select" statement by simply decoding the entire file with the decoder's Decode method:

// Requires the "io" import for io.EOF.
func DeserializeRow(decoder *gob.Decoder, db *DB) {
	var row Row
	// Rewind to the start of the file before decoding.
	db.File.Seek(0, 0)
	for {
		err := decoder.Decode(&row)
		if err == io.EOF {
			break // normal end of stream: every row has been read
		}
		if err != nil {
			log.Println("decode error:", err)
			break
		}
		fmt.Printf("%d %s %s\n", row.ID, row.Username, row.Email)
	}
}

My current issue is that I want to retrieve specific rows based on ID. I know SQLite uses 4 KB paging, in the sense that serialized rows occupy a "page", i.e. 4 KB, until a page can't hold any more of them, and then another page is created. How do I mimic such behaviour using gob in the simplest and most idiomatic way?
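
For concreteness, a minimal sketch of the paging arithmetic being described, assuming fixed-size serialized rows (all sizes and names below are illustrative, not taken from SQLite):

// Illustrative paging arithmetic, assuming every serialized row
// occupies the same number of bytes (the sizes are made up).
const (
	pageSize    = 4096               // SQLite-style 4 KB page
	rowSize     = 291                // e.g. 4-byte ID + 32-byte username + 255-byte email
	rowsPerPage = pageSize / rowSize // 14 rows fit in one page
)

// rowOffset returns the absolute file offset of row number n, with
// rowsPerPage rows laid out per page and no row crossing a page boundary.
func rowOffset(n int) int64 {
	page := n / rowsPerPage // which page row n lives on
	slot := n % rowsPerPage // its slot within that page
	return int64(page*pageSize + slot*rowSize)
}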

Seen: I have seen this and this

Answer 1

Score: 2

A Gob stream may contain type definitions and decoding instructions, so you can't seek within a Gob stream. You can only read it from the beginning, up to the point where you find what you need.

A Gob stream is completely unsuitable for a database storage format in which you need to skip elements.
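
To make this concrete, here is a minimal sketch (not from the answer; the offsets and values are illustrative) showing why seeking fails: the first Encode transmits the Row type definition, so a decoder dropped into the middle of the stream has nothing to interpret the bytes with.

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Row struct { // same struct as in the question
	ID       int32
	Username string
	Email    string
}

func main() {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)

	enc.Encode(Row{1, "alice", "a@example.com"})
	mid := buf.Len() // byte offset where the second record begins
	enc.Encode(Row{2, "bob", "b@example.com"})

	// The first record is larger: it also carries the Row type definition.
	fmt.Println("first:", mid, "bytes, second:", buf.Len()-mid, "bytes")

	// A decoder that starts mid-stream never saw that type definition,
	// so it cannot decode the second record.
	dec := gob.NewDecoder(bytes.NewReader(buf.Bytes()[mid:]))
	var r Row
	fmt.Println("decode from the middle:", dec.Decode(&r)) // non-nil error
}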

You could create a new encoder and serialize each record separately, in which case you could skip elements (by maintaining a file index storing which record starts at which position), but it would be terribly inefficient and redundant: as described in the linked answer, the speed and storage cost amortizes as you write more values of the same type, and always creating new encoders loses this gain.
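
A minimal sketch of that (inefficient) approach; appendRow, selectByID, and the in-memory index map are hypothetical names, not part of the question's code:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"os"
)

type Row struct { // same struct as in the question
	ID       int32
	Username string
	Email    string
}

// index maps a row ID to the byte offset where its record starts.
var index = map[int32]int64{}

// appendRow encodes r with a fresh encoder, so every record is a
// self-contained gob stream (type definition included each time),
// and remembers its starting offset in the index.
func appendRow(f *os.File, r Row) error {
	off, err := f.Seek(0, io.SeekEnd)
	if err != nil {
		return err
	}
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(r); err != nil {
		return err
	}
	if _, err := f.Write(buf.Bytes()); err != nil {
		return err
	}
	index[r.ID] = off
	return nil
}

// selectByID seeks straight to the record's offset and decodes only it.
func selectByID(f *os.File, id int32) (Row, error) {
	var r Row
	off, ok := index[id]
	if !ok {
		return r, fmt.Errorf("id %d not found", id)
	}
	if _, err := f.Seek(off, io.SeekStart); err != nil {
		return r, err
	}
	err := gob.NewDecoder(f).Decode(&r)
	return r, err
}

func main() {
	f, _ := os.CreateTemp("", "rows-*.gob")
	defer os.Remove(f.Name())

	appendRow(f, Row{1, "alice", "a@example.com"})
	appendRow(f, Row{2, "bob", "b@example.com"})

	r, err := selectByID(f, 2)
	fmt.Println(r, err) // {2 bob b@example.com} <nil>
}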

A much better approach would be to not use encoding/gob for this, but rather define your own format. To efficiently support searches (select), you have to build some kind of index on the searchable columns / fields, else you still need to perform a full table scan.
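
For illustration, a minimal sketch of such a hand-rolled format with fixed-width fields (all field sizes and helper names are assumptions, not prescribed by the answer). Because every record has the same length, any row's offset is computable, and a simple in-memory map from ID to row number can serve as the index:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type Row struct { // same struct as in the question
	ID       int32
	Username string
	Email    string
}

const (
	usernameSize = 32
	emailSize    = 255
	recordSize   = 4 + usernameSize + emailSize // 4 bytes for the int32 ID
)

// serialize packs a Row into a fixed-size record; strings shorter than
// their field are zero-padded, and longer ones are truncated by copy.
func serialize(r Row) [recordSize]byte {
	var buf [recordSize]byte
	binary.LittleEndian.PutUint32(buf[0:4], uint32(r.ID))
	copy(buf[4:4+usernameSize], r.Username)
	copy(buf[4+usernameSize:], r.Email)
	return buf
}

// deserialize unpacks a fixed-size record back into a Row.
func deserialize(buf [recordSize]byte) Row {
	return Row{
		ID:       int32(binary.LittleEndian.Uint32(buf[0:4])),
		Username: string(bytes.TrimRight(buf[4:4+usernameSize], "\x00")),
		Email:    string(bytes.TrimRight(buf[4+usernameSize:], "\x00")),
	}
}

func main() {
	rec := serialize(Row{7, "alice", "alice@example.com"})
	fmt.Println(deserialize(rec)) // {7 alice alice@example.com}

	// With fixed-size records, row n always starts at n*recordSize
	// (or at a page-aligned offset when grouped into 4 KB pages), so a
	// map[int32]int from ID to row number is already a usable index.
}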
