Is there a way of cleaning up this Go code?
Question
I am just beginning to learn Go, and have written a function that parses markdown files with a header containing some metadata (the files are blog posts).
Here is an example:
---
Some title goes here
19 September 2012
---
This is some content, read it.
I've written this function, which works, but I feel it's quite verbose and messy. I've had a look at the various strings packages, but I don't know enough about Go and its best practices to know what I should be doing differently. If I could get some tips to clean this up, I would appreciate it. (Also, I know that I shouldn't be ignoring that error.)
type Post struct {
	Title string
	Date  string
	Body  string
}

func loadPost(title string) *Post {
	filename := title + ".md"
	file, _ := ioutil.ReadFile("posts/" + filename)
	fileString := string(file)
	str := strings.Split(fileString, "---")
	meta := strings.Split(str[1], "\n")
	title = meta[1]
	date := meta[2]
	body := str[2]
	return &Post{Title: title, Date: date, Body: body}
}
Answer 1
Score: 4
I think it's not bad. A couple of suggestions:
- The hard-coded slash in "posts/" is platform-dependent. You can use path/filepath.Join to avoid that.
- There is bytes.Split, so you don't need the string(file).
- You can create the Post without repeating the fields:
&Post{title, date, body}
Alternatively, you could find out where the body starts with LastIndex(s, "--")
and use that to index the file contents accordingly. This avoids the allocation of using Split.
const sep = "--"

func loadPost(content string) *Post {
	sepLength := len(sep)
	i := strings.LastIndex(content, sep)
	headers := content[sepLength:i]
	body := content[i+sepLength+1:]
	meta := strings.Split(headers, "\n")
	return &Post{meta[1], meta[2], body}
}
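The first two bullet points above (filepath.Join and bytes.Split) could be combined roughly as follows. This is only a sketch under some assumptions: the names parsePost and the dir parameter are made up for illustration, os.ReadFile (Go 1.16+) stands in for the ioutil.ReadFile call from the question, and the function returns an error instead of discarding it.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Post mirrors the struct from the question.
type Post struct {
	Title string
	Date  string
	Body  string
}

var sep = []byte("---")

// parsePost (hypothetical name) splits raw file contents into the header
// fields and the body without first converting the whole file to a string.
func parsePost(data []byte) (*Post, error) {
	// SplitN with n=3 stops after the header, so a "---" inside the body
	// stays part of the body.
	parts := bytes.SplitN(data, sep, 3)
	if len(parts) < 3 {
		return nil, errors.New("post: missing --- delimited header")
	}
	meta := bytes.Split(bytes.TrimSpace(parts[1]), []byte("\n"))
	if len(meta) < 2 {
		return nil, errors.New("post: header needs a title line and a date line")
	}
	return &Post{
		Title: string(bytes.TrimSpace(meta[0])),
		Date:  string(bytes.TrimSpace(meta[1])),
		Body:  string(bytes.TrimSpace(parts[2])),
	}, nil
}

// loadPost reads <dir>/<title>.md; filepath.Join picks the path separator
// for the current platform.
func loadPost(dir, title string) (*Post, error) {
	data, err := os.ReadFile(filepath.Join(dir, title+".md"))
	if err != nil {
		return nil, err
	}
	return parsePost(data)
}

func main() {
	raw := []byte("---\nSome title goes here\n19 September 2012\n---\nThis is some content, read it.\n")
	p, err := parsePost(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q %q %q\n", p.Title, p.Date, p.Body)
}
```

Keeping the parsing separate from the file reading is a design choice made here so the parser can be exercised without touching the disk; a caller would use something like loadPost("posts", "my-title").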
Answer 2
Score: 2
I agree that it's not bad. I'll add a couple of other ideas.
- As Thomas showed, you don't need the intermediate variables title, date, and body. Try, though,
  return &Post{Title: meta[1], Date: meta[2], Body: body}
  It's true that you can leave the field names out, but I sometimes like them to keep the code self-documenting. (I think go vet likes them too.)
- I fuss over strings versus byte slices, probably more than I should. Since you're reading the file in one gulp, you probably don't need to worry about this. Converting everything to one big string and then slicing up the string is a handy way of doing things; just remember that you're pinning the entire string in memory if you keep any part of it. If your files are large, or you have lots of them and you only end up keeping, say, the meta for most of them, this might not be the way to go.
- There's just one blog entry per file? If so, I'll propose a variant of Thomas's suggestion. Verify that the first bytes are --- (or your file is corrupt), then use strings.Index(fileString[3:], "---"). Split is more appropriate when you have an unknown number of segments. In your case you're just looking for that single separator after the meta. Index will find it after searching the meta and be done, without searching through the whole body. (And anyway, what if the body contained the string "---"?)
- Finally, some people would use regular expressions for this. I still haven't warmed up to regular expressions, but anyway, it's another approach.
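The Index-based variant described in the third point might look something like this. It is a minimal sketch, not a definitive implementation: the function name splitPost and its error messages are invented for illustration, and it assumes exactly one post per file.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const sep = "---"

// splitPost (hypothetical name) returns the raw header text and the body.
// It checks the leading separator, then looks only for the first separator
// after it, so a "---" inside the body is left alone.
func splitPost(s string) (meta, body string, err error) {
	if !strings.HasPrefix(s, sep) {
		return "", "", errors.New("file does not start with ---")
	}
	rest := s[len(sep):]
	i := strings.Index(rest, sep)
	if i < 0 {
		return "", "", errors.New("header is not terminated by ---")
	}
	meta = strings.TrimSpace(rest[:i])
	body = strings.TrimSpace(rest[i+len(sep):])
	return meta, body, nil
}

func main() {
	meta, body, err := splitPost("---\nSome title goes here\n19 September 2012\n---\nThis is some content --- read it.")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("meta=%q\nbody=%q\n", meta, body)
}
```

The returned meta can then be split on "\n" to get the title and date lines. Because the search stops at the first separator after the header, the cost does not depend on the length of the body.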
Answer 3
Score: 1
Sonia has some great suggestions. Below is my take which accounts for problems you might encounter when parsing the header.
http://play.golang.org/p/w-XYyhPj9n
package main

import (
	"fmt"
	"strings"
)

const sep = "---"

// parseError reports a malformed post header.
type parseError struct {
	msg string
}

func (e *parseError) Error() string {
	return e.msg
}

// parse splits s into its header lines and its content.
func parse(s string) (header []string, content string, err error) {
	if !strings.HasPrefix(s, sep) {
		return header, content, &parseError{"content does not start with `---`!"}
	}
	// Split into at most 3 parts so a separator inside the body is preserved.
	arr := strings.SplitN(s, sep, 3)
	if len(arr) < 3 {
		return header, content, &parseError{"header was not terminated with `---`!"}
	}
	header = strings.Split(strings.TrimSpace(arr[1]), "\n")
	content = strings.TrimSpace(arr[2])
	return header, content, nil
}

func main() {
	// Well-formed post; the body itself contains "--".
	f := `---
Some title goes here
19 September 2012
---
This is some content, read it. --Anonymous`
	header, content, err := parse(f)
	if err != nil {
		panic(err)
	}
	for i, val := range header {
		fmt.Println(i, val)
	}
	fmt.Println("---")
	fmt.Println(content)

	// Header is never terminated.
	f = `---
Some title goes here
19 September 2012
This is some content, read it.`
	_, _, err = parse(f)
	fmt.Println("Error:", err)

	// Content does not start with the separator.
	f = `
Some title goes here
19 September 2012
---
This is some content, read it.`
	_, _, err = parse(f)
	fmt.Println("Error:", err)
}
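To connect this back to the Post struct from the question, the header slice returned by parse could be checked for length before indexing into it. The helper below is a hypothetical sketch (newPost is not part of the answer's code) and assumes the title is the first header line and the date the second.

```go
package main

import (
	"fmt"
)

// Post mirrors the struct from the question.
type Post struct {
	Title string
	Date  string
	Body  string
}

// newPost (hypothetical helper) builds a Post from already-parsed header
// lines, refusing headers that are too short instead of panicking on an
// out-of-range index.
func newPost(header []string, content string) (*Post, error) {
	if len(header) < 2 {
		return nil, fmt.Errorf("expected title and date in header, got %d line(s)", len(header))
	}
	return &Post{Title: header[0], Date: header[1], Body: content}, nil
}

func main() {
	p, err := newPost([]string{"Some title goes here", "19 September 2012"}, "This is some content, read it.")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%+v\n", *p)

	// A header with only a title is rejected.
	_, err = newPost([]string{"Only a title"}, "")
	fmt.Println("Error:", err)
}
```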