英文:
How to read a file, abort with error if it's not valid UTF-8?
问题
在Go语言中,我想逐行读取一个文件,将其存入str
或[]rune
中。
该文件应该使用UTF-8编码,但我的程序不应该信任它。如果文件包含无效的UTF-8编码,我希望能正确处理错误。
有一个bytes.Runes(s []byte) []rune
函数,但它没有错误返回值。当遇到无效的UTF-8编码时,它会引发panic吗?
英文:
In Go, I want to read in a file line by line, into str
's or []rune
's.
The file should be encoded in UTF-8, but my program shouldn't trust it. If it contains invalid UTF-8, I want to properly handle the error.
There is bytes.Runes(s []byte) []rune
, but that has no error return value. Will it panic on encountering invalid UTF-8?
答案1
得分: 10
例如,
package main
import (
"bufio"
"fmt"
"io/ioutil"
"os"
"strings"
"unicode/utf8"
)
func main() {
tFile := "text.txt"
t := []byte{'\xFF', '\n'}
ioutil.WriteFile(tFile, t, 0666)
f, err := os.Open(tFile)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
defer f.Close()
r := bufio.NewReader(f)
s, err := r.ReadString('\n')
if err != nil {
fmt.Println(err)
os.Exit(1)
}
s = strings.TrimRight(s, "\n")
fmt.Println(t, s, []byte(s))
if !utf8.ValidString(s) {
fmt.Println("!utf8.ValidString")
}
}
输出:
[255 10] � [255]
!utf8.ValidString
英文:
For example,
package main
import (
"bufio"
"fmt"
"io/ioutil"
"os"
"strings"
"unicode/utf8"
)
func main() {
tFile := "text.txt"
t := []byte{'\xFF', '\n'}
ioutil.WriteFile(tFile, t, 0666)
f, err := os.Open(tFile)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
defer f.Close()
r := bufio.NewReader(f)
s, err := r.ReadString('\n')
if err != nil {
fmt.Println(err)
os.Exit(1)
}
s = strings.TrimRight(s, "\n")
fmt.Println(t, s, []byte(s))
if !utf8.ValidString(s) {
fmt.Println("!utf8.ValidString")
}
}
Output:
[255 10] � [255]
!utf8.ValidString
答案2
得分: 3
例如:
import (
"io/ioutil"
"log"
"unicode/utf8"
)
// ...
buf, err := ioutil.ReadAll(fname)
if error != nil {
log.Fatal(err)
}
size := 0
for start := 0; start < len(buf); start += size {
var r rune
if r, size = utf8.DecodeRune(buf[start:]); r == utf8.RuneError {
log.Fatalf("无效的UTF-8编码,偏移量为 %d", start)
}
}
utf8.DecodeRune 的 godocs:
> DecodeRune 解码 p 中的第一个 UTF-8 编码,并返回该符文及其字节宽度。如果编码无效,则返回 (RuneError, 1),这是正确的 UTF-8 的不可能结果。
英文:
For example:
import (
"io/ioutil"
"log"
"unicode/utf8"
)
// ...
buf, err := ioutil.ReadAll(fname)
if error != nil {
log.Fatal(err)
}
size := 0
for start := 0; start < len(buf); start += size {
var r rune
if r, size = utf8.DecodeRune(buf[start:]); r == utf8.RuneError {
log.Fatalf("invalid utf8 encoding at ofs %d", start)
}
}
utf8.DecodeRune godocs:
> DecodeRune unpacks the first UTF-8 encoding in p and returns the rune
> and its width in bytes. If the encoding is invalid, it returns
> (RuneError, 1), an impossible result for correct UTF-8.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论