如何读取文件,在文件不是有效的UTF-8编码时中止并报错?

huangapple go评论140阅读模式
英文:

How to read a file, abort with error if it's not valid UTF-8?

问题

在Go语言中,我想逐行读取一个文件,将其存入str[]rune中。

该文件应该使用UTF-8编码,但我的程序不应该信任它。如果文件包含无效的UTF-8编码,我希望能正确处理错误。

有一个bytes.Runes(s []byte) []rune函数,但它没有错误返回值。当遇到无效的UTF-8编码时,它会引发panic吗?

英文:

In Go, I want to read in a file line by line, into str's or []rune's.

The file should be encoded in UTF-8, but my program shouldn't trust it. If it contains invalid UTF-8, I want to properly handle the error.

There is bytes.Runes(s []byte) []rune, but that has no error return value. Will it panic on encountering invalid UTF-8?

答案1

得分: 10

例如,

package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"os"
	"strings"
	"unicode/utf8"
)

func main() {
	tFile := "text.txt"
	t := []byte{'\xFF', '\n'}
	ioutil.WriteFile(tFile, t, 0666)
	f, err := os.Open(tFile)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer f.Close()
	r := bufio.NewReader(f)
	s, err := r.ReadString('\n')
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	s = strings.TrimRight(s, "\n")
	fmt.Println(t, s, []byte(s))
	if !utf8.ValidString(s) {
		fmt.Println("!utf8.ValidString")
	}
}

输出:

[255 10] � [255]
!utf8.ValidString
英文:

For example,

package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"os"
	"strings"
	"unicode/utf8"
)

func main() {
	tFile := "text.txt"
	t := []byte{'\xFF', '\n'}
	ioutil.WriteFile(tFile, t, 0666)
	f, err := os.Open(tFile)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer f.Close()
	r := bufio.NewReader(f)
	s, err := r.ReadString('\n')
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	s = strings.TrimRight(s, "\n")
	fmt.Println(t, s, []byte(s))
	if !utf8.ValidString(s) {
		fmt.Println("!utf8.ValidString")
	}
}

Output:

[255 10] � [255]
!utf8.ValidString

答案2

得分: 3

例如:

import (
    "io/ioutil"
    "log"
    "unicode/utf8"
)

// ...

buf, err := ioutil.ReadAll(fname)
if error != nil {
        log.Fatal(err)
}

size := 0
for start := 0; start < len(buf); start += size {
        var r rune
        if r, size = utf8.DecodeRune(buf[start:]); r == utf8.RuneError {
                log.Fatalf("无效的UTF-8编码,偏移量为 %d", start)
        }
}

utf8.DecodeRune 的 godocs:

> DecodeRune 解码 p 中的第一个 UTF-8 编码,并返回该符文及其字节宽度。如果编码无效,则返回 (RuneError, 1),这是正确的 UTF-8 的不可能结果。

英文:

For example:

import (
    &quot;io/ioutil&quot;
    &quot;log&quot;
    &quot;unicode/utf8&quot;
)

// ...

buf, err := ioutil.ReadAll(fname)
if error != nil {
        log.Fatal(err)
}

size := 0
for start := 0; start &lt; len(buf); start += size {
        var r rune
        if r, size = utf8.DecodeRune(buf[start:]); r == utf8.RuneError {
                log.Fatalf(&quot;invalid utf8 encoding at ofs %d&quot;, start)
        }
}

utf8.DecodeRune godocs:

> DecodeRune unpacks the first UTF-8 encoding in p and returns the rune
> and its width in bytes. If the encoding is invalid, it returns
> (RuneError, 1), an impossible result for correct UTF-8.

huangapple
  • 本文由 发表于 2012年12月15日 21:34:26
  • 转载请务必保留本文链接:https://go.coder-hub.com/13892667.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定