Read a US-ASCII File in Go

Question

I am currently trying to read a US-ASCII file in Go, but every time I do, each special character such as Ä Ö Ü ß is replaced with a ?, or, in my database, with the special sign ?.

Is there anything I can do to prevent this?

Here is how I read my file:

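file, err := os.Open(path)
if err != nil {
	return err
}
defer file.Close()

var lines []string
r := bufio.NewReader(file)
for {
	// ReadBytes returns whatever was read before the error (e.g. io.EOF),
	// so keep a final line that has no trailing newline.
	line, err := r.ReadBytes('\n')
	if len(line) > 0 {
		// string(line) copies the raw bytes; no charset decoding happens here.
		lines = append(lines, string(line))
	}
	if err != nil {
		break
	}
}
content := strings.Join(lines, "")
fmt.Println(content)
index.Content = content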

Answer 1

Score: 6

Since the letters Ä Ö Ü ß don't exist in US-ASCII, I would make an educated guess that you are actually dealing with the Latin-1 (ISO-8859-1) encoding.
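One rough way to check that guess is to look at the raw bytes: pure US-ASCII never uses values of 0x80 or above, and Latin-1 text containing such letters is usually not valid UTF-8. The sketch below is only a heuristic, and `classify` is a hypothetical helper name, not something from the question. It cannot tell Latin-1 apart from other 8-bit encodings, and some Latin-1 byte sequences can happen to be valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// classify (hypothetical helper) makes a rough guess about how data is encoded.
func classify(data []byte) string {
	hasHigh := false
	for _, b := range data {
		if b >= 0x80 { // outside the 7-bit US-ASCII range
			hasHigh = true
			break
		}
	}
	switch {
	case !hasHigh:
		return "ascii"
	case utf8.Valid(data):
		return "utf-8"
	default:
		// Could be Latin-1, Windows-1252, or another 8-bit encoding.
		return "other 8-bit encoding"
	}
}

func main() {
	fmt.Println(classify([]byte("Hello")))                  // ascii
	fmt.Println(classify([]byte("\u00c4rger")))             // utf-8
	fmt.Println(classify([]byte{0xC4, 'r', 'g', 'e', 'r'})) // other 8-bit encoding
}
```

If the file lands in the last category, decoding it as Latin-1 is a reasonable next thing to try.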

Converting from Latin-1 to UTF-8 can be done like this:

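// Each Latin-1 byte value equals its Unicode code point, so widen every byte to a rune.
runes := make([]rune, len(line))
for i, b := range line {
    runes[i] = rune(b)
}
// Converting []rune to string performs the UTF-8 encoding.
lines = append(lines, string(runes))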

Edit:

The example is not optimized, but it shows how a Latin-1 byte can be stored in a rune, since each Latin-1 value corresponds directly to the Unicode code point with the same number. The actual encoding into UTF-8 happens when the []rune is converted to a string.
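Put together with the reading loop from the question, the idea might look like the following sketch. It assumes the input really is Latin-1. The names `latin1ToUTF8` and `readLatin1` are made up for illustration, and a byte slice stands in for the file so the program runs on its own.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// latin1ToUTF8 (hypothetical helper) treats each byte as a Unicode code point
// and writes it out UTF-8 encoded.
func latin1ToUTF8(b []byte) string {
	var sb strings.Builder
	sb.Grow(len(b)) // lower bound; non-ASCII bytes take two bytes in UTF-8
	for _, c := range b {
		sb.WriteRune(rune(c))
	}
	return sb.String()
}

// readLatin1 (hypothetical helper) reads r line by line, assuming Latin-1
// input, and returns the whole content as a UTF-8 string.
func readLatin1(r io.Reader) (string, error) {
	br := bufio.NewReader(r)
	var sb strings.Builder
	for {
		line, err := br.ReadBytes('\n')
		sb.WriteString(latin1ToUTF8(line)) // keep a final line without '\n' too
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
	}
}

func main() {
	// Stand-in for the file: "Grüße" and "Äpfel" as Latin-1 bytes.
	input := []byte{'G', 'r', 0xFC, 0xDF, 'e', '\n', 0xC4, 'p', 'f', 'e', 'l'}
	content, err := readLatin1(bytes.NewReader(input))
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(content)
}
```

With a real file, the result of `os.Open(path)` can be passed to `readLatin1` in place of the `bytes.NewReader`. If an extra dependency is acceptable, the `golang.org/x/text/encoding/charmap` package also provides ready-made decoders such as `charmap.ISO8859_1` and `charmap.Windows1252`. The latter may matter if the file came from Windows, since Windows-1252 differs from Latin-1 in the 0x80 to 0x9F range.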

Posted by huangapple on 2014-10-30 23:18:50
Link: https://go.coder-hub.com/26656862.html