Read a US-ASCII file in Go
Question
I am currently trying to read a US-ASCII file in Go, but every time I do so, every special character such as Ä, Ö, Ü, or ß is replaced with a ? or, in my database, with the special sign ?.
Is there anything I can do to prevent this?
Here is how I read the file:
file, err := os.Open(path)
if err != nil {
    return err
}
var lines []string
r := bufio.NewReader(file)
for {
    line, err := r.ReadBytes('\n')
    if err != nil {
        break
    }
    lines = append(lines, string(line))
}
fmt.Println(strings.Join(lines, ""))
index.Content = strings.Join(lines, "")
Answer 1
Score: 6
Since the letters Ä, Ö, Ü, and ß do not exist in US-ASCII, I would make an educated guess that you are actually dealing with the Latin-1 (ISO-8859-1) encoding.
Converting a line from Latin-1 to UTF-8 can be done like this:
runes := make([]rune, len(line))
for i, b := range line {
    runes[i] = rune(b)
}
lines = append(lines, string(runes))
Edit:
The example is not optimized, but it shows how a Latin-1 byte can be stored in a rune, since each Latin-1 value corresponds directly to the Unicode code point with the same number. The actual encoding into UTF-8 happens when the []rune is converted to a string.
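For reference, the same idea can be wrapped in a small helper and run on its own. This is only a sketch, assuming the input really is ISO-8859-1 (if the file is actually Windows-1252, bytes in the 0x80 to 0x9F range would come out wrong). The function name latin1ToUTF8 and the sample bytes are made up for illustration and are not part of the original code.

```go
package main

import (
	"fmt"
	"strings"
)

// latin1ToUTF8 is a hypothetical helper: it treats every input byte as a
// Latin-1 (ISO-8859-1) value, which equals the Unicode code point of the
// same number, and writes that code point out as UTF-8.
func latin1ToUTF8(b []byte) string {
	var sb strings.Builder
	sb.Grow(len(b)) // lower bound; non-ASCII bytes need two bytes in UTF-8
	for _, c := range b {
		sb.WriteRune(rune(c))
	}
	return sb.String()
}

func main() {
	// Sample input (assumed): "ÄÖÜß\n" encoded as Latin-1.
	raw := []byte{0xC4, 0xD6, 0xDC, 0xDF, '\n'}

	s := latin1ToUTF8(raw)
	fmt.Print(s)                    // prints ÄÖÜß followed by a newline
	fmt.Println(len(raw), len(s))   // prints 5 9
}
```

In the question's loop, this would replace string(line) with latin1ToUTF8(line). Using strings.Builder avoids the intermediate []rune slice, but the result is the same as the conversion shown in the answer.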