Normalizing text input to ASCII

Question

I am building a small tool that parses a user's input, finds common writing pitfalls, and flags them so the user can improve their text. So far everything works well except for text that uses curly quotes instead of straight ASCII quotes. For now I have a hack that does a string replacement for opening and closing single curly quotes and opening and closing double curly quotes, like so:

cleanedData := bytes.Replace([]byte(data), []byte("’"), []byte("'"), -1)

I feel like there must be a better way to handle this in the stdlib, so that I can also convert other non-ASCII characters to an ASCII equivalent. Any help would be greatly appreciated.

Answer 1

Score: 6

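The strings.Map function looks like what you want.

I don't know of a generic 'ToAscii'-style function in the standard library, but strings.Map offers a clean way to map runes to other runes: it calls your mapping function on each rune of the input and builds a new string from the results.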

Example (updated):

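package main

import (
	"fmt"
	"strings"
)

func main() {
	data := "Hello “Frank” or ‹François› as you like to be ‘called’"
	fmt.Printf("Original: %s\n", data)
	// strings.Map calls normalize on every rune and builds a new string.
	cleanedData := strings.Map(normalize, data)
	fmt.Printf("Cleaned: %s\n", cleanedData)
}

// normalize maps curly and angle quotes to straight ASCII quotes and
// returns every other rune unchanged.
func normalize(in rune) rune {
	switch in {
	case '“', '‹', '”', '›':
		return '"'
	case '‘', '’':
		return '\''
	}
	return in
}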

Output:

Original: Hello “Frank” or ‹François› as you like to be ‘called’
Cleaned: Hello "Frank" or "François" as you like to be 'called'
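strings.Map can only turn one rune into one other rune, so it cannot expand something like an ellipsis character into three dots. If you need that, strings.NewReplacer from the standard library takes old/new string pairs instead. The sketch below is only an illustration: asciiReplacer, toASCII, and the particular set of replacements are assumptions, not a complete table.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiReplacer is a hypothetical, deliberately small table of
// old/new pairs; extend it with whatever characters your input contains.
var asciiReplacer = strings.NewReplacer(
	"“", `"`, "”", `"`, // curly double quotes
	"‘", "'", "’", "'", // curly single quotes
	"…", "...", // horizontal ellipsis expands to three dots
	"–", "-", // en dash
)

// toASCII (hypothetical helper) applies every replacement in one pass.
func toASCII(s string) string {
	return asciiReplacer.Replace(s)
}

func main() {
	fmt.Println(toASCII("“Wait…” ‘no’ 1–2"))
}
```

A Replacer is safe for concurrent use, so building it once as a package-level variable and reusing it should be fine.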

huangapple
  • Posted on April 2, 2016 at 00:10:43
  • When reposting, please keep the link to this post: https://go.coder-hub.com/36360992.html