Go: Removing accents from strings

Question

I'm new to Go and I'm trying to implement a function to convert accented characters into their non-accented equivalent. I'm attempting to follow the example given in this blog (see the heading 'Performing magic').

What I've attempted to gather from this is:

package main

import (
	"bytes"
	"fmt"
	"unicode"

	"code.google.com/p/go.text/transform"
	"code.google.com/p/go.text/unicode/norm"
)

func isMn(r rune) bool {
	return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

func main() {
	r := bytes.NewBufferString("Your Śtring")
	t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
	r = transform.NewReader(r, t)
	fmt.Println(r)
}

It does not work in the slightest and I quite honestly don't know what it means anyway. Any ideas?

Answer 1

Score: 20

Note that Go 1.5 (August 2015) or Go 1.6 (Q1 2016) could introduce a new runes package, with transform operations.

That includes (in runes/example_test.go) a runes.Remove function, which will help transform résumé into resume:

func ExampleRemove() {
	t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
	s, _, _ := transform.String(t, "résumé")
	fmt.Println(s)

	// Output:
	// resume
}

This is still being reviewed though (April 2015).

Answer 2

Score: 4

r should be of type io.Reader, and you can't print r like that. First, you need to read the content into a byte slice:

var (
	s = "Your Śtring"
	b = make([]byte, len(s))

	r io.Reader = strings.NewReader(s)
)
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
r = transform.NewReader(r, t)
r.Read(b)
fmt.Println(string(b))

This compiles and runs, but for some reason it prints "Your Stri", two bytes short of the expected output.

Here is a version that actually does what you need, though I'm still not sure why the example from the blog behaves so strangely.

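s := "Yoùr Śtring"
b := make([]byte, len(s))

t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
// n is the number of bytes written to b; the output is shorter than len(s).
n, _, e := t.Transform(b, []byte(s), true)
if e != nil {
	panic(e)
}

fmt.Println(string(b[:n])) // slice to n to drop the unused trailing zero bytes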
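Transform's first return value is the number of bytes written to the destination. Because the accent-free output is shorter than len(s), the bytes past that count stay zero, and converting the whole slice to a string carries them along. A minimal standard-library sketch of that effect, with copy standing in for Transform (an illustration only, not the x/text API; fill is a hypothetical helper name):

```go
package main

import "fmt"

// fill is a hypothetical helper for this sketch. It writes src into a
// zero-filled buffer of the given size and returns both the whole buffer
// and the written prefix as strings.
func fill(src string, size int) (whole, trimmed string) {
	dst := make([]byte, size) // zero-filled, like make([]byte, len(s))
	n := copy(dst, src)       // stands in for the byte count Transform returns
	return string(dst), string(dst[:n])
}

func main() {
	whole, trimmed := fill("resume", 8)
	fmt.Printf("%q\n", whole)   // "resume\x00\x00"
	fmt.Printf("%q\n", trimmed) // "resume"
}
```

The stray zero bytes are invisible in most terminals, which makes them easy to miss until the string is compared or stored.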

huangapple
  • Published on July 6, 2014, 00:16:36
  • Please keep this link when reposting: https://go.coder-hub.com/24588295.html