Identify double byte character in a string and convert that into a single byte character

Question

In my Go project, I am dealing with Asian languages, and there are double-byte characters. In my case, I have a string that contains two words with a space between them.

EG: こんにちは 世界

Now I need to check if that space is a double byte space and if so, I need to convert that into single byte space.

I have searched a lot but could not find a way to do this, so I'm sorry that I have no code sample to include.

Do I need to loop through each character, find the double-byte space by its code, and replace it? What code should I use to identify the double-byte space?

Answer 1

Score: 2

You can simply replace it:

package main

import (
	"fmt"
	"strings"
)

func main() {
	// "\u3000" is the ideographic (full-width) space; -1 replaces every occurrence.
	fmt.Println(strings.Replace("こんにちは\u3000世界", "\u3000", " ", -1))
}

Note that the second argument to Replace is the ideographic space (U+3000), the same character that separates the two words in your example string. Replace finds every occurrence of it in the original string and substitutes an ASCII space.
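
The question also asks how to check whether the space is there before converting it. Here is a minimal sketch of that check followed by the replacement, using only the standard library. The names ideographicSpace, hasIdeographicSpace and toASCIISpaces are made up for this illustration and are not part of the answer above.

```go
package main

import (
	"fmt"
	"strings"
)

// ideographicSpace is U+3000, the full-width space used in CJK text.
// (Hypothetical name, chosen for this sketch.)
const ideographicSpace = '\u3000'

// hasIdeographicSpace reports whether s contains at least one U+3000.
func hasIdeographicSpace(s string) bool {
	return strings.ContainsRune(s, ideographicSpace)
}

// toASCIISpaces replaces every U+3000 in s with an ASCII space (U+0020).
func toASCIISpaces(s string) string {
	return strings.ReplaceAll(s, string(ideographicSpace), " ")
}

func main() {
	s := "こんにちは\u3000世界"
	if hasIdeographicSpace(s) {
		s = toASCIISpaces(s)
	}
	fmt.Printf("%q\n", s)
}
```

The explicit check is optional: calling the replacement on a string without U+3000 simply returns the string unchanged.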

Answer 2

Score: 2

Go has no notion of a double-byte character. It has a special type, rune, which is an alias for int32 and represents a Unicode code point.

Your special space is code point 12288 (U+3000) and the normal space is code point 32 (U+0020).

To iterate over the characters of a string, use range:

for _, char := range chars {...} // char is rune type
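
To make the range loop concrete, the sketch below lists each rune of a string with its byte offset, code point, and UTF-8 length. It suggests why "double byte" is a loose term here: in Go's UTF-8 strings, U+3000 takes three bytes. The helper name describeRunes is an assumption made for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeRunes returns one line per rune in s, giving its byte offset,
// its code point, and the number of bytes it occupies in UTF-8.
// (Hypothetical helper, written for this illustration.)
func describeRunes(s string) []string {
	var out []string
	for i, r := range s { // i is a byte offset, r is a rune
		out = append(out, fmt.Sprintf("offset %d: %U uses %d byte(s)", i, r, utf8.RuneLen(r)))
	}
	return out
}

func main() {
	for _, line := range describeRunes("は\u3000世") {
		fmt.Println(line)
	}
}
```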

To replace this character, you can use strings.Replace, or strings.Map with a function that replaces the unwanted characters.

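// converter maps the ideographic space (12288) to an ASCII space (32)
// and returns every other rune unchanged.
func converter(r rune) rune {
	if r == 12288 {
		return 32
	}
	return r
}

// strings.Map applies converter to every rune of the string.
result := strings.Map(converter, "こんにちは\u3000世界")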

It is also possible to use character literals instead of numbers:

if r == '\u3000' { // ideographic space
	return ' '
}
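
If other non-ASCII spaces might appear as well, the same strings.Map approach can be widened. The sketch below is one possible variation, not what the answer prescribes: it assumes you want every rune in the Unicode "space separator" category (Zs, which includes U+3000 and the no-break space U+00A0) turned into an ASCII space. The function name normalizeSpaces is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeSpaces maps every Unicode space separator (category Zs) to an
// ASCII space and leaves all other runes, including tabs and newlines, alone.
// (Hypothetical helper; broader than the single-rune check in the answer.)
func normalizeSpaces(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Zs, r) {
			return ' '
		}
		return r
	}, s)
}

func main() {
	fmt.Printf("%q\n", normalizeSpaces("こんにちは\u3000世界"))
}
```

If only the ideographic space should be converted, the narrower comparison against '\u3000' shown above is the safer choice.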

huangapple
  • Posted on 2021-10-02 17:50:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/69415789.html