When do we use the rune function in Go?

Question

I am a beginner in Golang...

I found that rune(char) == "-" is used to check whether a character in a word is a hyphen, instead of char == "-".

Here is the code:

package main

import (
	"fmt"
	"unicode"
)

func CodelandUsernameValidation(str string) bool {
	// code goes here
	if len(str) >= 4 && len(str) <= 25 {
		if unicode.IsLetter(rune(str[0])) {
			for _, char := range str {
				if !unicode.IsLetter(rune(char)) && !unicode.IsDigit(rune(char)) && !(rune(char) == '_') {
					return false
				}
			}
			return true
		}
	}
	return false
}

func main() {
	// do not modify below here, readline is our function
	// that properly reads in the input for you
	var user string
	fmt.Println("Enter Username")
	fmt.Scan(&user)
	fmt.Println(CodelandUsernameValidation(user))
}

Could you please clarify why rune is required here?

Answer 1

Score: 1

The code in the question must convert the byte str[0] to a rune in order to call unicode.IsLetter. The other rune conversions are unnecessary, because char is already a rune.

That required byte-to-rune conversion hints at a problem: the code treats a byte as a rune, but a byte is not a rune. For a multi-byte character, str[0] is only the first byte of its UTF-8 encoding.

Fix this by using for range to iterate over the runes in the string, which eliminates the conversions altogether:

func CodelandUsernameValidation(str string) bool {
	if len(str) < 4 || len(str) > 25 {
		return false
	}
	for i, r := range str {
		if i == 0 && !unicode.IsLetter(r) {
			// str must start with a letter
			return false
		} else if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' {
			// str is restricted to letters, digits and _
			return false
		}
	}
	return true
}
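
To see how the range-based version behaves, here is a small runnable sketch that wraps the function above in a program. The sample usernames are hypothetical and chosen only for illustration. The last one starts with the Arabic-Indic digit U+0663, whose first UTF-8 byte (0xD9) would look like the letter Ù if converted with rune(str[0]), so a byte-based first-character check would wrongly accept it.

```go
package main

import (
	"fmt"
	"unicode"
)

// CodelandUsernameValidation is the range-based version from the answer.
// Note that len(str) still counts bytes, not runes, as in the original.
func CodelandUsernameValidation(str string) bool {
	if len(str) < 4 || len(str) > 25 {
		return false
	}
	for i, r := range str {
		if i == 0 && !unicode.IsLetter(r) {
			// str must start with a letter
			return false
		} else if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' {
			// str is restricted to letters, digits and _
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical sample inputs, not taken from the question.
	names := []string{
		"user_1",     // letters, underscore, digit: valid
		"1user",      // starts with a digit: invalid
		"abc",        // shorter than 4 bytes: invalid
		"h\u00e9llo", // contains a two-byte letter: valid
		"\u0663abc",  // starts with a multi-byte digit: invalid
	}
	for _, name := range names {
		fmt.Printf("%q -> %v\n", name, CodelandUsernameValidation(name))
	}
}
```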

Answer 2

Score: 1

The first thing to know is that rune is simply an alias for int32. Single quotes denote a rune and double quotes denote a string, so rune(char) == "-" should be rune(char) == '-'.

> Comment from the builtin package:
>
> // rune is an alias for int32 and is equivalent to int32 in all ways. It is
> // used, by convention, to distinguish character values from integer values.
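
As a minimal sketch of the two points above: a rune value can be assigned to an int32 variable without conversion, and a rune is compared against a single-quoted literal. The helper name isHyphen is made up for this illustration.

```go
package main

import "fmt"

// isHyphen is a hypothetical helper for this sketch.
// '-' is a rune literal; writing "-" here would be a string
// and the comparison would not compile.
func isHyphen(r rune) bool {
	return r == '-'
}

func main() {
	var r rune = '-'
	var i int32 = r // no conversion needed: rune and int32 are the same type

	fmt.Printf("%T %d %q\n", r, i, r) // prints: int32 45 '-'
	fmt.Println(isHyphen(r), isHyphen('_'))
}
```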

Second, indexing a string returns individual bytes, not characters. In unicode.IsLetter(rune(str[0])), str[0] is a byte (an alias for uint8), not a character. This fails in some cases because UTF-8 encodes some characters in more than one byte. For example, the character ⌘ is encoded as the bytes [e2 8c 98]. If the string starts with ⌘, str[0] returns e2, which on its own is not valid UTF-8; converting it with rune() yields U+00E2 (â), a different character. To get the first character, decode it instead:

// DecodeRuneInString returns the first rune in str and its width in bytes.
firstChar, size := utf8.DecodeRuneInString(str)
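
The difference between indexing and decoding can be sketched as follows. This example uses utf8.DecodeRuneInString, which decodes directly from a string instead of a byte slice; the helper name firstRune and the sample string are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune is a hypothetical helper that returns the first rune of s
// and the number of bytes used to encode it.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	s := "⌘abc"

	// Indexing yields only the first byte of the three-byte encoding.
	fmt.Printf("s[0] = %x, as rune: %q\n", s[0], rune(s[0]))

	// Decoding yields the whole character and its width.
	r, size := firstRune(s)
	fmt.Printf("first rune = %q, size = %d\n", r, size)
}
```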

A for range loop, by contrast, decodes one UTF-8-encoded rune on each iteration. Each time around the loop, the index is the starting position of the current rune, measured in bytes, and the value is its code point. So in for _, char := range str from the example code, char is already a rune, and rune(char) converts a rune to a rune, which is redundant.
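
A short sketch of that behavior, assuming a sample string with one three-byte character in the middle. The helper runeOffsets is hypothetical and exists only to make the byte offsets visible.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper that returns the starting byte
// index of each rune in s, as reported by a for range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "a⌘b"

	// r is already of type rune (int32); no conversion is needed.
	for i, r := range s {
		fmt.Printf("byte index %d: %q (%T)\n", i, r, r)
	}

	// The index jumps from 1 to 4 because ⌘ occupies three bytes.
	fmt.Println(runeOffsets(s))
}
```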

To learn more about how strings work in Go, there is a great post by Rob Pike.

Answer 3

Score: 0

You need to convert str to []rune:

r := []rune(str)

This must be the first line in the function CodelandUsernameValidation.
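
As a rough illustration of what the conversion gives you: after converting to []rune, indexing and len work per character rather than per byte. The helper runeLen and the sample string are assumptions for this sketch. The conversion allocates a new slice, so utf8.RuneCountInString may be preferable when only the count is needed.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen is a hypothetical helper that counts characters by
// converting the string to a rune slice.
func runeLen(s string) int {
	return len([]rune(s))
}

func main() {
	s := "h\u00e9llo" // the second character takes two bytes in UTF-8
	r := []rune(s)

	fmt.Println(len(s))                    // length in bytes
	fmt.Println(runeLen(s))                // length in runes
	fmt.Println(utf8.RuneCountInString(s)) // same count, without allocating
	fmt.Printf("%q\n", r[1])               // indexing r yields a whole character
}
```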

Posted by huangapple on 2022-02-01 04:52:43
Original link: https://go.coder-hub.com/70932665.html