Parse Unicode digits in Go

Question

Other answers mention using unicode.IsDigit() to check if a given rune is a digit or not, but how do I figure out which digit it is then?

Atoi and ParseInt from strconv won't parse it.

IsDigit checks a table containing all of these code points, but I can't work out the digit's value from that. Many of the digit ranges start with their zero at a code point ending in 0, but not all of them, so I can't just use char & 0xF.

My only other thought is whether there's a way to access either the Unicode name of a rune or its properties. Every numeric Unicode character (even fractions) seems to have a plain ASCII number associated with it behind the scenes as a property, but I can't find a way to access either that information or the name anywhere (for example, the names of all Unicode zero digits end in "DIGIT ZERO"). Do I have to look outside the standard library for this one?
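To make the problem concrete, here is a small sketch of the three things described above. The helper name probe is made up for illustration; it only calls unicode.IsDigit and strconv.Atoi from the standard library.

```go
package main

import (
	"strconv"
	"unicode"
)

// probe is a hypothetical helper (not part of any library). For a single
// rune it reports:
//   - whether unicode.IsDigit accepts it,
//   - whether strconv.Atoi can parse it as a one-character string,
//   - the value of its low four bits (the char & 0xF shortcut).
func probe(r rune) (isDigit bool, atoiOK bool, lowNibble int) {
	_, err := strconv.Atoi(string(r))
	return unicode.IsDigit(r), err == nil, int(r & 0xF)
}
```

For U+0966 (DEVANAGARI DIGIT ZERO), probe reports a digit that Atoi rejects and whose low four bits are 6 rather than 0. For U+0660 (ARABIC-INDIC DIGIT ZERO) the masking shortcut happens to give 0, which is why it looks tempting.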

Answer 1

Score: 4

You can use the runenames package to identify a digit based on its name.

This isn't a standard library package, but it is part of golang.org/x/:
> These packages are part of the Go Project but outside the main Go tree. They are developed under looser compatibility requirements than the Go core. Install them with "go get".

import (
	"strings"

	"golang.org/x/text/unicode/runenames"
)

// whatDigit returns the value (0-9) of a rune whose Unicode name contains
// "DIGIT <NAME>", or -1 if the name matches none of the ten digit names.
// Note that Contains also matches names such as "DIGIT ONE FULL STOP";
// check unicode.IsDigit first if only decimal digits should be accepted.
func whatDigit(digit rune) int {
	name := runenames.Name(digit)
	switch {
	case strings.Contains(name, "DIGIT ZERO"):
		return 0
	case strings.Contains(name, "DIGIT ONE"):
		return 1
	case strings.Contains(name, "DIGIT TWO"):
		return 2
	case strings.Contains(name, "DIGIT THREE"):
		return 3
	case strings.Contains(name, "DIGIT FOUR"):
		return 4
	case strings.Contains(name, "DIGIT FIVE"):
		return 5
	case strings.Contains(name, "DIGIT SIX"):
		return 6
	case strings.Contains(name, "DIGIT SEVEN"):
		return 7
	case strings.Contains(name, "DIGIT EIGHT"):
		return 8
	case strings.Contains(name, "DIGIT NINE"):
		return 9
	default:
		return -1
	}
}

The package documentation points to https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt, which has more information for each character, including which plain ASCII digit it corresponds to. The package itself only exposes the name, though. Looking through that file, the names appear to follow the pattern used in the whatDigit function.
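If you would rather stay inside the standard library, one possible alternative is to walk the unicode.Nd range table, which is the table unicode.IsDigit checks. The sketch below rests on an assumption that the unicode package does not document: that every range in unicode.Nd has stride 1 and is made up of consecutive runs of ten code points, each starting at that script's zero. The names digitValue and parseDigits are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// digitValue returns the value 0-9 of a decimal digit (category Nd),
// or -1 if r is not one.
// ASSUMPTION: each range in unicode.Nd has stride 1 and consists of
// back-to-back runs of ten code points ordered 0..9, so the offset
// from the start of the range modulo 10 is the digit's value.
func digitValue(r rune) int {
	for _, rg := range unicode.Nd.R16 {
		if rg.Stride == 1 && r >= rune(rg.Lo) && r <= rune(rg.Hi) {
			return int(r-rune(rg.Lo)) % 10
		}
	}
	for _, rg := range unicode.Nd.R32 {
		if rg.Stride == 1 && r >= rune(rg.Lo) && r <= rune(rg.Hi) {
			return int(r-rune(rg.Lo)) % 10
		}
	}
	return -1
}

// parseDigits converts a string of decimal digits from any script into
// an int. It accepts no sign and does not guard against overflow.
func parseDigits(s string) (int, error) {
	if s == "" {
		return 0, fmt.Errorf("empty input")
	}
	n := 0
	for _, r := range s {
		d := digitValue(r)
		if d < 0 {
			return 0, fmt.Errorf("%q is not a decimal digit", r)
		}
		n = n*10 + d
	}
	return n, nil
}
```

Unlike the name-based approach, this one rejects characters that are not in category Nd, such as fractions, so whether it fits depends on which characters you need to accept.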

huangapple
  • Posted on May 30, 2022, 03:17:30
  • Please keep the link to this article when reposting: https://go.coder-hub.com/72426628.html