Question

I believe Go has no LeftStr(str,n) (take at most the first n characters), RightStr(str,n) (take at most the last n characters), or SubStr(str,pos,n) (take the first n characters after pos) functions, so I tried to write my own:

// take at most the first n characters
func Left(str string, num int) string {
	if num <= 0 {
		return ``
	}
	if num > len(str) {
		num = len(str)
	}
	return str[:num]
}

// take at most the last n characters
func Right(str string, num int) string {
	if num <= 0 {
		return ``
	}
	max := len(str)
	if num > max {
		num = max
	}
	num = max - num
	return str[num:]
}

But I believe these functions will give incorrect output when the string contains non-ASCII Unicode characters, because len and slicing work on bytes. What's the fastest solution for these functions? Is a for range loop the only way?

Answer 1

Score: 2

As already mentioned in the comments, combining characters, modifying runes, and other multi-rune "characters" can cause difficulties.

Anyone interested in Unicode handling in Go should probably read the Go Blog articles "Strings, bytes, runes and characters in Go" and "Text normalization in Go".
In particular, the latter discusses the golang.org/x/text/unicode/norm package, which can help with some of this.

You can consider several increasingly accurate (that is, increasingly Unicode-aware) levels of splitting the first (or last) "n characters" off a string.

1. Just use n bytes.
   This may split in the middle of a rune, but it is O(1) and very simple, and in many cases you know the input consists only of single-byte runes.
   E.g. str[:n].
2. Split after n runes.
   This may split in the middle of a character. It can be done easily with string([]rune(str)[:n]) (for n no larger than the number of runes), but at the expense of converting and copying.
   You can avoid the conversion and copying by using the unicode/utf8 package's DecodeRuneInString (and DecodeLastRuneInString) functions to get the length of each of the first n runes in turn, and then returning str[:sum] (O(n), no allocation).
3. Split after the n'th "boundary".
   One way to do this is to use norm.NFC.FirstBoundaryInString(str) repeatedly, or norm.Iter, to find the byte position to split at, and then return str[:pos].
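
As a rough illustration of the rune-level approach described above, here is a minimal sketch that uses only unicode/utf8. The names Left, Right, and SubStr follow the question; the exact signatures and the clamping behaviour for out-of-range arguments are my own assumptions, and this still splits between runes, not between user-perceived characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Left returns at most the first n runes of s. It walks the string
// rune by rune and slices it, so nothing is copied or allocated.
func Left(s string, n int) string {
	end := 0
	for ; n > 0 && end < len(s); n-- {
		_, size := utf8.DecodeRuneInString(s[end:])
		end += size
	}
	return s[:end]
}

// Right returns at most the last n runes of s, walking backwards
// from the end of the string.
func Right(s string, n int) string {
	start := len(s)
	for ; n > 0 && start > 0; n-- {
		_, size := utf8.DecodeLastRuneInString(s[:start])
		start -= size
	}
	return s[start:]
}

// SubStr skips the first pos runes of s and returns at most the next
// n runes. This signature is an assumption based on the question.
func SubStr(s string, pos, n int) string {
	skipped := Left(s, pos)
	return Left(s[len(skipped):], n)
}

func main() {
	s := "héllo, 世界"
	fmt.Println(Left(s, 2))      // hé
	fmt.Println(Right(s, 2))     // 世界
	fmt.Println(SubStr(s, 1, 4)) // éllo
}
```

Invalid UTF-8 bytes are each treated as one rune of width 1 here, which is how DecodeRuneInString reports them; whether that is the right policy depends on your input.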

Consider the displayed string "cafés", which could be represented in Go code as "cafés", "caf\u00E9s", or "caf\xc3\xa9s", all of which result in the identical six bytes. Alternatively, it could be represented as "cafe\u0301s" or "cafe\xcc\x81s", both of which result in the identical seven bytes.

The first "method" above may split those into "caf\xc3"+"\xa9s" and "cafe\xcc"+"\x81s".

The second may split them into "caf\u00E9"+"s" ("café"+"s") and "cafe"+"\u0301s" ("cafe"+"́s").

The third should split them into "caf\u00E9"+"s" and "cafe\u0301"+"s" (both shown as "café"+"s").
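
To make the "cafés" comparison concrete, here is a small sketch of the three levels using only the standard library. All three function names are hypothetical. Note in particular that splitMarks is not the norm-based method from the answer: it is a simplified stand-in that keeps Unicode mark runes (category M, via unicode.IsMark) attached to the preceding rune, which happens to be enough for this example but does not implement normalization boundaries.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// splitBytes splits s after n bytes (may cut a rune in half).
func splitBytes(s string, n int) (string, string) {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n], s[n:]
}

// splitRunes splits s after n runes (may separate a base letter
// from a combining mark that follows it).
func splitRunes(s string, n int) (string, string) {
	pos := 0
	for ; n > 0 && pos < len(s); n-- {
		_, size := utf8.DecodeRuneInString(s[pos:])
		pos += size
	}
	return s[:pos], s[pos:]
}

// splitMarks splits s after n non-mark runes, keeping any mark runes
// that follow attached to the left part. Simplified stand-in for a
// real boundary search; an assumption, not the norm package's rules.
func splitMarks(s string, n int) (string, string) {
	if n <= 0 {
		return "", s
	}
	pos := 0
	for pos < len(s) {
		r, size := utf8.DecodeRuneInString(s[pos:])
		if !unicode.IsMark(r) {
			if n == 0 {
				break
			}
			n--
		}
		pos += size
	}
	return s[:pos], s[pos:]
}

func main() {
	for _, s := range []string{"caf\u00E9s", "cafe\u0301s"} {
		a, b := splitBytes(s, 4)
		fmt.Printf("bytes: %+q + %+q\n", a, b)
		a, b = splitRunes(s, 4)
		fmt.Printf("runes: %+q + %+q\n", a, b)
		a, b = splitMarks(s, 4)
		fmt.Printf("marks: %+q + %+q\n", a, b)
	}
}
```

The %+q verb prints non-ASCII bytes and runes as escapes, which makes it easy to see where each split landed.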
