April 10, 2017 21:44:01, go

String prefix of requested length in golang working with utf-8 symbols

Question

Is there an elegant way to crop a string and create pretty string prefixes in Go? I have this function as a starting point:

func prettyCrop(in string, cropLength int) string {
	if len(in) < cropLength {
		return in
	} else {
		in = in[0:cropLength]
		in = strings.TrimRightFunc(in, func(r rune) bool {
			if r == ' ' {
				return true
			}
			return false
		})
		return in + "…"
	}
}

It works well enough for English text, but has problems with anything more complicated. See this example:

prettyCrop("čřč čřč", 8) // čř?…

TrimRightFunc is not working as I expect here. I expect it to return čřč. Why does this function not return a valid UTF-8 string? Is there a library for this? How can I fix it? Is there a better solution?

Answer 1

Score: 2

The problem is that slicing a string operates on the UTF-8 encoded bytes that represent the string, not on its characters or runes. If the string contains characters that take multiple bytes in UTF-8, slicing it at an arbitrary byte index may therefore produce an invalid UTF-8 sequence.
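To make the byte/rune difference concrete, here is a small sketch. The helper names byteAndRuneLen and cutIsValid are made up for this illustration and are not part of the original post; they only use utf8.RuneCountInString and utf8.ValidString from the standard library.

```go
package main

import "unicode/utf8"

// byteAndRuneLen (hypothetical helper) returns the length of s in bytes
// and in runes; the two differ as soon as s contains multi-byte characters.
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

// cutIsValid (hypothetical helper) reports whether the byte slice s[:n]
// is still valid UTF-8. It assumes 0 <= n <= len(s).
func cutIsValid(s string, n int) bool {
	return utf8.ValidString(s[:n])
}
```

For "čřč čřč", byteAndRuneLen gives 13 bytes but only 7 runes, because č and ř each take two bytes. Cutting at byte 8 lands in the middle of a č, so cutIsValid reports false there, while a cut at byte 7 (just after the space) is fine.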

Assuming cropLength is meant to be a character limit (and not a byte-count limit), you should first convert the string to a []rune and operate on that:

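func prettyCrop(in string, cropLength int) string {
	// Convert to runes so that cropLength counts characters, not bytes.
	in2 := []rune(in)
	// <= so that a string that already fits is returned without an ellipsis.
	if len(in2) <= cropLength {
		return in
	}
	in2 = in2[:cropLength]
	// Drop trailing spaces left at the cut before appending the ellipsis.
	in = strings.TrimRightFunc(string(in2), func(r rune) bool {
		return r == ' '
	})
	return in + "…"
}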

Testing it:

for i := 0; i < 7; i++ {
	fmt.Println(prettyCrop("čřč čřč", i))
}

Output (try it on the Go Playground):

…
č…
čř…
čřč…
čřč…
čřč č…
čřč čř…

Performance notes:

The example above is not performance-friendly, because:

It converts the whole input string to []rune, although it would be enough to collect only its first cropLength runes with a for range loop.
Calling strings.TrimRightFunc() requires converting the []rune back to a string, and then a string concatenation is performed to produce the result. This could be avoided by looping over the []rune manually and creating only the single string that is returned.
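
One possible way to act on these notes is sketched below. It is not the code from the answer: prettyCropFast is a hypothetical name, and the sketch assumes that cropLength is a rune count, that a negative cropLength should behave like 0, and that only plain spaces need trimming. It walks the string with for range, which yields the byte offset at which each rune starts, so no []rune is built at all.

```go
package main

import "strings"

// prettyCropFast (hypothetical name) keeps at most cropLength runes of in,
// trims trailing spaces at the cut, and appends an ellipsis if it cropped.
func prettyCropFast(in string, cropLength int) string {
	if cropLength < 0 {
		cropLength = 0 // assumption: treat negative limits as zero
	}
	count := 0
	// Ranging over a string decodes it rune by rune; byteIdx is the byte
	// offset at which the current rune starts.
	for byteIdx := range in {
		if count == cropLength {
			// There is at least one more rune, so the string must be cut.
			// byteIdx is a rune boundary, so in[:byteIdx] does not split
			// a multi-byte character.
			return strings.TrimRight(in[:byteIdx], " ") + "…"
		}
		count++
	}
	// The string has at most cropLength runes: nothing to crop.
	return in
}
```

Slicing and strings.TrimRight only produce substrings of in, so the final concatenation should be the only place where a new string is built. With the input from the question, prettyCropFast("čřč čřč", 4) gives čřč… and a limit of 7 or more returns the input unchanged.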
