February 20, 2015, 17:41:57 · go

In Go, how can I state I want to match all kinds of space, including the non-breaking one?

Question

I have to match a pattern that looks like this:

Place *: *(.*)

In other words, I have a label, some spaces, a colon, some spaces, and the value I want.

However, in some places my data uses non-breaking spaces (Unicode character \u00A0) instead of the usual ASCII space (0x20). How can I match them? I thought of using

Place\s*:\s*(.*)

but it does not seem to match the \u00A0 whitespace. Is this a bug in the regexp package, or is it intended behavior? If it is the latter, how can I match all kinds of spaces without listing them all?

Answer 1

Score: 7

The RE2 syntax does limit \s to ASCII whitespace (≡ [\t\n\f\r ]), which seems pretty much standard, so this is expected behavior rather than a bug.
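
If you would rather stay within a single regular expression, one option is to widen the character class yourself. The sketch below is not part of the original answer: it assumes that adding the Unicode "space separator" category \p{Zs} (which includes U+00A0) to \s is enough for your data, and the names asciiSpaceRe, unicodeSpaceRe and extractValue are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiSpaceRe uses plain \s, which in Go's RE2 syntax only covers [\t\n\f\r ].
var asciiSpaceRe = regexp.MustCompile(`Place\s*:\s*(.*)`)

// unicodeSpaceRe (hypothetical name) adds \p{Zs}, the Unicode space-separator
// category, so that U+00A0 and similar space runes are accepted as well.
var unicodeSpaceRe = regexp.MustCompile(`Place[\s\p{Zs}]*:[\s\p{Zs}]*(.*)`)

// extractValue returns the first capture group, or ok == false if re does not match.
func extractValue(re *regexp.Regexp, s string) (value string, ok bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	input := "Place\u00a0:\u00a0Paris" // label and value separated by non-breaking spaces

	_, ok := extractValue(asciiSpaceRe, input)
	fmt.Println("ascii \\s matched:", ok)

	value, ok := extractValue(unicodeSpaceRe, input)
	fmt.Println("unicode class matched:", ok, "value:", value)
}
```

Note that \p{Zs} does not cover every rune that unicode.IsSpace accepts (for example U+0085), so extend the class if your data needs it.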

This may be a case where pre-processing the string before applying the regexp is easier.
For example, strings.Fields() splits the string around whitespace, including Unicode space runes.

// Fields splits the string s around each instance of one or more consecutive white space
// characters, as defined by unicode.IsSpace, returning an array of substrings of s or an
// empty list if s contains only white space.
func Fields(s string) []string {
    return FieldsFunc(s, unicode.IsSpace)
}

That takes care of non-breaking spaces, since unicode.IsSpace() reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 range these are:

'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).
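
The pre-processing idea can be sketched roughly as follows. This is only an illustration under some assumptions: each input holds a single "Place : value" record, and it is acceptable for whitespace inside the value to be collapsed to single ASCII spaces. The helper names normalizeSpaces and extractPlace are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeRe is the original pattern from the question, using plain ASCII spaces.
var placeRe = regexp.MustCompile(`Place *: *(.*)`)

// normalizeSpaces (hypothetical helper) splits s on any run of runes for which
// unicode.IsSpace is true (strings.Fields does this) and rejoins the pieces
// with a single ASCII space. Leading and trailing whitespace is dropped.
func normalizeSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// extractPlace normalizes the input first, then applies the ASCII-only pattern.
func extractPlace(s string) (value string, ok bool) {
	m := placeRe.FindStringSubmatch(normalizeSpaces(s))
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	value, ok := extractPlace("Place\u00a0:\u00a0New\u00a0York")
	fmt.Println(ok, value)
}
```

Because strings.Fields also treats newlines as whitespace, this approach is best applied line by line when the input contains several records.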
