Regex "before text matching" in GoLang

Question

I have a piece of JavaScript code that I'm trying to port to Go. The logic requires me to split the following string on ";" only when it is followed by "I" or "D", or when it ends the string:

I.E.viewability:-2;D.ua:Mozilla/5.0 (Linux; Android 7.0; SM-G920W8 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36;D.G.city:Burnaby;D.G.zip:V5C;D.G.region:BC;D.G.E.country_code2:CA;

In JavaScript I accomplish this using:

/;(?=[ID]|$)/

My understanding is that Go's regexp package follows the RE2 syntax:

https://github.com/google/re2/wiki/Syntax

which clearly lists the (?=re) lookahead used above (described there as "before text matching re") as not supported.

What would be the correct way to achieve the same result in Go?
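
For illustration, the limitation can be confirmed directly: Go's regexp package rejects the lookahead when the pattern is compiled. A minimal sketch follows; the helper name compiles is made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, named only for this example.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The JavaScript pattern relies on a lookahead, which RE2 syntax lacks.
	fmt.Println(compiles(`;(?=[ID]|$)`)) // false

	// Without the lookahead the pattern compiles, but it would consume
	// the I or D that follows the semicolon, so it is not a drop-in fix.
	fmt.Println(compiles(`;(?:[ID]|$)`)) // true
}
```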

Answer 1

Score: 4

You can "reverse" the regex so that it matches the fields you need instead of the separators. You want to match 1 or more chars other than ;, and then keep going across any ; that is not followed by I, D, or another ;.

Use

[^;]+(?:;[^ID;][^;]*)*

Details:

  • [^;]+ - 1 or more chars other than ;
  • (?:;[^ID;][^;]*)* - zero or more sequences of:
    • ; - a ;
    • [^ID;] - a char other than I, D or ; (so that empty values are not matched)
    • [^;]* - zero or more chars other than ;
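
To see the pieces of the pattern at work on a small input, here is a minimal sketch. The input string and the helper name fields are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldRe is the pattern from the answer: a run of non-semicolon chars,
// extended across any ";" that is not followed by I, D, or another ";".
var fieldRe = regexp.MustCompile(`[^;]+(?:;[^ID;][^;]*)*`)

// fields returns every match of fieldRe in s.
// (Hypothetical helper, named only for this example.)
func fields(s string) []string {
	return fieldRe.FindAllString(s, -1)
}

func main() {
	// ";x" stays inside the first field because x is neither I nor D.
	// The doubled ";" and the trailing ";" produce no empty values.
	for _, f := range fields("I.a:1;x;;D.b:2;") {
		fmt.Println(f)
	}
	// Prints:
	// I.a:1;x
	// D.b:2
}
```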

Go demo:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Match whole fields rather than the separators, since RE2 has no lookahead.
    re := regexp.MustCompile(`[^;]+(?:;[^ID;][^;]*)*`)
    str := `I.E.viewability:-2;D.ua:Mozilla/5.0 (Linux; Android 7.0; SM-G920W8 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36;D.G.city:Burnaby;D.G.zip:V5C;D.G.region:BC;D.G.E.country_code2:CA;`

    // FindAllString with n = -1 returns every non-overlapping match.
    for _, match := range re.FindAllString(str, -1) {
        fmt.Println(match)
    }
}

Output:

I.E.viewability:-2
D.ua:Mozilla/5.0 (Linux; Android 7.0; SM-G920W8 Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36
D.G.city:Burnaby
D.G.zip:V5C
D.G.region:BC
D.G.E.country_code2:CA
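
If the exact behavior of the JavaScript split matters (for example, it also returns an empty string after a trailing ;, and it keeps a ; that is followed by another ;), one regex-free alternative is to scan the string by hand. The following is only a sketch under that assumption and is not part of the original answer; the function name splitBeforeID and the short input are made up for illustration.

```go
package main

import (
	"fmt"
)

// splitBeforeID splits s at every ";" that is immediately followed by
// 'I' or 'D', or that is the last character of s. The ";" itself is
// dropped, as a JavaScript split drops its separator.
// (Hypothetical helper, not part of the original answer.)
func splitBeforeID(s string) []string {
	var parts []string
	start := 0
	for i := 0; i < len(s); i++ {
		if s[i] != ';' {
			continue
		}
		atEnd := i == len(s)-1
		// atEnd is checked first so s[i+1] is never read past the end.
		if atEnd || s[i+1] == 'I' || s[i+1] == 'D' {
			parts = append(parts, s[start:i])
			start = i + 1
		}
	}
	// Whatever follows the last separator; this is "" when s ends in ";".
	return append(parts, s[start:])
}

func main() {
	for _, p := range splitBeforeID("I.a:1;x;;D.b:2;") {
		fmt.Printf("%q\n", p)
	}
	// Prints:
	// "I.a:1;x;"
	// "D.b:2"
	// ""
}
```

Scanning bytes is safe here because ;, I and D are ASCII and never occur inside a multi-byte UTF-8 sequence. Drop the final empty element if the trailing empty string is not wanted.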

huangapple
  • Posted on July 12, 2017, 21:35:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45059243.html