Golang regex: get the string including the search character

Question

I am extracting a substring from a string (a link):

https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8 

The desired output should be 100000/100000/100095-000-A_

I am using the regex ^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$ in Golang flavor, but group 4 only gives me the following output: 100000/100000/100095-000-A

However, I want the underscore after the A to be included as well.

I'm a bit stuck on this; any help is appreciated.
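To reproduce the problem, here is a minimal Go sketch that runs the pattern from the question, unchanged, with the standard regexp package. The helper name group4 is hypothetical. Capture group 4 is ([^_]*), which by definition stops just before the first underscore, so the "_" can never be part of it.

```go
package main

import (
	"fmt"
	"regexp"
)

// askerPattern is the expression from the question, unchanged.
// Group 4 is ([^_]*), which stops just before the first underscore.
var askerPattern = regexp.MustCompile(`^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$`)

// group4 is a hypothetical helper: it returns capture group 4,
// or false when the pattern does not match at all.
func group4(s string) (string, bool) {
	m := askerPattern.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[4], true
}

func main() {
	link := "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
	if g, ok := group4(link); ok {
		fmt.Println(g) // prints 100000/100000/100095-000-A (no trailing "_")
	}
}
```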

Answer 1

Score: 1

You can use

(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)

See the regex demo.

Details:

  • (/(i|na|fm|d)(/am/ptweb/|.+=.+,)) - Group 1:
    • / - a / char
    • (i|na|fm|d) - Group 2: i, na, fm or d
    • (/am/ptweb/|.+=.+,) - Group 3: /am/ptweb/ or one or more chars as many as possible (other than line break chars), =, one or more chars as many as possible (other than line break chars) and a , char
  • ([^_]*_?) - Group 4: zero or more chars other than _ and then an optional _.
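A minimal Go sketch of this answer's pattern, assuming you read capture group 4 via FindStringSubmatch; the helper name extractWithAnswer1 is hypothetical. The pattern is not anchored, so Go returns the leftmost match, which here starts at /i/am/ptweb/.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern from the first answer. Group 4 is ([^_]*_?): everything up to
// the first underscore, plus that underscore if one exists.
var answer1Pattern = regexp.MustCompile(`(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)`)

// extractWithAnswer1 is a hypothetical helper returning capture group 4.
func extractWithAnswer1(link string) (string, bool) {
	m := answer1Pattern.FindStringSubmatch(link)
	if m == nil {
		return "", false
	}
	return m[4], true
}

func main() {
	link := "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
	if id, ok := extractWithAnswer1(link); ok {
		fmt.Println(id) // prints 100000/100000/100095-000-A_
	}
}
```

Because the trailing _? is optional, a link with no underscore after /am/ptweb/ still matches; group 4 then simply runs to the end of the input.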

Answer 2

Score: 1

You can match the underscore after the A like this:

^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$

See a regex demo
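Here is a minimal Go sketch of this answer's pattern, again reading capture group 4. The helper name extractWithAnswer2 is hypothetical. Unlike the first answer's _?, the group ([^_]*_) requires an underscore, so a link without one yields no match at all. Whether that is preferable depends on your input.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern from the second answer. (?:[id]|na|fm) is non-capturing, so the
// part after /am/ptweb/ is still capture group 4. ([^_]*_) requires an
// underscore, so a link without one does not match.
var answer2Pattern = regexp.MustCompile(`^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$`)

// extractWithAnswer2 is a hypothetical helper returning capture group 4.
func extractWithAnswer2(link string) (string, bool) {
	m := answer2Pattern.FindStringSubmatch(link)
	if m == nil {
		return "", false
	}
	return m[4], true
}

func main() {
	link := "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
	id, ok := extractWithAnswer2(link)
	fmt.Println(id, ok) // prints 100000/100000/100095-000-A_ true

	// No underscore after /am/ptweb/, so ([^_]*_) cannot match.
	_, ok = extractWithAnswer2("https://example.com/i/am/ptweb/abc")
	fmt.Println(ok) // prints false
}
```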

A few notes about the pattern that you tried:

  • The notation [i,na,fm,d] is a character class: it matches a single character (i, a comma, n, a, f, m or d), not the alternatives i, na, fm and d. What you want is a grouping such as (?:[id]|na|fm)
  • In the group ([,/]?) you optionally capture either , or /, so in theory it could also match a string containing /i//am/ptweb/
  • The last part .*?$ does not need to be non-greedy, because it is the last part of the pattern
  • The part [^_]* can also match spaces and newlines
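The first and last notes can be checked directly with a small Go sketch. The anchored test patterns and variable names below are illustrative only. The character class accepts single characters such as "," or "a" but rejects "na" and "fm", while the grouping does the opposite. In Go's default (Perl-like) syntax, a negated class like [^_] also matches newline characters.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// [i,na,fm,d] is a character class: it matches exactly ONE of the
	// characters i , n a f m d (the comma is a literal member of the set).
	charClass = regexp.MustCompile(`^/[i,na,fm,d]$`)
	// (?:[id]|na|fm) is a non-capturing group of alternatives: i, d, na or fm.
	grouping = regexp.MustCompile(`^/(?:[id]|na|fm)$`)
	// A negated class such as [^_] also matches spaces and newlines.
	notUnderscore = regexp.MustCompile(`[^_]*`)
)

func main() {
	for _, s := range []string{"/i", "/na", "/fm", "/,", "/a"} {
		fmt.Printf("%-4s class=%-5v group=%v\n", s, charClass.MatchString(s), grouping.MatchString(s))
	}
	// The match runs across the space and the newline, stopping at "_".
	fmt.Printf("%q\n", notUnderscore.FindString("a b\nc_d"))
}
```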

huangapple
  • Published on June 3, 2022 at 20:59:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/72489981.html