golang regex get the string including the search character
Question
I am extracting a substring from a string (a link):
https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8
The desired output is 100000/100000/100095-000-A_
I am using the regex ^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$ in the Golang flavor, and group 4 only gives me the following output: 100000/100000/100095-000-A
However, I want the underscore after the A included.
I am a bit stuck on this; any help is appreciated.
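To make the problem concrete, here is a minimal sketch that runs the pattern from the question, unchanged, against the sample link using Go's standard regexp package. The helper name groupFour and the variable names are made up for this illustration. Group 4 is ([^_]*), which by definition stops before the first underscore, so the underscore can never be part of it.

```go
package main

import (
	"fmt"
	"regexp"
)

// sampleURL is the link from the question.
const sampleURL = "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"

// originalPattern is the regex from the question, copied as-is.
var originalPattern = regexp.MustCompile(`^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$`)

// groupFour returns capture group 4 of re for s, or "" when re does not match.
// (Hypothetical helper, named for this example only.)
func groupFour(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[4]
}

func main() {
	// Group 4 ends right before the first underscore.
	fmt.Println(groupFour(originalPattern, sampleURL)) // 100000/100000/100095-000-A
}
```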
Answer 1
Score: 1
You can use
(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)
Details:
- (/(i|na|fm|d)(/am/ptweb/|.+=.+,)) - Group 1:
  - / - a / char
  - (i|na|fm|d) - Group 2: i, na, fm or d
  - (/am/ptweb/|.+=.+,) - Group 3: /am/ptweb/, or one or more chars as many as possible (other than line break chars), then =, then one or more chars as many as possible (other than line break chars), and a , char
- ([^_]*_?) - Group 4: zero or more chars other than _ and then an optional _.
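A minimal sketch of how this pattern could be used from Go, assuming the standard regexp package. The helper name extractPrefix and the second test input are hypothetical, chosen only for this illustration. Since the pattern is unanchored, FindStringSubmatch locates the first match and index 4 of the result holds Group 4.

```go
package main

import (
	"fmt"
	"regexp"
)

// link is the sample URL from the question.
const link = "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"

// withUnderscore is the pattern from this answer; Group 4 ends with an optional "_".
var withUnderscore = regexp.MustCompile(`(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)`)

// extractPrefix returns Group 4 and true on a match, or "" and false otherwise.
// (Hypothetical helper, named for this example only.)
func extractPrefix(s string) (string, bool) {
	m := withUnderscore.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[4], true
}

func main() {
	got, ok := extractPrefix(link)
	fmt.Println(got, ok) // 100000/100000/100095-000-A_ true

	// Because the underscore is optional, input without one still matches.
	got, ok = extractPrefix("/d/am/ptweb/no-underscore")
	fmt.Println(got, ok) // no-underscore true
}
```

Note the trade-off in the optional _?: the match succeeds even when no underscore follows, so the caller cannot assume the result always ends in an underscore.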
Answer 2
Score: 1
You can match the underscore after the A like:
^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$
A few notes about the pattern that you tried:
- The notation [i,na,fm,d] is a character class, which should be a grouping: (?:[id]|na|fm)
- In the group ([,/]?) you optionally capture either , or / so in theory it could match a string that has /i//am/ptweb/
- The last part .*?$ does not have to be non-greedy, as it is the last part of the pattern
- The part [^_]* can also match spaces and newlines
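The notes above can be checked with a short Go sketch, assuming the standard regexp package. The variable names and the small test strings are made up for this illustration; the patterns themselves are taken from the question and this answer, with ^ and $ added around the two short ones so that only the single path segment is tested.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// charClass uses the bracket notation from the question: one char out of i , n a f m d.
	charClass = regexp.MustCompile(`^/[i,na,fm,d]$`)
	// grouping uses the alternation suggested in this answer.
	grouping = regexp.MustCompile(`^/(?:[id]|na|fm)$`)
	// anchored is the full pattern from this answer.
	anchored = regexp.MustCompile(`^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$`)
	// notUnderscore is the negated class on its own.
	notUnderscore = regexp.MustCompile(`[^_]*`)
)

func main() {
	// Note 1: the character class accepts a lone comma but not the two-letter "na".
	fmt.Println(charClass.MatchString("/,"), charClass.MatchString("/na")) // true false
	fmt.Println(grouping.MatchString("/,"), grouping.MatchString("/na"))   // false true

	// Note 2: the optional ([,/]?) lets a doubled slash through.
	if m := anchored.FindStringSubmatch("/i//am/ptweb/abc_rest"); m != nil {
		fmt.Printf("%q %q\n", m[2], m[4]) // "/" "abc_"
	}

	// Note 4: a negated class also matches spaces and newlines.
	fmt.Printf("%q\n", notUnderscore.FindString("a b\nc_d")) // "a b\nc"
}
```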