How to capture into a single/common regex group, or match based on a condition

Question

I would like to extract a value from each of two different test strings:

/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive

and

/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8

using a single regex, with the result in group 1 in both cases.

With this regex:

^.[i,na,fm,d]+\/(.+([,\/])?(\/|.+=.+,\/).+\/[,](live.([^,]).).+_)?.+(640).*$

the second string gives the desired result in group 1: int/2021/11/25/,live_20211125_215206_

The first string, however, captures nothing in group 1. The expected extraction for the first string is int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45

Any pointers on this are appreciated.

Thanks!

Answer 1

Score: 2

If you want both values in group 1, you can use:

^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)

The pattern matches:

  • ^ Start of string
  • / Match a literal /
  • (?:[id]|na|fm) Match one of i, d, na or fm
  • / Match a literal /
  • ( Capture group 1
    • [^/\s]*/ Match zero or more chars other than / or whitespace, then match /
    • \d{4}/\d{2}/\d{2}/ Match a date-like pattern
    • \S*? Match zero or more non-whitespace chars, as few as possible
  • ) Close group 1
  • (?:/,|[^_]+_) Match either /, or one or more chars other than _ followed by _
  • 640 Match a literal 640
  • (?:\D|$) Match either a non-digit or assert the end of the string

See a regex demo and a go demo.
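For readers who want to try the pattern locally, here is a minimal Go sketch that applies it to the two strings from the question. The names prefixRe and extractPrefix are hypothetical and chosen only for illustration; the pattern itself is copied unchanged from the answer above.

```go
package main

import (
	"fmt"
	"regexp"
)

// prefixRe is the pattern from the answer above, compiled once at startup.
var prefixRe = regexp.MustCompile(`^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)`)

// extractPrefix (hypothetical helper) returns the text captured by group 1
// and reports whether the input matched the pattern at all.
func extractPrefix(s string) (string, bool) {
	m := prefixRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	inputs := []string{
		"/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive",
		"/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8",
	}
	for _, in := range inputs {
		if got, ok := extractPrefix(in); ok {
			fmt.Println(got)
		} else {
			fmt.Println("no match")
		}
	}
}
```

Because \S*? is lazy, group 1 stops at the earliest position from which the rest of the pattern can still reach 640. For the first string that is just before /,640, and for the second it is just after the last underscore before sendeton_640.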

Answer 2

Score: 0

We can't know all the rules for how the strings you are matching are constructed, but for just the two example strings provided:

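package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Group 1 starts at /i/int/<yyyy>/<mm>/<dd>/ and, because .* is greedy,
	// extends up to the last separator ("/," or "_" plus word characters)
	// that directly precedes 640. Note that group 1 keeps the leading /i/.
	var re = regexp.MustCompile(`(?m)(\/i/int/\d{4}/\d{2}/\d{2}/.*)(?:\/,|_[\w_]+)640`)
	// Both test strings, one per line; . does not match a newline by default.
	var str = `
/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8`

	// Each element holds the full match at index 0 and group 1 at index 1.
	match := re.FindAllStringSubmatch(str, -1)

	for _, val := range match {
		fmt.Println(val[1])
	}
}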

huangapple
  • Posted on June 29, 2022, 05:38:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/72793415.html