regex pattern works in online tool, parses in NSRegularExpression, but fails to match anything

Question

I am trying to match Roman numerals in test strings like:

Series Name.disk_V.Episode_XI.Episode_name.avi
Series Name.Season V.Episode XI.Part XXV.Episode_name.avi

and a real-world example in which the XIII should not match:

XIII: The Series season II episode V.mp4

Following the logic in this fantastic thread and many experiments in an online regex debugger, I came up with this:

(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-]\KM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=[\s._-])

The last example returns two matches, "II" and "V", ignoring the XIII in the name part. Yay!
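The numeral part of the pattern is the usual place-value decomposition: up to four M's for thousands, then one group each for hundreds, tens and units, where each group only allows valid spellings (for units: IX, IV, or an optional V followed by up to three I's). A small sketch that checks that core on its own, anchored with ^ and $; the helper name isRomanNumeral is hypothetical and not part of the original question:

```swift
import Foundation

// Hypothetical helper: true if the whole token is a well-formed Roman numeral,
// using the same four-part core as the question's pattern.
func isRomanNumeral(_ token: String) -> Bool {
    // Every part of the core is optional, so it also matches "": reject that here.
    guard !token.isEmpty else { return false }
    let core = #"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"#
    let regex = try! NSRegularExpression(pattern: core, options: [.caseInsensitive])
    let range = NSRange(token.startIndex..., in: token)
    return regex.firstMatch(in: token, options: [], range: range) != nil
}

for token in ["V", "XI", "XXV", "XIII", "IIII", "VX"] {
    print(token, isRomanNumeral(token))
}
```

Since the core can match an empty string, the full pattern can produce an empty numeral match when a keyword is followed by two separators in a row (e.g. "episode__").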

So then I tried it in a Swift playground:

import Foundation

let file = "Series Name.disk_V.Episode_XI.Episode_name.avi"
let p = #"(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-]\KM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=[\s._-])"#
let r = try NSRegularExpression(pattern: p, options: [.caseInsensitive])
// NSRegularExpression works with NSString (UTF-16) ranges.
let nsString = file as NSString
let results = r.matches(in: file, options: [], range: NSMakeRange(0, nsString.length))

The pattern parses without error but returns no matches. It does match if I remove the \K, although then the leading separator is included in the match. According to this thread, Objective-C (which I take to mean NSRegularExpression) supports \K, so I'm not sure why this fails.
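Since the version without \K does match, one workaround is to keep it and simply drop the separator from each match range afterwards. This is a sketch under one assumption: the separator class [\s._-] always consumes exactly one UTF-16 unit, which holds for the characters it can match. The function name numeralsTrimmingSeparator(in:) is hypothetical:

```swift
import Foundation

// Hypothetical helper: the question's pattern with \K removed, so each match
// is the one-character separator followed by the numeral.
func numeralsTrimmingSeparator(in text: String) -> [String] {
    let pattern = #"(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-]M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=[\s._-])"#
    let regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    let ns = text as NSString
    let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: ns.length))
    return matches
        .map { match -> NSRange in
            // [\s._-] consumes exactly one UTF-16 unit, so moving the start
            // forward by one drops the separator from the range.
            NSRange(location: match.range.location + 1, length: match.range.length - 1)
        }
        .filter { $0.length > 0 }  // the numeral part itself may be empty
        .map { ns.substring(with: $0) }
}

print(numeralsTrimmingSeparator(in: "Series Name.disk_V.Episode_XI.Episode_name.avi"))
```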

There are a number of similar-sounding threads here on SO, but they invariably involve patterns that fail to parse, mostly because of escaping. That is not the case here: the pattern parses fine, and print(r) shows that it is correct (i.e., no doubled backslashes). It just doesn't match.

Can anyone offer some insight or an alternative regex that does not use \K?
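For comparison, another regex-only alternative (not from the original thread) is to move the separator into the lookbehind, so the match itself is just the numeral. NSRegularExpression uses the ICU engine, which accepts a lookbehind as long as its maximum length is bounded; here it is at most eight characters ("episode" plus one separator). A sketch, with the hypothetical function name numeralsUsingLookbehind(in:):

```swift
import Foundation

// Hypothetical helper: keyword AND separator both sit inside the lookbehind,
// so each match range covers only the Roman numeral.
func numeralsUsingLookbehind(in text: String) -> [String] {
    let pattern = #"(?<=(?:d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=[\s._-])"#
    let regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    let ns = text as NSString
    let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: ns.length))
    return matches
        .map { $0.range }
        .filter { $0.length > 0 }  // skip empty matches between two separators
        .map { ns.substring(with: $0) }
}

print(numeralsUsingLookbehind(in: "XIII: The Series season II episode V.mp4"))
```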

Answer 1

Score: 1

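TheFourthBird's idea is the solution. I modified the pattern by removing the \K and making the entire Roman-numeral section a named group:

(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-](?<roman>M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))(?=[\s._-])

To use it, set things up as in the question, then read each match's "roman" group instead of the whole match:

for result in results {
    // Range of the named group only, so the leading separator is excluded.
    let nameRange = result.range(withName: "roman")
    print(nsString.substring(with: nameRange))
}

Output:

V
XI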

Bingo!
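The snippet above reuses the setup from the question. A self-contained sketch of the same named-group approach, run over all three test strings, might look like this; the function name romanNumerals(in:) is hypothetical, and it assumes an OS where NSTextCheckingResult.range(withName:) is available (macOS 10.13 / iOS 11 or later):

```swift
import Foundation

// Hypothetical helper: no \K; the numeral is captured in the named group
// "roman" and read back with range(withName:).
func romanNumerals(in text: String) -> [String] {
    let pattern = #"(?<=d|dvd|disc|disk|s|se|season|e|ep|episode)[\s._-](?<roman>M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))(?=[\s._-])"#
    let regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    let ns = text as NSString
    let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: ns.length))
    return matches
        .map { $0.range(withName: "roman") }
        // Defensive: skip a group that did not participate, or an empty numeral.
        .filter { $0.location != NSNotFound && $0.length > 0 }
        .map { ns.substring(with: $0) }
}

let files = [
    "Series Name.disk_V.Episode_XI.Episode_name.avi",
    "Series Name.Season V.Episode XI.Part XXV.Episode_name.avi",
    "XIII: The Series season II episode V.mp4",
]
for file in files {
    print(file, "->", romanNumerals(in: file))
}
```

With this lookbehind list, the XXV in the second test string is not matched, because "Part" is not one of the alternatives, so that string yields only V and XI.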

huangapple
  • Posted on 2023-02-10 03:32:52
  • Link: https://go.coder-hub.com/75403567.html