Trouble with lookbehind and lookahead

Question

I'm having a hard time trying to get this simple RegEx to work. I need to capture the message "Windows Event Logs Cleared" or any other message that might be in that position.

import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
          2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""
pattern = r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'
regex = re.findall(pattern, text, re.MULTILINE)

Current output:

Windows Event Logs Cleared

Expected output:

Windows Event Logs Cleared
Account Configured with Never-Expiring Password

Note:

  1. The date and time are always the same pattern
  2. The message starts right after a 3-digit or 4-digit number (104 and 4738 in these examples), which could also be a -
  3. The message varies in length
  4. The message always ends just before a 2-digit number, which in these examples is 21 for the first and 47 for the second.

If anyone knows of a good, concise, gobbledygook-free tutorial, please lemme know.

Answer 1

Score: 2

You could look a bit further behind, starting at the last colon that is part of the timestamp.

If doing this with the `regex` module (instead of `re`), then variable width look behind is possible, but with `re` you can instead split into multiple alternative fixed-width look-behind assertions in this way:

(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)
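
A minimal sketch of how this pattern could be run against the sample text from the question, using only the standard `re` module. The variable names are illustrative, and the indentation of the second sample line is kept as posted:

```python
import re

# Sample lines from the question, with the second line's indentation kept.
text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
          2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# Three fixed-width lookbehinds, one per possible prefix (3 digits, 4 digits,
# or "-"), each anchored on the seconds part of the timestamp (":SS ").
pattern = r'(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)'

# With a single capture group, findall returns a list of that group's text.
messages = re.findall(pattern, text)
print(messages)
# ['Windows Event Logs Cleared', 'Account Configured with Never-Expiring Password']
```

Because the lookbehinds key on the timestamp rather than on the start of the line, the leading whitespace on the second line does not matter here.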

If using `regex`, then you can even use \K instead of a look behind assertion:

:\d\d (?:\d{3,4}|-) \K(.*?)(?=\s\d{2}\s)

Answer 2

Score: 1

With standard Python re, lookbehinds have to be a fixed length. Since the message can be preceded by a variable-length number, you can't use a lookbehind for this (the third-party regex library overcomes this restriction).
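
To see the restriction in practice, here is a small sketch that tries to compile the lookbehind from the question. It assumes a CPython version where `re` still requires fixed-width lookbehinds, which is the case for the releases I know of; the variable names are illustrative:

```python
import re

# The lookbehind from the question: its alternatives are 3, 4, and 1
# characters wide, so the overall width is not fixed.
variable_width = r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'

try:
    re.compile(variable_width)
    error_message = None
except re.error as exc:
    # Standard re rejects the pattern at compile time.
    error_message = str(exc)

print(error_message)
```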

The workaround is to use a capture group for the message that you want to extract.

The other problem with your regexp is that it doesn't match the date and time before the message.

pattern = r'^\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'

When you use this with the re.MULTILINE flag, so that ^ matches at the start of every line, capture group 1 will contain the message.
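
A hedged sketch of the capture-group approach applied to the question's sample text. It makes one assumption that is not in the pattern above: `[ \t]*` is added after `^`, because the second sample line is indented as posted and would otherwise not match at the start of the line.

```python
import re

# Sample lines from the question, with the second line's indentation kept.
text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
          2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# The pattern from this answer, plus the assumed "[ \t]*" to skip indentation.
pattern = r'^[ \t]*\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'

# re.MULTILINE lets "^" match at the start of each line; with one capture
# group, findall returns only the captured message for each match.
messages = re.findall(pattern, text, re.MULTILINE)
print(messages)
# ['Windows Event Logs Cleared', 'Account Configured with Never-Expiring Password']
```

Note that the trailing ` \d{2}` would also match the first two digits of a longer number; appending `\s` or `\b` is one way to tighten it if messages can be followed by such numbers.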

huangapple
  • Published on April 7, 2023, 01:25:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75952223.html