Trouble with lookbehind and lookahead
Question
I'm having a hard time trying to get this simple RegEx to work. I need to capture the message "Windows Event Logs Cleared" or any other message that might be in that position.
```python
import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""
pattern = r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'
regex = re.findall(pattern, text, re.MULTILINE)
```
Current output:
Windows Event Logs Cleared
Expected output:
Windows Event Logs Cleared
Account Configured with Never-Expiring Password
Note:
- The date and time always follow the same format
- The message starts just after a 3-digit or 4-digit number (104 and 4738 in these examples), but that field could also be a `-`
- The message varies in length
- The message always ends just before the 2-digit number, which is 21 in the first example and 47 in the second.
If anyone knows of a good, concise, gobbledygook-free tutorial, please lemme know.
Answer 1
Score: 2
You could look a bit further behind, starting at the last colon that is part of the timestamp.
With the third-party `regex` module (instead of `re`), variable-width lookbehind is possible. With `re`, you can instead split it into several alternative fixed-width lookbehind assertions:
```
(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)
```
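A minimal sketch of this approach with the standard `re` module, reusing the sample text from the question. It compiles because each lookbehind on its own has a fixed width (8, 9 and 6 characters); only the surrounding non-capturing group alternates between them.

```python
import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# Three separate lookbehinds, each fixed-width on its own:
# ":ss 123 ", ":ss 1234 " or ":ss - " (end of the time, then the event ID or "-").
pattern = r'(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)'

# With exactly one capture group, findall returns that group's text for each match.
messages = re.findall(pattern, text)
for message in messages:
    print(message)
```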
With `regex`, you can even use `\K` instead of a lookbehind assertion (`\K` drops everything matched before it from the reported match):
```
:\d\d (?:\d{3,4}|-) \K(.*?)(?=\s\d{2}\s)
```
Answer 2
Score: 1
With standard Python `re`, lookbehinds have to be fixed-length. Since the message can be preceded by a variable-length number, you can't use a single lookbehind for this (the third-party `regex` library overcomes this restriction).
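To see the restriction in action, here is a small sketch; `compiles` is a hypothetical helper written only for this illustration. A single lookbehind whose alternatives have different widths, as in the question's pattern, is rejected by `re.compile`, while the same alternatives written as separate lookbehinds compile (whether they then match what you want is a separate matter).

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"{pattern!r}: {exc}")
        return False
    return True


# One lookbehind whose alternatives are 3, 4 and 1 characters wide: rejected by re.
print(compiles(r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'))

# The same alternatives as three separate fixed-width lookbehinds: accepted.
print(compiles(r'(?:(?<=\d{3})|(?<=\d{4})|(?<=-))(.*?)(?=\s\d{2}\s)'))
```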
The workaround is to use a capture group for the message that you want to extract.
The other problem with your regexp is that it doesn't match the date and time before the message.
```python
pattern = r'^\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'
```
When you use this with the `re.MULTILINE` flag (so that `^` matches at the start of every line), capture group 1 will contain the message.
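A short sketch applying this pattern to the sample text from the question and reading the message from group 1 of each match.

```python
import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

pattern = r'^\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'

# re.MULTILINE lets ^ match at the start of every line, not just the whole string.
for match in re.finditer(pattern, text, re.MULTILINE):
    print(match.group(1))  # group 1 holds the message

# findall returns group 1 directly because the pattern has exactly one group.
messages = re.findall(pattern, text, re.MULTILINE)
```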