Capturing Value from Inconsistent RegEx Pattern
Question
I have a list of server events that are very inconsistent.
logs = ['event on SRVDC1.acme.loc created medium alert TS Gateway login failure.',
'event with source 10.10.13.1 by john.smith on SRVDC1.acme.loc created medium alert More Than 3 Failed Login Attempts Within 1 Hour .',
'authentication event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc created medium alert Logon Failure - Unknown user or bad password.',
'iam event by ANONYMOUS LOGON on SRVDC2.acme.loc created medium alert Computer account added/changed/deleted..']
There is a certain pattern to it though:
- An event (always present) - either with a modifier e.g., authentication, iam, etc. - or simply event.
- With a source IP (not always present)
- On a domain (always present)
- By an actor (not always present)
- Alert severity (always present) - e.g., medium alert, high alert, etc.
- Alert name (always present) at the very end of the string
I need to extract the alert name, domain name, IP, and actor. I've managed to extract the alert name and domain name, which are always present, but I can't figure out how to extract the IP and actor, since they are not always present. My idea is to substitute ' - ' when they are missing, but so far my attempts have failed.
host_list = [re.search('(?<=on).*?(?=created)',log).group() for log in logs]
alert_list = [re.search('(?<=alert).*?(?=\.)',log).group() for log in logs]
source_ip_list = [' - ' if '\d+\.\d+\.\d+\.\d+\.' not in re.search('\d+\.\d+\.\d+\.\d+\.',log) else re.search('(?<=source).*?(?=\d+\.\d+\.\d+\.\d+\.)',log).group() for log in logs]
actor_list = [' - ' if 'by' not in re.search('by',log) else re.search('(?<=by).*?(?=on)',log).group() for log in logs]
print(host_list)
print(alert_list)
print(source_ip_list)
print(actor_list)
Current output
source_ip_list = [' - ' if 'by' not in re.search('\d+\.\d+\.\d+\.\d+\.',log) else re.search('(?<=source).*?(?=\d+\.\d+\.\d+\.\d+\.)',log).group() for log in logs]
TypeError: argument of type 'NoneType' is not iterable
Expected Output
[' SRVDC1.acme.loc ', ' SRVDC1.acme.loc ', ' event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc ', ' SRVDC2.acme.loc ']
[' TS Gateway login failure', ' More Than 3 Failed Login Attempts Within 1 Hour ', ' Logon Failure - Unknown user or bad password', ' Computer account added/changed/deleted']
[' - ','10.10.13.1',' - ','192.168.254.13',' - ']
[' - ','by',' - ','by','by']
Answer 1
Score: 2
If I understand you correctly, you can use the := (walrus) operator to assign the search result to a variable and then check that it isn't None (re.search returns None when nothing matches, which is what raised the TypeError in your code):
import re
logs = ['event on SRVDC1.acme.loc created medium alert TS Gateway login failure.',
'event with source 10.10.13.1 by john.smith on SRVDC1.acme.loc created medium alert More Than 3 Failed Login Attempts Within 1 Hour .',
'authentication event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc created medium alert Logon Failure - Unknown user or bad password.',
'iam event by ANONYMOUS LOGON on SRVDC2.acme.loc created medium alert Computer account added/changed/deleted..']
host_list = [re.search(r'(?<=on).*?(?=created)',log).group() for log in logs]
alert_list = [re.search(r'(?<=alert).*?(?=\.)',log).group() for log in logs]
source_ip_list = [m.group() if (m:=re.search(r'\d+\.\d+\.\d+\.\d+', log)) else ' - ' for log in logs]
actor_list = [m.group() if (m:=re.search(r'(?<=by).*?(?=on)', log)) else ' - ' for log in logs]
print(host_list)
print(alert_list)
print(source_ip_list)
print(actor_list)
Prints:
[' SRVDC1.acme.loc ', ' SRVDC1.acme.loc ', ' event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc ', ' SRVDC2.acme.loc ']
[' TS Gateway login failure', ' More Than 3 Failed Login Attempts Within 1 Hour ', ' Logon Failure - Unknown user or bad password', ' Computer account added/changed/deleted']
[' - ', '10.10.13.1', '192.168.254.13', ' - ']
[' - ', ' john.smith ', ' thomas ', ' ANONYMOUS LOGON ']
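One thing the output above shows is that the third host is wrong: the lookbehind (?<=on) also matches the "on" at the end of "authentication". A possible way around that, sketched below, is to anchor each keyword with a word boundary (\b) and capture the value in a group instead of using lookarounds. This is only a sketch under assumptions that the sample logs satisfy but the question does not guarantee: the keywords on, by, source and created appear as whole lowercase words, the host contains no whitespace, and the alert name runs to the end of the line. The names PATTERNS and parse_log are made up for this illustration.

```python
import re

logs = [
    'event on SRVDC1.acme.loc created medium alert TS Gateway login failure.',
    'event with source 10.10.13.1 by john.smith on SRVDC1.acme.loc created medium alert More Than 3 Failed Login Attempts Within 1 Hour .',
    'authentication event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc created medium alert Logon Failure - Unknown user or bad password.',
    'iam event by ANONYMOUS LOGON on SRVDC2.acme.loc created medium alert Computer account added/changed/deleted..',
]

# Hypothetical per-field patterns; each captures its value in group 1.
# \b keeps "on" from matching inside words such as "authentication".
PATTERNS = {
    "host": re.compile(r"\bon\s+(\S+)\s+created\b"),
    # Alert name: everything after "alert", minus trailing spaces and periods.
    "alert": re.compile(r"\balert\s+(.*?)\s*\.+$"),
    # Dotted quad after "source"; a trailing ":port" is left out of the group.
    "source_ip": re.compile(r"\bsource\s+(\d{1,3}(?:\.\d{1,3}){3})"),
    # Actor may contain spaces, so match lazily up to the standalone word "on".
    "actor": re.compile(r"\bby\s+(.+?)\s+on\s"),
}


def parse_log(log, missing=" - "):
    """Return a dict of extracted fields, using `missing` where a field is absent."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(log)  # None when the field is not in this log
        record[field] = match.group(1) if match else missing
    return record


records = [parse_log(log) for log in logs]
host_list = [r["host"] for r in records]
alert_list = [r["alert"] for r in records]
source_ip_list = [r["source_ip"] for r in records]
actor_list = [r["actor"] for r in records]

print(host_list)
print(alert_list)
print(source_ip_list)
print(actor_list)
```

Because the values come from capture groups, they no longer carry the surrounding spaces, and on the four sample logs the third host should come out as SRVDC1.acme.loc. Logs that deviate from the assumptions above (for example an uppercase "ON" keyword or an IPv6 source) would need adjusted patterns.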