Capturing Value from Inconsistent RegEx Pattern

Question

I have a list of server events that are very inconsistent.

logs = ['event on SRVDC1.acme.loc created medium alert TS Gateway login failure.',
        'event with source 10.10.13.1 by john.smith on SRVDC1.acme.loc created medium alert More Than 3 Failed Login Attempts Within 1 Hour .',
        'authentication event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc created medium alert Logon Failure - Unknown user or bad password.',
        'iam event by ANONYMOUS LOGON on SRVDC2.acme.loc created medium alert Computer account added/changed/deleted..']

There is a certain pattern to it though:

  1. An event (always present) - either with a modifier e.g., authentication, iam, etc. - or simply event.
  2. With a source IP (not always present)
  3. On a domain (always present)
  4. By an actor (not always present)
  5. Alert severity (always present) - e.g., medium alert, high alert, etc.
  6. Alert name (always present) at the very end of the string

I need to extract the alert name, domain name, IP, and actor. I've managed to extract the alert name and domain name, which are always present, but I can't figure out how to extract the IP and actor, since they are not always present. My idea is to substitute ' - ' when they are missing, but my attempts so far have failed.

host_list = [re.search('(?<=on).*?(?=created)',log).group() for log in logs]
alert_list = [re.search('(?<=alert).*?(?=\.)',log).group() for log in logs]
source_ip_list = [' - ' if '\d+\.\d+\.\d+\.\d+\.' not in re.search('\d+\.\d+\.\d+\.\d+\.',log) else re.search('(?<=source).*?(?=\d+\.\d+\.\d+\.\d+\.)',log).group() for log in logs]
actor_list = [' - ' if 'by' not in re.search('by',log) else re.search('(?<=by).*?(?=on)',log).group() for log in logs]

print(host_list)
print(alert_list)
print(source_ip_list)
print(actor_list)

Current output

    source_ip_list = [' - ' if 'by' not in re.search('\d+\.\d+\.\d+\.\d+\.',log) else re.search('(?<=source).*?(?=\d+\.\d+\.\d+\.\d+\.)',log).group() for log in logs]
TypeError: argument of type 'NoneType' is not iterable
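
For reference, the error can be reproduced in isolation. The sketch below assumes the failure comes from a log line with no IP address in it: re.search then returns None instead of a match object, and the `in` test is applied to that None. The variable names are illustrative only.

```python
import re

log = "event on SRVDC1.acme.loc created medium alert TS Gateway login failure."

# This line contains no IP address, so re.search returns None
# rather than a match object.
match = re.search(r"\d+\.\d+\.\d+\.\d+", log)
print(match)

# A membership test on None is what raises the TypeError.
error_message = None
try:
    "by" in match
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Checking the result of re.search for None before using it avoids the exception.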

Expected Output

[' SRVDC1.acme.loc ', ' SRVDC1.acme.loc ', ' event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc ', ' SRVDC2.acme.loc ']
[' TS Gateway login failure', ' More Than 3 Failed Login Attempts Within 1 Hour ', ' Logon Failure - Unknown user or bad password', ' Computer account added/changed/deleted']
[' - ','10.10.13.1',' - ','192.168.254.13',' - ']
[' - ','by',' - ','by','by']

Answer 1

Score: 2

If I understand you correctly, you can use the := (walrus) operator to assign the search result to a variable and then check that it isn't None:

import re

logs = ['event on SRVDC1.acme.loc created medium alert TS Gateway login failure.',
        'event with source 10.10.13.1 by john.smith on SRVDC1.acme.loc created medium alert More Than 3 Failed Login Attempts Within 1 Hour .',
        'authentication event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc created medium alert Logon Failure - Unknown user or bad password.',
        'iam event by ANONYMOUS LOGON on SRVDC2.acme.loc created medium alert Computer account added/changed/deleted..']

host_list = [re.search(r'(?<=on).*?(?=created)',log).group() for log in logs]
alert_list = [re.search(r'(?<=alert).*?(?=\.)',log).group() for log in logs]
source_ip_list = [m.group() if (m:=re.search(r'\d+\.\d+\.\d+\.\d+', log)) else ' - ' for log in logs]
actor_list = [m.group() if (m:=re.search(r'(?<=by).*?(?=on)', log)) else ' - ' for log in logs]

print(host_list)
print(alert_list)
print(source_ip_list)
print(actor_list)

Prints:

[' SRVDC1.acme.loc ', ' SRVDC1.acme.loc ', ' event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc ', ' SRVDC2.acme.loc ']
[' TS Gateway login failure', ' More Than 3 Failed Login Attempts Within 1 Hour ', ' Logon Failure - Unknown user or bad password', ' Computer account added/changed/deleted']
[' - ', '10.10.13.1', '192.168.254.13', ' - ']
[' - ', ' john.smith ', ' thomas ', ' ANONYMOUS LOGON ']
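
Note that the third host entry in the output is not a host name: the lookbehind `(?<=on)` also matches the "on" inside "authentication". One possible refinement, offered only as a sketch, is to anchor the keywords with `\b` word boundaries and use capture groups so the surrounding spaces are dropped. The helper `first_group` and the exact patterns below are assumptions for illustration and are tuned to these four sample lines, not to every possible log format.

```python
import re

logs = [
    "event on SRVDC1.acme.loc created medium alert TS Gateway login failure.",
    "event with source 10.10.13.1 by john.smith on SRVDC1.acme.loc created medium alert More Than 3 Failed Login Attempts Within 1 Hour .",
    "authentication event with process lsass.exe, source 192.168.254.13:63000, by thomas on SRVDC1.acme.loc created medium alert Logon Failure - Unknown user or bad password.",
    "iam event by ANONYMOUS LOGON on SRVDC2.acme.loc created medium alert Computer account added/changed/deleted..",
]


def first_group(pattern, text, default=" - "):
    """Hypothetical helper: return capture group 1 of the first match,
    stripped of whitespace, or `default` when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1).strip() if match else default


# \b keeps "on" and "by" from matching inside longer words
# such as "authentication".
host_list = [first_group(r"\bon\s+(\S+)\s+created\b", log) for log in logs]
# The alert name runs to the trailing period(s) at the end of the line.
alert_list = [first_group(r"\balert\s+(.*?)\s*\.+$", log) for log in logs]
# Only the dotted quad after "source" is captured; a port suffix is ignored.
source_ip_list = [first_group(r"\bsource\s+(\d+\.\d+\.\d+\.\d+)", log) for log in logs]
# The actor is whatever sits between "by" and the standalone word "on".
actor_list = [first_group(r"\bby\s+(.+?)\s+on\b", log) for log in logs]

print(host_list)
print(alert_list)
print(source_ip_list)
print(actor_list)
```

On the four sample lines this should return a bare host name for every entry, including the third, and ' - ' wherever the IP or actor is missing. The matching is case-sensitive, which is what keeps the "ON" in "ANONYMOUS LOGON" from being read as the keyword.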

huangapple
  • Published on May 23, 2023, 01:17:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76308551.html