How to write a regex that searches through multiline blocks and only matches a block that contains a specified string?



Question


I'm writing a regex pattern to search through multiple YARA rules in the same file. The pattern I've come up with already matches each YARA rule individually, from beginning to end, across multiple lines. Now I want to match each entire YARA rule individually, but only if it contains the string "BANANAS" somewhere within it.

The problem is that my regex matches from the beginning of one YARA rule all the way to the end of the rule that does contain "BANANAS". As a result, it also captures every rule in between that does NOT contain "BANANAS". What am I missing to capture only the rules that contain the specified string?

These are the current regex patterns I'm using:

^rule\s[\s\S]*?^\}$
^rule\s[\s\S]*?(?=BANANAS)[\s\S]*?^\}$

The first pattern matches each individual YARA rule from beginning to end.
The second pattern adds a lookahead in an attempt to match a YARA rule only if it contains the specified string.

To clarify, I want to avoid relying on any built-in app features for multiline matching. That is why I'm using [\s\S]* instead of .* to match across line breaks.
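Using `[\s\S]` does remove the need for a "dot matches newline" (DOTALL) flag. However, the `^` and `$` anchors in `^\}$` still need multiline mode to match at line boundaries. The question's patterns therefore presumably already run with a multiline flag enabled. The minimal sketch below uses Python's `re` module purely as an illustration engine, which is an assumption since the question doesn't name its tool. The one-rule `text` is a hypothetical stand-in.

```python
import re

# Hypothetical one-rule sample, just enough to show the effect of each flag.
text = "rule A\n{\n    x\n}\n"

# '.' does not match '\n' unless re.DOTALL is set, so this finds nothing.
dot_match = re.search(r"^rule\s.*^\}$", text, re.MULTILINE)

# [\s\S] matches any character, '\n' included, so no DOTALL flag is needed.
class_match = re.search(r"^rule\s[\s\S]*?^\}$", text, re.MULTILINE)

# Without re.MULTILINE, '^' only matches at the start of the whole string,
# so '^\}$' can never anchor to the closing-brace line.
no_m_match = re.search(r"^rule\s[\s\S]*?^\}$", text)

print(dot_match)            # None
print(class_match.group())  # the whole rule, from 'rule' to '}'
print(no_m_match)           # None
```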

I'm testing the patterns above against the example text below. The string "BANANAS" that I'm searching for appears in the description = "foo" field of the YARA rules below (in the second rule).

[Image: screenshot of the failed results]

rule RULENAME
{
    meta:
        author = "abcdef"
        last_update = "abcdef"
        description = "TURKEY"
        hash = "abcdef" //dumped
    strings:
        $mz = "MZ"
        $low0 = "malware" ascii wide
        $low1 = "hello world" ascii wide
        $low2 = "sus" wide
        $low3 = "keyLogger" wide
        $low4 = "bot" wide
        $low5 = "usb" wide
    condition:
        $mz at 0 and (3 of ($low*))
}
rule RULENAME
{
    meta:
        author = "abcdef"
        last_update = "abcdef"
        description = "BANANAS"
        hash = "abcdef" //dumped
    strings:
        $mz = "MZ"
        $low0 = "malware" ascii wide
        $low1 = "hello world" ascii wide
        $low2 = "sus" wide
        $low3 = "keyLogger" wide
        $low4 = "bot" wide
        $low5 = "usb" wide
    condition:
        $mz at 0 and (3 of ($low*))
}
rule RULENAME
{
    meta:
        author = "abcdef"
        last_update = "abcdef"
        description = "CHICKEN"
        hash = "abcdef" //dumped
    strings:
        $mz = "MZ"
        $low0 = "malware" ascii wide
        $low1 = "hello world" ascii wide
        $low2 = "sus" wide
        $low3 = "keyLogger" wide
        $low4 = "bot" wide
        $low5 = "usb" wide
    condition:
        $mz at 0 and (3 of ($low*))
}
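The failure can be reproduced with a shortened version of the sample above, keeping only each rule's description line. This minimal sketch again assumes Python's `re` with `re.MULTILINE`.

The engine tries start positions from left to right. It reports the first start from which any match succeeds. At the first `rule`, nothing stops the lazy `[\s\S]*?` from growing past that rule's closing brace until `BANANAS` is found. The lazy quantifier only makes the match as short as possible from that start; it never moves the start to a later rule.

```python
import re

# Shortened stand-in for the three sample rules (only the description kept).
rules = (
    'rule RULENAME\n{\n    meta:\n        description = "TURKEY"\n}\n'
    'rule RULENAME\n{\n    meta:\n        description = "BANANAS"\n}\n'
    'rule RULENAME\n{\n    meta:\n        description = "CHICKEN"\n}\n'
)

# The question's second pattern, with re.MULTILINE so ^ and $ match per line.
pattern = re.compile(r"^rule\s[\s\S]*?(?=BANANAS)[\s\S]*?^\}$", re.MULTILINE)

matches = pattern.findall(rules)
print(len(matches))             # 1
print("TURKEY" in matches[0])   # True: the match starts at the first rule
print("CHICKEN" in matches[0])  # False: it stops at the BANANAS rule's '}'
```

The lookahead `(?=BANANAS)` is zero-width. It therefore behaves here like a plain literal `BANANAS` and does not confine the match to a single rule.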

Answer 1

Score: 1


I think this could work:

^rule\s[^}]*BANANAS[^}]*?^}$

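I couldn't reproduce your screenshot, but it looks like the pattern matches two rules because a single match can span multiple rules. It starts at the first rule and continues up to the end of the rule with BANANAS in it. The lazy *? only makes the match as short as possible from where it starts; it doesn't make the engine pick a later starting rule. If BANANAS were in the bottom rule, you would see it match all 3 rules in your example. I replaced [\s\S] with [^}] to prevent this: [^}] cannot match a closing brace, so a match can no longer run past the end of one rule into the next.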
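Below is a minimal check of the answer's pattern in Python's `re`, used only for illustration. It also shows one limitation. YARA hex strings are written inside curly braces, for example `{ 4D 5A }`. A rule containing one therefore contains a `}` before its closing brace, and `[^}]` cannot get past it. The hex-string rule is a hypothetical addition that does not appear in the question's sample.

The second pattern is a swapped-in technique, a "tempered" token, not taken from the answer. It allows any character before `BANANAS` except at a position where a line consisting only of `}` begins. It assumes every rule ends with `}` alone on its own line, as in the sample.

```python
import re

# Shortened stand-in for the sample rules. The third rule adds a YARA hex
# string ({ 4D 5A }), which is NOT in the question's sample; it is there to
# show where the [^}] approach stops working.
rules = (
    'rule RULENAME\n{\n    meta:\n        description = "TURKEY"\n}\n'
    'rule RULENAME\n{\n    meta:\n        description = "BANANAS"\n}\n'
    'rule RULENAME\n{\n    meta:\n        description = "BANANAS"\n'
    '    strings:\n        $h = { 4D 5A }\n    condition:\n        $h\n}\n'
)

# The answer's pattern: [^}] cannot cross a '}', so no match can span rules.
answer = re.compile(r"^rule\s[^}]*BANANAS[^}]*?^}$", re.MULTILINE)

# Alternative (tempered token): before BANANAS, a character may be consumed
# only if a lone '}' line does not start at that position (i.e. a rule's end).
# Assumes every rule ends with '}' alone at the start of a line.
tempered = re.compile(
    r"^rule\s(?:(?!^\}$)[\s\S])*?BANANAS[\s\S]*?^\}$", re.MULTILINE
)

answer_hits = answer.findall(rules)
tempered_hits = tempered.findall(rules)

print(len(answer_hits))    # 1: the rule with the hex string is missed
print(len(tempered_hits))  # 2: both BANANAS rules, each matched on its own
```

For rules without braces inside them, such as the question's sample, both patterns give the same result. The tempered version only matters if hex strings or other literal `}` characters can appear inside a rule.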

huangapple
  • This article was published on July 13, 2023, 22:58:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76680831.html