Regex to match IPv4 Addresses in log files

Question


I need to be able to find and replace sensitive data like IP addresses in log files so that I can send them to a vendor for technical support.

The trouble is that the log files also contain version numbers that look like IP addresses but with extra digits.

The regex I've got so far picks up IP addresses just fine:

(((25[0-5]){1,3}|(2[0-4]|(1\d|[1-9]|)\d)){1,3}\.?){4}

The trouble is that it also picks up version numbers, so it matches things like 1555.2655.3255.1594.

I thought that using {1,3} would limit it to a max of 3 digits but it isn't working like that.

I'm using PowerShell to parse the files, and below is a mock-up of the type of formatting I'm dealing with:

test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999

Answer 1

Score: 1

This might work for you.

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
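A quick way to check this pattern is to run it over the sample line from the question. The sketch below uses Python's re module purely for illustration (the question uses PowerShell, so the language and the names SAMPLE and SIMPLE_IP are assumptions); the pattern itself is unchanged. On that sample it still picks up the first three digits of the last group in 15.26.32.1594 and accepts out-of-range octets such as 256 and 999, so it may need boundaries and a range check added.

```python
import re

# Sample line copied from the question.
SAMPLE = "test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999"

# The pattern from this answer, unchanged.
SIMPLE_IP = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

# With a single capturing group, findall returns the text of that group.
matches = SIMPLE_IP.findall(SAMPLE)

for m in matches:
    print(m)
```

The second 15.26.32.159 in the output comes from the version-like string 15.26.32.1594, and 256.255.255.0 and 999.999.999.999 are reported even though they are not valid IPv4 addresses.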

Answer 2

Score: 1

If the IPs can be directly adjacent to letters in the text, set the boundaries with negative lookarounds: (?<![\d.]) before the pattern so that it is not preceded by a digit or dot, and (?![\d.]) after it so that it is not followed by one.

(?<![\d.])(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])(?![\d.])

See this demo at regex101
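As a rough check of the lookaround version, the sketch below runs it over the sample line from the question and then uses it for the find-and-replace step. Python's re module is used only for illustration; the helper name redact_ips and the placeholder text [REDACTED-IP] are hypothetical, not from the question. The pattern uses only lookarounds and character classes, which the .NET regex engine behind PowerShell also supports, so it should carry over, but that is worth verifying on real logs.

```python
import re

# Sample line copied from the question.
SAMPLE = "test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999"

OCTET = r"(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])"

# Not preceded by a digit or dot, three "octet." groups, a final octet,
# not followed by a digit or dot.
IP_ADJACENT = re.compile(
    r"(?<![\d.])(?:" + OCTET + r"\.){3}" + OCTET + r"(?![\d.])"
)


def redact_ips(text, placeholder="[REDACTED-IP]"):
    """Replace every match of IP_ADJACENT in text with the placeholder."""
    return IP_ADJACENT.sub(placeholder, text)


found = IP_ADJACENT.findall(SAMPLE)
redacted = redact_ips(SAMPLE)

print(found)
print(redacted)
```

On the sample, only 127.1.1.1, 10.0.0.1, 172.28.69.77, 15.26.32.159 and 255.255.255.192 are matched; the version-like strings and the out-of-range addresses are left untouched.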


If the IPs are separated by whitespace, the pattern can be shortened to:

(?<!\S)(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.?\b){4}(?!\S)

Another demo at regex101

The pattern gets shorter by making the dot optional with \.? and enforcing it with a word boundary \b after each of the {4} repetitions (even at the end). There is no word boundary between two digits, so every octet except the last must be followed by a dot. This technique can't be used in the first scenario because, e.g., in 1.2.3.4abc there is no word boundary after the IP.
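To see the effect of the word boundary, the following sketch applies the whitespace-delimited pattern to the sample line from the question. It is written in Python only for illustration, and the names SAMPLE and IP_WHITESPACE are assumptions. Addresses glued to letters, such as 127.1.1.1test, are expected to be skipped here, since this variant only targets IPs surrounded by whitespace.

```python
import re

# Sample line copied from the question.
SAMPLE = "test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999"

# (?<!\S) and (?!\S) require whitespace or a string edge on both sides.
# Inside the group, \b after the optional dot rejects two octets that
# run together without a dot between them.
IP_WHITESPACE = re.compile(
    r"(?<!\S)(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.?\b){4}(?!\S)"
)

found = IP_WHITESPACE.findall(SAMPLE)
print(found)

# A plain run of digits has no word boundary inside it, so it is rejected.
print(IP_WHITESPACE.search("1234"))
```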


The part of the IP pattern that matches 0-255, (?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5]), derives from an online regex range generator (slightly shortened). There is one on Stack Overflow as well.
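The 0-255 claim can be checked by brute force. The sketch below (Python, for illustration only; the helper name is_octet is hypothetical) tests the alternation against every integer from 0 to 999 and against a few strings with leading zeros.

```python
import re

# The octet alternation from the answer, anchored with fullmatch below.
OCTET = re.compile(r"(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])")


def is_octet(text):
    """Return True if the whole string matches the octet alternation."""
    return OCTET.fullmatch(text) is not None


# Every decimal string from "0" to "999" should match exactly when n <= 255.
mismatches = [n for n in range(1000) if is_octet(str(n)) != (n <= 255)]
print(mismatches)

# Leading zeros are not accepted by this alternation.
print(is_octet("01"), is_octet("007"))
```

An empty mismatches list means the alternation accepts exactly 0 through 255 in that range. Note that it rejects zero-padded octets such as 01, which may or may not be what a given log format needs.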

huangapple
  • Posted on April 13, 2023 at 17:52:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76004052.html