April 13, 2023 17:52:03

Regex to match IPv4 Addresses in log files

Question

I need to be able to find and replace sensitive data like IP addresses in log files so that I can send them to a vendor for technical support.

The trouble is that the log files also contain version numbers that look like IP addresses but with extra digits.

The regex I've got so far picks up IP addresses just fine:

(((25[0-5]){1,3}|(2[0-4]|(1\d|[1-9]|)\d)){1,3}\.?){4}

The trouble is that it also picks up version numbers, so it matches things like 1555.2655.3255.1594 as well.

I thought that using {1,3} would limit each part to a maximum of 3 digits, but it doesn't work like that.

I'm using PowerShell to parse the files, and below is a mock-up of the type of formatting I'm dealing with:

test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999
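
On why {1,3} does not cap the digit count here: a quantifier applies to whatever comes directly before it. After \d it repeats a single digit, but after a group it repeats the whole group. In the pattern above the inner group already matches a complete number of one to three digits, so repeating that group {1,3} times allows several numbers back to back. A minimal sketch of this, assuming Python's re module purely for illustration (the question uses PowerShell, and the variable names are hypothetical):

```python
import re

# The pattern from the question, unchanged.
ASKER_PATTERN = re.compile(
    r"(((25[0-5]){1,3}|(2[0-4]|(1\d|[1-9]|)\d)){1,3}\.?){4}"
)

# {1,3} after a group repeats the whole group ...
GROUP_REPEATED = re.compile(r"(\d\d){1,3}")  # two, four or six digits
# ... while {1,3} directly after \d repeats a single digit.
DIGIT_REPEATED = re.compile(r"\d{1,3}")      # one to three digits

VERSION = "1555.2655.3255.1594"

if __name__ == "__main__":
    # The version number is matched in full by the question's pattern.
    print(ASKER_PATTERN.search(VERSION).group(0))
    # Six digits pass when the group is repeated.
    print(bool(GROUP_REPEATED.fullmatch("123456")))
    # Four digits fail when only the digit is repeated.
    print(bool(DIGIT_REPEATED.fullmatch("1234")))
```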

Answer 1

Score: 1

This might work for you.

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
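
One caveat worth checking before relying on this: the pattern has no boundary checks and no 0-255 range check. The rough sketch below tries it on strings taken from the question's sample, assuming Python's re module for illustration only (the helper name first_match is hypothetical):

```python
import re

# The pattern from this answer, unchanged.
ANSWER1 = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")


def first_match(text):
    """Return the first substring the pattern matches, or None."""
    m = ANSWER1.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    for sample in (
        "test 10.0.0.1 test",       # a plain IP
        "15.26.32.1594test",        # version number with a long last part
        "256.255.255.0",            # first part is out of the 0-255 range
        "1555.2655.3255.1594test",  # every part has four digits
    ):
        print(sample, "->", first_match(sample))
```

With these inputs the version number 15.26.32.1594 is still partially matched (as 15.26.32.159), and 256.255.255.0 is accepted, so boundaries and a range check are still needed for the log files described in the question.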

Answer 2

Score: 1

If the IPs can be directly adjacent to letters in the text, set the boundaries with negative lookarounds: (?<![\d.]) asserts that the match is not preceded by a digit or dot, and (?![\d.]) asserts that it is not followed by one.

(?<![\d.])(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])(?![\d.])

See this demo at regex101
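
As a rough illustration of the find-and-replace step from the question, here is a sketch that applies this pattern to the sample line. It assumes Python's re module rather than PowerShell, and the placeholder text and the names OCTET, IPV4, SAMPLE and redact are hypothetical:

```python
import re

# One octet in the range 0-255, as used in the pattern above.
OCTET = r"(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])"

# Not preceded and not followed by a digit or a dot.
IPV4 = re.compile(rf"(?<![\d.])(?:{OCTET}\.){{3}}{OCTET}(?![\d.])")

# The sample line from the question.
SAMPLE = (
    "test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test "
    "test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test "
    "test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999"
)


def redact(text, placeholder="[REDACTED-IP]"):
    """Replace every IPv4 address found in text with the placeholder."""
    return IPV4.sub(placeholder, text)


if __name__ == "__main__":
    print(IPV4.findall(SAMPLE))
    print(redact(SAMPLE))
```

On the sample line this finds 127.1.1.1, 10.0.0.1, 172.28.69.77, 15.26.32.159 and 255.255.255.192, and leaves the version numbers and the out-of-range addresses untouched.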

If the IPs are separated by whitespace, the pattern can be shortened to:

(?<!\S)(?:(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.?\b){4}(?!\S)

Another demo at regex101

The pattern is shortened by making the dot optional with \.? and enforcing it with a word boundary \b after each of the {4} repetitions (even the last one). Between two octets \b can only succeed if the dot was consumed, because there is no word boundary between two digits. This technique can't be used in the first scenario because in e.g. 1.2.3.4abc there is no word boundary after the IP.
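
The effect of the optional dot plus \b can be checked with a small sketch. It assumes Python's re module for illustration, and the names OCTET, IPV4_WS and SAMPLE are hypothetical:

```python
import re

# One octet in the range 0-255.
OCTET = r"(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])"

# Whitespace-delimited variant: optional dot, then a word boundary,
# repeated four times.
IPV4_WS = re.compile(rf"(?<!\S)(?:{OCTET}\.?\b){{4}}(?!\S)")

# The sample line from the question.
SAMPLE = (
    "test 127.1.1.1test test 10.0.0.1 test test 172.28.69.77test "
    "test 15.26.32.159 test test 15.26.32.1594test test 1.26.3255.1594test "
    "test 1555.2655.3255.1594test 255.255.255.192 256.255.255.0 999.999.999.999"
)

if __name__ == "__main__":
    # Only the addresses surrounded by whitespace are reported.
    print(IPV4_WS.findall(SAMPLE))
    # No dots at all: \b fails between two digits, so this is rejected.
    print(IPV4_WS.search("1234"))
    # Letters right after the IP: no word boundary after the last octet.
    print(IPV4_WS.search("1.2.3.4abc"))
```

On the sample line this variant reports only 10.0.0.1, 15.26.32.159 and 255.255.255.192. Addresses glued to letters, such as 127.1.1.1test, are skipped, which is why the first pattern is the one to use for that case.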

The part of the pattern that matches 0-255, (?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5]), derives from an online regex-for-range generator (slightly shortened). There is one on Stack Overflow as well.
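
A quick way to convince yourself that the alternation covers exactly 0-255 is to test it against every number from 0 to 999. A minimal sketch, assuming Python's re module (the names OCTET and accepted are hypothetical):

```python
import re

# The 0-255 alternation from the patterns above.
OCTET = re.compile(r"(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])")

# Collect every number below 1000 whose decimal form is fully matched.
accepted = {n for n in range(1000) if OCTET.fullmatch(str(n))}

if __name__ == "__main__":
    print(accepted == set(range(256)))
    # A leading zero is rejected because [1-9]? cannot match "0".
    print(OCTET.fullmatch("01"))
```

The branches split the range as 0-99, 100-199, 200-249 and 250-255, and a leading zero such as 01 is not accepted.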
