Regex to process hexadecimal, ip and timestamp values in a flat file

Question

Input string:

IDVal 4273E6D162ED2717A1CF4207A254004CD3F5307B
Posted 2022-12-28 07:35:55
Status 2022-12-28 08:10:11
Entry 94.62.86.22 2022-12-28 11:10:30
Entry 21.12.26.23 2022-12-28 13:10:30
Entry 113.132.26.203 2022-12-28 12:56:30
Entry 31.12.27.22 2022-12-28 12:35:30
IDVal 0D12D8E72DED99EE31BB0C57789352BED0CEEEFF
Posted 2022-12-28 07:30:55
Status 2022-12-28 06:10:11
Entry 51.102.52.36 2022-12-28 07:10:30
IDVal D947623B30C9D6E142E7D90FC7368B1A2A4F5045
Posted 2010-12-27 04:35:55
Status 2010-12-26 03:10:11
Entry 81.287.82.106 2022-12-28 05:10:20
Entry 11.12.147.2 2022-12-28 07:20:30
Entry 91.177.62.236 2022-12-27 07:10:30
Entry 78.102.152.89 2022-12-25 07:10:30

Each IDVal can have multiple Entry lines; of these, only the Entry with the most recent timestamp is needed.
I am looking for a regex in Go that processes the list and captures, for each record, the IDVal (a hexadecimal value), the IP (which can be IPv4 or IPv6), and the timestamp (in YYYY-MM-DD HH:MI:SS format).

So for the above input string, it should capture:

4273E6D162ED2717A1CF4207A254004CD3F5307B
21.12.26.23
2022-12-28 13:10:30
0D12D8E72DED99EE31BB0C57789352BED0CEEEFF
51.102.52.36
2022-12-28 07:10:30
D947623B30C9D6E142E7D90FC7368B1A2A4F5045
11.12.147.2
2022-12-28 07:20:30

I am new to regex; the only thing I could think of was:

`IDVal\s[0-9a-fA-F]+`g
and 
`Entry\s[0-9]+.[0-9]+.[0-9]+.[0-9]+\s.[0-9]+\-.[0-9]+\-.[0-9]+\s.[0-9]+\:.[0-9]+\:.[0-9]`g

However, both depend on the assumption that the words IDVal and Entry exist in the list, and they may or may not be present. Hence, the regex should be able to get the first hexadecimal value, then look for the first IPv4/IPv6 address and timestamp that follow, then skip the other entries (IPv4/IPv6 addresses), look for the next hexadecimal value, and so on.

Also, you can assume that we iterate over each line of the input string in a for loop.
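
The line-by-line procedure described above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not a definitive solution: the `Record` type and the `latestEntries` function are hypothetical names, the identifier is assumed to be a standalone 40-digit hexadecimal token, the timestamp is assumed to end the line, and the IP is validated with `net.ParseIP` rather than with a regex so that both IPv4 and IPv6 are accepted. The labels IDVal and Entry are not required. Following the expected output, the sketch keeps the entry with the most recent timestamp for each identifier.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"regexp"
	"strings"
	"time"
)

// Record is a hypothetical type: one hexadecimal ID plus its latest entry.
type Record struct {
	ID string
	IP string
	At time.Time
}

// layout is the Go reference layout for "YYYY-MM-DD HH:MI:SS".
const layout = "2006-01-02 15:04:05"

var (
	// hexRe matches a standalone 40-digit hex token (the length is an assumption).
	hexRe = regexp.MustCompile(`\b[0-9A-Fa-f]{40}\b`)
	// tsRe matches a timestamp at the end of a line.
	tsRe = regexp.MustCompile(`\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$`)
)

// latestEntries scans the input line by line and keeps, for each hex ID,
// the IP and timestamp of the entry with the most recent timestamp.
func latestEntries(input string) []Record {
	var out []Record
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if id := hexRe.FindString(line); id != "" {
			out = append(out, Record{ID: id}) // a new ID starts a new record
			continue
		}
		ts := tsRe.FindString(line)
		if len(out) == 0 || ts == "" {
			continue // no current record, or no timestamp on this line
		}
		at, err := time.Parse(layout, ts)
		if err != nil {
			continue // looks like a timestamp but is not a valid date/time
		}
		// Look for a token before the timestamp that parses as IPv4 or IPv6.
		ip := ""
		for _, f := range strings.Fields(strings.TrimSuffix(line, ts)) {
			if net.ParseIP(f) != nil {
				ip = f
				break
			}
		}
		if ip == "" {
			continue // e.g. Posted/Status lines, or an invalid address
		}
		cur := &out[len(out)-1]
		if cur.IP == "" || at.After(cur.At) {
			cur.IP, cur.At = ip, at
		}
	}
	return out
}

func main() {
	input := `IDVal 4273E6D162ED2717A1CF4207A254004CD3F5307B
Posted 2022-12-28 07:35:55
Entry 94.62.86.22 2022-12-28 11:10:30
Entry 21.12.26.23 2022-12-28 13:10:30
Entry 31.12.27.22 2022-12-28 12:35:30`
	for _, r := range latestEntries(input) {
		fmt.Println(r.ID)
		fmt.Println(r.IP)
		fmt.Println(r.At.Format(layout))
	}
}
```

Lines that carry a timestamp but no parsable IP address (such as the Posted and Status lines, or an address with an out-of-range octet) are skipped, so they never compete with real entries.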

Answer 1

Score: 2

This answer assumes that IDVal is always 40 characters long and that the data always appears in the same order.

(?P<idval>\b[A-F\d]{40}\b)(.|\n)*?(?P<ip>\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b).?(?P<date>\b\d{4}(-\d{2}){2} (\d{2}:){2}\d{2}\b)

Example

You can get the desired data from the named capture groups idval, ip and date.
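
As a rough illustration of reading those named groups in Go, the sketch below compiles the expression above unchanged and looks up each group with `SubexpIndex` (available in the standard `regexp` package since Go 1.15). The `extract` helper and the map-based result are assumptions made for the example, not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is the expression from the answer, copied without changes.
var pattern = regexp.MustCompile(`(?P<idval>\b[A-F\d]{40}\b)(.|\n)*?(?P<ip>\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b).?(?P<date>\b\d{4}(-\d{2}){2} (\d{2}:){2}\d{2}\b)`)

// extract is a hypothetical helper: it returns one map per match,
// keyed by the capture-group names idval, ip and date.
func extract(input string) []map[string]string {
	var out []map[string]string
	for _, m := range pattern.FindAllStringSubmatch(input, -1) {
		row := map[string]string{}
		for _, name := range []string{"idval", "ip", "date"} {
			row[name] = m[pattern.SubexpIndex(name)]
		}
		out = append(out, row)
	}
	return out
}

func main() {
	input := `IDVal 0D12D8E72DED99EE31BB0C57789352BED0CEEEFF
Posted 2022-12-28 07:30:55
Status 2022-12-28 06:10:11
Entry 51.102.52.36 2022-12-28 07:10:30`
	for _, row := range extract(input) {
		fmt.Println(row["idval"])
		fmt.Println(row["ip"])
		fmt.Println(row["date"])
	}
}
```

Note that the lazy `(.|\n)*?` stops at the first IPv4 address and timestamp after each identifier, so for the first record in the question's sample this yields 94.62.86.22 rather than the most recent entry; the pattern also covers IPv4 only. Picking the latest entry would need an extra comparison step outside the regex.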

Posted by huangapple on 2022-04-07 07:46:31. Source: https://go.coder-hub.com/71774786.html