How can I make a regular expression which ignores repeated strings, but checks for future valid strings?
Question
The Issue At Hand
I have a CAN log file which contains a series of messages in the following format. I've identified each string in the log file by naming it 'String n:' followed by the actual content of the file.
String 1: 01 3E 55 55 55 55 55 55
String 2: 01 7E 00 00 00 00 00 00
String 3: 21 51 00 00 66 63 51 00
String 4: 22 00 00 00 00 37 41 31
String 5: 30 00 00 55 55 55 55 55
There is more content on each log line, but this regex will only be run once I've extracted just this portion of each line from the original log file contents. I've provided a sample of a raw log line below, just in case that somehow helps anyone figure this out more easily.
Sample Line: 2023-07-07 05:07:48.896 Tx 7e0 01 3E 55 55 55 55 55 55
I'd like to make a regular expression which returns only the pairs of characters that come before the point where I see all 00 for the remainder of a string, or 55 for the remainder of a string. I'm expecting the following results for the 5 input strings, but I can't seem to build the correct regular expression to produce them.
String 1: 01 3E
String 2: 01 7E
String 3: 21 51 00 00 66 63 51
String 4: 22 00 00 00 00 37 41 31
String 5: 30
Can someone help me build this regex correctly?
What I've Tried
I've tried using positive lookahead patterns, but no matter how I configure the lookaheads, I struggle to get the right characters back. I'm always either dropping one pair of characters (the 3E in string 1, or the 7E in string 2), or I'm not getting matches at all (string 5 gives me back nothing). I've included the regex I've been experimenting with below, along with what it returns for each string.
Regular Expression: ([0-9A-F]{2,} (?!55|00))+
String 1: Returns 01
String 2: Returns 01
String 3: Returns 21 00 66 63 (No idea how to fix this issue)
String 4: Returns 00 37 41 (Again, no idea how to fix this issue)
String 5: Returns null (Why doesn't it even see the 30?)
Answer 1
Score: 2
You can match any chars up to the first position that is followed only by space-separated 55 or 00 values until the end of the line with
^.*?(?=(?: (?:00|55))*$)
Details:
^ - start of the string (or line)
.*? - any zero or more chars other than line break chars, as few as possible
(?=(?: (?:00|55))*$) - a positive lookahead that matches a location that is immediately followed by zero or more repetitions of a space + 00 or 55 till the end of the string/line.
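As a rough illustration of the pattern above, here is a minimal Python sketch run against the five strings from the question. Python is an assumption (the question does not name a language), and the helper name trim_padding is made up for this example.

```python
import re

# Pattern from the answer: the shortest prefix that is followed only by
# space-separated 00/55 pairs up to the end of the string.
TRIM = re.compile(r"^.*?(?=(?: (?:00|55))*$)")


def trim_padding(payload: str) -> str:
    """Return the payload without its trailing run of 00/55 bytes.

    Hypothetical helper; falls back to the input if nothing matches
    (for example, when the text contains an embedded line break).
    """
    m = TRIM.match(payload)
    return m.group(0) if m else payload


samples = [
    "01 3E 55 55 55 55 55 55",
    "01 7E 00 00 00 00 00 00",
    "21 51 00 00 66 63 51 00",
    "22 00 00 00 00 37 41 31",
    "30 00 00 55 55 55 55 55",
]

for s in samples:
    print(repr(trim_padding(s)))
```

Because .*? is lazy, the engine stops at the first position where the lookahead succeeds, which is why the 00 00 in the middle of string 3 is kept while the trailing 00 is dropped.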
UPDATE
To match these texts inside larger strings, you can use
(?<!\S)[a-fA-F0-9]{2}(?: [a-fA-F0-9]{2})*?(?=(?: (?:00|55))*$)
Details:
(?<!\S) - left-hand whitespace boundary
[a-fA-F0-9]{2} - two hex chars
(?: [a-fA-F0-9]{2})*? - zero or more, but as few as possible, occurrences of a space + two hex chars
(?=(?: (?:00|55))*$) - a positive lookahead that matches a position immediately followed by zero or more repetitions of a space and then either 00 or 55, till the end of the string.
This works for you as long as you extract a single (the first) match from any given input string.
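A minimal Python sketch of that single-match usage, applied to the raw sample line from the question. Python and the helper name extract_payload are assumptions, and the second log line in the usage is constructed from string 5 purely for illustration.

```python
import re
from typing import Optional

# Pattern from the UPDATE: whitespace-delimited hex pairs, extended lazily
# until only 00/55 pairs remain before the end of the string.
HEX_DATA = re.compile(
    r"(?<!\S)[a-fA-F0-9]{2}(?: [a-fA-F0-9]{2})*?(?=(?: (?:00|55))*$)"
)


def extract_payload(log_line: str) -> Optional[str]:
    """Return the first match in a raw log line, or None if there is none.

    Hypothetical helper. Trailing whitespace is stripped first because the
    lookahead expects the padding bytes to run right up to the end.
    """
    m = HEX_DATA.search(log_line.rstrip())
    return m.group(0) if m else None


line = "2023-07-07 05:07:48.896 Tx 7e0 01 3E 55 55 55 55 55 55"
print(extract_payload(line))
```

re.search is used on purpose: it returns only the leftmost match. A call such as findall would also report each trailing 55 as its own match, since every one of them is followed only by padding.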
Answer 2
Score: 1
To preface, wouldn't you want the initial 00's from the last example?
30 00 00 55 55 55 55 55
30 00 00
That being said, there are a few ways to do this.
You could remove the unwanted values from the line, using a find-and-replace.
(?:(?: 00)+|(?: 55)+)$
Or, you could use a capture group, to return the wanted values.
(.+?)(?:(?:(?: 00)+|(?: 55)+)$|$)
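Both approaches can be sketched in Python roughly as follows. The language choice and the function names strip_by_replace and keep_by_capture are assumptions made for this example.

```python
import re

# Find-and-replace pattern: one trailing run of 00s OR one trailing run of 55s.
STRIP_ONE_RUN = re.compile(r"(?:(?: 00)+|(?: 55)+)$")

# Capture-group pattern: group 1 holds everything before that trailing run.
KEEP_GROUP = re.compile(r"(.+?)(?:(?:(?: 00)+|(?: 55)+)$|$)")


def strip_by_replace(payload: str) -> str:
    """Delete the trailing run by substituting it with an empty string."""
    return STRIP_ONE_RUN.sub("", payload)


def keep_by_capture(payload: str) -> str:
    """Return capture group 1; fall back to the input if nothing matches."""
    m = KEEP_GROUP.match(payload)
    return m.group(1) if m else payload


for s in ("01 3E 55 55 55 55 55 55", "30 00 00 55 55 55 55 55"):
    print(repr(strip_by_replace(s)), repr(keep_by_capture(s)))
```

With these patterns the last example keeps its leading 00 00, because only a single run of one value is treated as padding.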
If you don't want the initial 00's, you can use the following patterns, respectively.
(?: 00| 55)+$
(.+?)(?:(?:(?: 00| 55)+)$|$)
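To see how these two variants differ from the earlier pair, here is a short Python sketch (Python and the names below are assumptions for illustration) that treats any trailing mix of 00 and 55 as padding.

```python
import re

# Variant patterns: the trailing run may mix 00 and 55 in any order.
STRIP_MIXED = re.compile(r"(?: 00| 55)+$")
KEEP_MIXED = re.compile(r"(.+?)(?:(?:(?: 00| 55)+)$|$)")


def strip_mixed(payload: str) -> str:
    """Remove a trailing run made of any combination of 00 and 55 pairs."""
    return STRIP_MIXED.sub("", payload)


def keep_mixed(payload: str) -> str:
    """Return capture group 1; fall back to the input if nothing matches."""
    m = KEEP_MIXED.match(payload)
    return m.group(1) if m else payload


last = "30 00 00 55 55 55 55 55"
print(repr(strip_mixed(last)), repr(keep_mixed(last)))
```

For the last example both functions return just 30, which lines up with the result the question asks for.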