July 10, 2023 20:03:05 · go

How can I make a regular expression which ignores repeated strings, but checks for future valid strings?

Question

The Issue At Hand

I have a CAN log file which contains a series of messages in the following format. I've labeled each line below as 'String n:', followed by that line's data bytes from the log file.

String 1: 01 3E 55 55 55 55 55 55
String 2: 01 7E 00 00 00 00 00 00
String 3: 21 51 00 00 66 63 51 00
String 4: 22 00 00 00 00 37 41 31
String 5: 30 00 00 55 55 55 55 55

Each log line contains more than this, but the regex will only be run after I've extracted this portion of each line from the original log file. I've included a sample raw log line below in case it helps.

Sample Line: 2023-07-07 05:07:48.896 Tx 7e0 01 3E 55 55 55 55 55 55
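For reference, the extraction step could look something like the minimal Python sketch below. It assumes, from the single sample line only, that every raw line has exactly four whitespace-separated fields (date, time, direction, CAN ID) before the data bytes; `extract_payload` is a hypothetical helper name, not part of any library.

```python
def extract_payload(raw_line: str) -> str:
    """Return the data-byte portion of a raw CAN log line.

    Assumed layout (based on the sample line only):
    <date> <time> <direction> <CAN ID> <data bytes...>
    """
    # Split off the first four fields; the fifth element keeps the rest of the line intact.
    fields = raw_line.split(maxsplit=4)
    if len(fields) < 5:
        raise ValueError(f"unexpected log line format: {raw_line!r}")
    # strip() removes a trailing newline when lines come straight from a file.
    return fields[4].strip()


sample = "2023-07-07 05:07:48.896 Tx 7e0 01 3E 55 55 55 55 55 55"
print(extract_payload(sample))  # 01 3E 55 55 55 55 55 55
```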

I'd like a regular expression that returns only the pairs of characters that come before the point where the rest of the string is all 00 or all 55. I expect the following results for the five input strings, but I can't seem to build a regular expression that produces them.

String 1: 01 3E
String 2: 01 7E
String 3: 21 51 00 00 66 63 51
String 4: 22 00 00 00 00 37 41 31
String 5: 30
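To pin down exactly what these expected results imply, here is a non-regex reference in Python (a swapped-in technique, not the regex being asked for; `trim_padding` is a hypothetical name). It drops trailing bytes for as long as they are `00` or `55`. Note that String 5 ends in a mix of `00` and `55`, so matching the expected output means treating any mix of the two as padding, not only a run of a single value.

```python
# Byte values treated as padding (taken from the examples in the question).
PADDING = {"00", "55"}


def trim_padding(payload: str) -> str:
    """Drop trailing padding bytes from a space-separated hex payload."""
    data = payload.split()
    # Remove bytes from the end while they are 00 or 55 (any mix of the two).
    while data and data[-1] in PADDING:
        data.pop()
    return " ".join(data)


samples = [
    "01 3E 55 55 55 55 55 55",
    "01 7E 00 00 00 00 00 00",
    "21 51 00 00 66 63 51 00",
    "22 00 00 00 00 37 41 31",
    "30 00 00 55 55 55 55 55",
]
for s in samples:
    print(trim_padding(s))
# 01 3E
# 01 7E
# 21 51 00 00 66 63 51
# 22 00 00 00 00 37 41 31
# 30
```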

Can someone help me build this regex correctly?

What I've Tried

I've tried using lookahead patterns, but no matter how I configure them, I can't get the right characters back. I either drop one pair of characters (the 3E in string 1, or the 7E in string 2), or I get no match at all (string 5 returns nothing). Below is the regex I've been experimenting with, along with what it returns.

Regular Expression: ([0-9A-F]{2,} (?!55|00))+
String 1: Returns 01
String 2: Returns 01
String 3: Returns 21 00 66 63 (No idea how to fix this issue)
String 4: Returns 00 37 41 (Again, no idea how to fix this issue)
String 5: Returns null (Why doesn't it even see the 30?)
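One way to see why this happens is to list every match the pattern produces. The Python sketch below assumes the results above come from a global (find-all) search; `all_matches` is a hypothetical helper. Each repetition has to end in a space, so the last pair on a line can never be part of a match, and `(?!55|00)` only looks at the very next pair, so a pair is rejected whenever the pair right after it is 55 or 00, even if non-padding bytes come later.

```python
import re

# The pattern from the question, unchanged.
ASKER_PATTERN = re.compile(r"([0-9A-F]{2,} (?!55|00))+")


def all_matches(payload: str) -> list[str]:
    """Return every non-overlapping whole match, like a global search would."""
    return [m.group(0) for m in ASKER_PATTERN.finditer(payload)]


for s in ["01 3E 55 55 55 55 55 55", "21 51 00 00 66 63 51 00",
          "22 00 00 00 00 37 41 31", "30 00 00 55 55 55 55 55"]:
    print(repr(s), "->", all_matches(s))
# '01 3E 55 55 55 55 55 55' -> ['01 ']
# '21 51 00 00 66 63 51 00' -> ['21 ', '00 66 63 ']
# '22 00 00 00 00 37 41 31' -> ['00 37 41 ']
# '30 00 00 55 55 55 55 55' -> []
```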

Answer 1

Score: 2

You can lazily match any characters up to the first point where only space-separated 00 or 55 pairs remain until the end of the line, using

^.*?(?=(?: (?:00|55))*$)

Details:

^ - start of the string (or line)
.*? - zero or more characters other than line break characters, as few as possible
(?=(?: (?:00|55))*$) - a positive lookahead that matches a location immediately followed by zero or more repetitions of a space + 00 or 55, up to the end of the string/line.
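A quick Python check of this pattern against the five sample strings is below (`data_bytes` is a hypothetical helper). Python's `re` supports the lookahead used here; RE2-based engines, such as Go's `regexp` package, do not support lookarounds, so the pattern would need rewriting there.

```python
import re

# Answer 1: lazily match the shortest prefix that is followed only by
# " 00" / " 55" pairs (zero or more of them) up to the end of the string.
PREFIX = re.compile(r"^.*?(?=(?: (?:00|55))*$)")


def data_bytes(payload: str) -> str:
    m = PREFIX.search(payload)
    # For a single-line payload the lookahead succeeds at the end of the string
    # at the latest, so a match is expected; fall back to the input otherwise.
    return m.group(0) if m else payload


for s in ["01 3E 55 55 55 55 55 55", "01 7E 00 00 00 00 00 00",
          "21 51 00 00 66 63 51 00", "22 00 00 00 00 37 41 31",
          "30 00 00 55 55 55 55 55"]:
    print(data_bytes(s))
# 01 3E
# 01 7E
# 21 51 00 00 66 63 51
# 22 00 00 00 00 37 41 31
# 30
```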

UPDATE

To match these texts inside larger strings, you can use

(?<!\S)[a-fA-F0-9]{2}(?: [a-fA-F0-9]{2})*?(?=(?: (?:00|55))*$)

Details:

(?<!\S) - left-hand whitespace boundary
[a-fA-F0-9]{2} - two hex chars
(?: [a-fA-F0-9]{2})*? - zero or more, but as few as possible, occurrences of a space + two hex chars
(?=(?: (?:00|55))*$) - a positive lookahead that matches a position immediately followed by zero or more repetitions of a space and then either 00 or 55, up to the end of the string.

This works in your case because you extract a single match from each input string.
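Applied to whole raw lines, the UPDATE pattern can be checked the same way in Python (`data_bytes_from_line` is a hypothetical helper). With the sample line's layout, none of the earlier fields (date, time, `Tx`, `7e0`) is a standalone pair of hex digits, so the first match starts at the data bytes; a different line layout could change that.

```python
import re

# Answer 1, UPDATE: a whitespace-bounded run of hex pairs, as short as possible,
# followed only by " 00" / " 55" pairs up to the end of the line.
IN_LINE = re.compile(r"(?<!\S)[a-fA-F0-9]{2}(?: [a-fA-F0-9]{2})*?(?=(?: (?:00|55))*$)")


def data_bytes_from_line(raw_line: str):
    m = IN_LINE.search(raw_line)
    return m.group(0) if m else None


line = "2023-07-07 05:07:48.896 Tx 7e0 01 3E 55 55 55 55 55 55"
print(data_bytes_from_line(line))  # 01 3E
```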

Answer 2

Score: 1

To preface, wouldn't you want the initial 00's from the last example?

30 00 00 55 55 55 55 55

30 00 00

That being said, there are a few ways to do this.

You could remove the unwanted values from the line using a find-and-replace.

(?:(?: 00)+|(?: 55)+)$
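A minimal Python sketch of the find-and-replace approach with `re.sub` (`strip_trailing_run` is a hypothetical helper). Because the alternation is either a run of ` 00` or a run of ` 55`, a mixed tail is only partly removed, which is why the last example keeps `30 00 00`.

```python
import re

# Answer 2, find-and-replace: a trailing run of " 00" pairs OR a trailing run of " 55" pairs.
TRAILING_RUN = re.compile(r"(?:(?: 00)+|(?: 55)+)$")


def strip_trailing_run(payload: str) -> str:
    return TRAILING_RUN.sub("", payload)


print(strip_trailing_run("01 3E 55 55 55 55 55 55"))  # 01 3E
print(strip_trailing_run("30 00 00 55 55 55 55 55"))  # 30 00 00
```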

Or you could use a capture group to return the wanted values.

(.+?)(?:(?:(?: 00)+|(?: 55)+)$|$)
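The capture-group version, sketched in Python with `re.match` (`wanted_bytes` is a hypothetical helper): group 1 holds the wanted values, and the final `|$` branch lets lines with no padding, such as String 4, match in full.

```python
import re

# Answer 2, capture group: the shortest prefix (group 1) followed by a trailing
# " 00" run, a trailing " 55" run, or simply the end of the string.
WANTED = re.compile(r"(.+?)(?:(?:(?: 00)+|(?: 55)+)$|$)")


def wanted_bytes(payload: str) -> str:
    m = WANTED.match(payload)
    return m.group(1) if m else ""


print(wanted_bytes("21 51 00 00 66 63 51 00"))  # 21 51 00 00 66 63 51
print(wanted_bytes("22 00 00 00 00 37 41 31"))  # 22 00 00 00 00 37 41 31
```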

If you don't want the initial 00's, you can use the following patterns for the find-and-replace and capture-group approaches, respectively.

(?: 00| 55)+$

(.+?)(?:(?:(?: 00| 55)+)$|$)
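Checked in Python, both variants treat any mix of trailing ` 00` and ` 55` pairs as padding, so the last example reduces to `30`:

```python
import re

# Answer 2 variants: any mix of trailing " 00" / " 55" pairs counts as padding.
MIXED_RUN = re.compile(r"(?: 00| 55)+$")                     # find-and-replace
MIXED_WANTED = re.compile(r"(.+?)(?:(?:(?: 00| 55)+)$|$)")  # capture group

payload = "30 00 00 55 55 55 55 55"
print(MIXED_RUN.sub("", payload))            # 30
print(MIXED_WANTED.match(payload).group(1))  # 30
```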
