Pattern match without a string order

Question

I want to match two strings, but their order shouldn't matter. For example, the check below should print true instead of false.

final String line = "LIST \"\" (\"car1\" \"car0\") RETURN (SPECIAL-USE STATUS)\n";
final String regex = ".*LIST.*\\(\"car0\"\\ \"car1\"\\)\\ RETURN.*\\R";
System.out.println(line.matches(regex));

I expect the string line to match the regex regardless of the order of the words (car1 and car0).

Answer 1

Score: 0

You could list both orders explicitly, like this.

final String regex = ".*LIST.*\\((?:\"car1\" \"car0\"|\"car0\" \"car1\")\\) RETURN.*\\R";

If you don't understand what it means, familiarize yourself with the documentation for java.util.regex.Pattern. That documentation is basically your bible when it comes to writing regular expressions in Java.
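As a quick check, here is a self-contained sketch that runs both the question's regex and the one-liner above against the two word orders. The class name OrderDemo and the helper sampleLine (which rebuilds the question's input with the two words in a chosen order) are hypothetical, added only for illustration.

```java
public class OrderDemo {
    // The regex from the question: it only accepts "car0" "car1" in that order.
    static final String QUESTION_REGEX = ".*LIST.*\\(\"car0\"\\ \"car1\"\\)\\ RETURN.*\\R";

    // The answer's regex: the alternation lists both possible orders.
    static final String ANSWER_REGEX =
        ".*LIST.*\\((?:\"car1\" \"car0\"|\"car0\" \"car1\")\\) RETURN.*\\R";

    // Hypothetical helper: rebuilds the question's sample line with the words in the given order.
    static String sampleLine(String first, String second) {
        return "LIST \"\" (\"" + first + "\" \"" + second + "\") RETURN (SPECIAL-USE STATUS)\n";
    }

    public static void main(String[] args) {
        String car1First = sampleLine("car1", "car0"); // identical to the question's line
        String car0First = sampleLine("car0", "car1");

        System.out.println(car1First.matches(QUESTION_REGEX)); // false
        System.out.println(car0First.matches(QUESTION_REGEX)); // true
        System.out.println(car1First.matches(ANSWER_REGEX));   // true
        System.out.println(car0First.matches(ANSWER_REGEX));   // true
    }
}
```

Only the alternation accepts both orders; the question's regex still rejects "car1" "car0".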

If that is still not clear enough, here is an exploded view of the exact same string, with a breakdown of what each individual component means.

final String regex = ""
    + ".*"     //wildcard: any run of characters (possibly none), except line terminators
    + "LIST"   //the literal string LIST in all-caps
    + ".*"     //another wildcard
    + "\\("    //an escaped opening parenthesis
    + "(?:"    //the opener of a non-capturing group
    + "\""     //an escaped double quote
    + "car1"   //the literal string car1
    + "\""     //another escaped double quote
    + " "      //a single whitespace -- pressing the space bar once
    + "\""     //yet another escaped double quote
    + "car0"   //the literal string car0
    + "\""     //4th double quote thus far
    + "|"      //Alternation, a very special symbol: match either the branch on
               //its left or the one on its right, much like || in a Java if
               //statement. The enclosing group limits how far each branch reaches.
    + "\""     //5th double quote thus far
    + "car0"   //the literal string car0
    + "\""     //6th double quote thus far
    + " "      //another whitespace
    + "\""     //7th double quote thus far
    + "car1"   //the literal string car1
    + "\""     //8th double quote thus far -- it is also the last one
    + ")"      //the closer of the non-capturing group started above
    + "\\)"    //an escaped closing parenthesis
    + " "      //yet another whitespace
    + "RETURN" //the literal string RETURN in all-caps
    + ".*"     //yet another wildcard
    + "\\R"    //a linebreak matcher: it matches any line terminator (\n, \r\n, \r and other Unicode line breaks)
    ;
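To confirm that the exploded view really builds the exact same string, and to see what the regex engine actually receives once Java has processed the string-literal escapes, here is a small sketch. The class name ExplodedViewDemo is hypothetical.

```java
public class ExplodedViewDemo {
    // The one-liner from the answer.
    static final String ONE_LINER =
        ".*LIST.*\\((?:\"car1\" \"car0\"|\"car0\" \"car1\")\\) RETURN.*\\R";

    // The exploded view, concatenated piece by piece.
    static final String EXPLODED = ""
        + ".*" + "LIST" + ".*" + "\\(" + "(?:"
        + "\"" + "car1" + "\"" + " " + "\"" + "car0" + "\""
        + "|"
        + "\"" + "car0" + "\"" + " " + "\"" + "car1" + "\""
        + ")" + "\\)" + " " + "RETURN" + ".*" + "\\R";

    public static void main(String[] args) {
        // Both expressions produce the same String, so they are the same regex.
        System.out.println(ONE_LINER.equals(EXPLODED)); // true

        // Printing shows the pattern after Java has removed one layer of escaping:
        // .*LIST.*\((?:"car1" "car0"|"car0" "car1")\) RETURN.*\R
        System.out.println(ONE_LINER);

        // "\\(" holds two characters (a backslash and a parenthesis) for the regex engine,
        // while "\"" holds just one double-quote character.
        System.out.println("\\(".length()); // 2
        System.out.println("\"".length());  // 1
    }
}
```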

Some things to note.

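  1. Escaping means telling the parser to stop interpreting a symbol for its special meaning and treat it as a literal character. To escape, you use the backslash symbol (\). Escaping can get tricky, though, because it happens in two layers: first the Java compiler processes the escapes in the string literal, then the regex engine processes the escapes in the resulting pattern. A double quote is only special to Java, so \" needs one backslash; a parenthesis is special to the regex engine, so it needs a backslash in the pattern, and that backslash must itself be escaped in the Java literal, giving \\(. If you need help understanding when and where you need 1 or 2 (or worse yet, more), I would take a look at this link.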

https://stackoverflow.com/questions/36092805/java-regular-expression-how-to-use-backslash

  2. A group plus a | symbol allows you to write OR clauses in your regex. The regex above basically says, "match a wildcard, followed by LIST, followed by another wildcard, followed by an opening parenthesis, followed by an OR CLAUSE, where one of the following cases must be true: either we match the literal string "car1" "car0" or the other literal string "car0" "car1". After the OR CLAUSE, we match a closing parenthesis, a single whitespace, the literal string RETURN, another wildcard, and then finally, a line break matcher." Which leads to my next point.

  3. With the exception of the OR CLAUSE, everything here is matched in order: each piece must match before the next one can. The OR CLAUSE gives you the ability to branch between the options, but that's it. Otherwise, everything follows the rule of going in order.
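Because everything outside the OR clause is matched in order, listing every order does not scale: two words need 2 branches, three words need 6, and n words need n! branches. A common alternative, which is not part of the answer above, is lookahead: each (?=...) checks that one quoted word appears somewhere before the closing parenthesis without consuming any input, so the order stops mattering. This is a minimal sketch, assuming the words never contain a ')' character; the class and method names are hypothetical. Unlike the alternation, it also accepts extra words inside the parentheses.

```java
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Each lookahead scans ahead (without consuming) for one quoted word before the ')'.
    // [^)]* then consumes the contents of the parentheses, whatever their order.
    static final Pattern UNORDERED = Pattern.compile(
        ".*LIST.*\\((?=[^)]*\"car0\")(?=[^)]*\"car1\")[^)]*\\) RETURN.*\\R");

    static boolean matches(String line) {
        return UNORDERED.matcher(line).matches(); // whole-string match, like String.matches
    }

    public static void main(String[] args) {
        System.out.println(matches("LIST \"\" (\"car1\" \"car0\") RETURN (SPECIAL-USE STATUS)\n")); // true
        System.out.println(matches("LIST \"\" (\"car0\" \"car1\") RETURN (SPECIAL-USE STATUS)\n")); // true
        System.out.println(matches("LIST \"\" (\"car0\" \"car2\") RETURN (SPECIAL-USE STATUS)\n")); // false
    }
}
```

Adding a third word means adding one more (?=[^)]*"...") group rather than doubling or tripling the alternation.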

Answer 2

Score: -1

The regex variable can be written as:

final String regex = ".*LIST.*\\(\"car[1|0]\"\\ \"car[1|0]\"\\)\\ RETURN.*\\R";
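A caveat worth checking on this regex: inside square brackets, | is not an OR operator but a literal character, so [1|0] matches 1, |, or 0. And because each position is checked independently, the pattern also accepts the same word twice. The sketch below runs the regex unchanged against a few inputs; the class name and the sampleLine helper are hypothetical.

```java
public class CharClassDemo {
    // The regex from Answer 2, unchanged.
    static final String ANSWER2 = ".*LIST.*\\(\"car[1|0]\"\\ \"car[1|0]\"\\)\\ RETURN.*\\R";

    // Hypothetical helper: rebuilds the question's sample line with the words in the given order.
    static String sampleLine(String first, String second) {
        return "LIST \"\" (\"" + first + "\" \"" + second + "\") RETURN (SPECIAL-USE STATUS)\n";
    }

    public static void main(String[] args) {
        System.out.println(sampleLine("car1", "car0").matches(ANSWER2)); // true: the question's input
        System.out.println(sampleLine("car1", "car1").matches(ANSWER2)); // true: the same word twice
        System.out.println(sampleLine("car|", "car0").matches(ANSWER2)); // true: | is literal inside [...]
        System.out.println("car|".matches("car[01]"));                   // false: [01] allows only 1 or 0
    }
}
```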

huangapple
  • Published July 27, 2023 16:53:44
  • Please keep this link when reposting: https://go.coder-hub.com/76778081.html