Regular Expression: Why do I get no match?

Question

I am trying to parse a document that consists of many sections.

Each section begins with :[]: followed by a space, then one or more characters (any characters), then a :, then a space and one or more characters (any characters).

Here's an example:

:[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: Or more info from the related section starting with PARTIE-DU-CORPS.
:[]: Ou est-ce que tu a mal: Tu as mal aux jambes: Find usage in section 145-TT-LA-TETE.

The token of interest from each section is everything from :[]: to the first occurrence of :. For example, in the first section, I am only interested in extracting: :[]: Abet1, Abetted34:


At first, I used the following pattern to extract the token from each section of the document, but it matched everything from the first occurrence of : to the last occurrence of : in the section:

"\\B:\\[\\]:.*:\\B"

If I change the pattern to the following, intending to match from :[]: up to the first occurrence of :, I get no match at all:

"\\B:\\[\\]:\\s*.:{1}"

How would the regular expression that extracts what I want look like?
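
For reference, the behaviour described above can be reproduced with a small sketch along these lines. The class name ReproduceAttempts and the helper firstMatch are made up for illustration; only the two patterns and the sample text come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReproduceAttempts {
    // First sample section from the question.
    static final String SECTION =
        ":[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: "
      + "Or more info from the related section starting with PARTIE-DU-CORPS.";

    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Attempt 1: the greedy .* runs on to the last ':' in the section.
        // prints ":[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45:"
        System.out.println(firstMatch("\\B:\\[\\]:.*:\\B", SECTION));

        // Attempt 2: '.' stands for exactly one character, so the ':' would have to
        // come right after a single character following the marker.
        // prints "null"
        System.out.println(firstMatch("\\B:\\[\\]:\\s*.:{1}", SECTION));
    }
}
```

The second pattern has no quantifier on the dot, and :{1} just means a single colon, so it can only match a token that is one character long.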

Answer 1

Score: 3

Is this what you want?

See more: https://regex101.com/r/jOmnSb/2

Or

See more: https://regex101.com/r/jOmnSb/3

UPDATE:

You can convert a regex to a Java regex here: https://www.regexplanet.com/advanced/java/index.html
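
As a rough sketch of what that conversion involves: in Java source every backslash in the regex has to be doubled inside the string literal, and a purely literal piece such as the :[]: marker can instead be passed through Pattern.quote so its brackets need no manual escaping. The pattern below is an assumption for illustration, not necessarily the one behind the regex101 links, and the class name QuoteMarker is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteMarker {
    // Builds a token pattern without hand-escaping the marker's brackets.
    static Pattern tokenPattern() {
        // Pattern.quote wraps the marker in \Q...\E, so '[' and ']' are taken literally.
        // The regex \s* becomes "\\s*" once written as a Java string literal.
        return Pattern.compile(Pattern.quote(":[]:") + "\\s*[^:]+:");
    }

    public static void main(String[] args) {
        Matcher m = tokenPattern().matcher(
            ":[]: Ou est-ce que tu a mal: Tu as mal aux jambes: Find usage in section 145-TT-LA-TETE.");
        if (m.find()) {
            System.out.println(m.group()); // prints ":[]: Ou est-ce que tu a mal:"
        }
    }
}
```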

Answer 2

Score: 3

So you want to match a string against:

  1. :[]:_ (where _ is a space character)
  2. followed by one or more characters that are not a : (refer to this question)
  3. close the match with a : character

The regex for that would be:

:\[\]: [^:]+:

In a Java string literal each backslash must itself be escaped, so \[ is written as \\[. You could do something like:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTest {
    public static void main(String[] args) {
        // :[]: and a space, then one or more non-colon characters, then the closing colon
        Pattern pattern = Pattern.compile(":\\[\\]: [^:]+:");
        Matcher matcher =
            pattern.matcher(
                ":[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: Or more info from the related section starting with PARTIE-DU-CORPS.\n"
              + ":[]: Ou est-ce que tu a mal: Tu as mal aux jambes: Find usage in section 145-TT-LA-TETE."
            );
        // Print the token found in each section
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
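
If only the text between the marker and the first colon is needed, a capturing group can pull it out directly. The sketch below is one possible variation, not part of the original answer: the class name TokenExtractor, the extract helper and the \s* after the marker are assumptions. It also shows a lazy quantifier as an alternative to the negated character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtractor {
    // The two sample sections from the question.
    static final String DOCUMENT =
        ":[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: Or more info from the related section starting with PARTIE-DU-CORPS.\n"
      + ":[]: Ou est-ce que tu a mal: Tu as mal aux jambes: Find usage in section 145-TT-LA-TETE.";

    // Group 1 captures everything between the marker and the first ':' after it.
    static final Pattern NEGATED = Pattern.compile(":\\[\\]:\\s*([^:]+):");

    // Lazy alternative: .+? expands one character at a time and stops at the first ':'.
    static final Pattern LAZY = Pattern.compile(":\\[\\]:\\s*(.+?):");

    // Hypothetical helper: collects group 1 of every match in the document.
    static List<String> extract(Pattern pattern, String document) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(document);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print one token per line for each variant.
        for (String token : extract(NEGATED, DOCUMENT)) {
            System.out.println(token);
        }
        for (String token : extract(LAZY, DOCUMENT)) {
            System.out.println(token);
        }
    }
}
```

Both variants give the same tokens on this sample. They differ when a token spans several lines: [^:]+ also matches line breaks, while . does not unless Pattern.DOTALL is set.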
