Regex - Find words of certain length between two specified words

Question

Good day,

I recently started working with regex (in Java) and have run into a problem I need some guidance on.

I am looking to find words of a certain length (in this case, 4 characters or more) between the two words jack and james.

The following is the text I am using to test my regex against.

james was playing with jack yesterday (line 1)
jack was playing with james yersterday (line 2)
jack and james are best friends (line 3)
james will be helping jack with his homework (line 4)
yesterday, james come over jack's house (line 5)

What I hope to achieve is the following

playing with (line 1)
playing with (line 2)
no matches (line 3)
will helping (line 4)
come over (line 5)

I have come up with the following

(?<=james)(.*)(?=jack)|(?<=jack)(.*)(?=james)

But this particular regex returns all characters between the two words. I also tried the following without success (as well as many others before frustration started taking over). Also, I omitted

(?<=james)(\\b\w{4,}\\b)(?=jack)|(?<=jack)(\\b\w{4,}\\b)(?=james)
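
For reference, the behaviour of the two attempts can be reproduced with a small sketch like the one below. The class name `AttemptsDemo` and the helper `firstMatch` are made up for illustration. The first pattern matches everything between the names because `.*` is unrestricted. The second finds nothing because the lookbehind and lookahead require the word to touch both names directly, with no characters in between.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttemptsDemo {
    // First attempt: anything at all between the two names.
    static final Pattern GREEDY =
        Pattern.compile("(?<=james)(.*)(?=jack)|(?<=jack)(.*)(?=james)");

    // Second attempt: a 4+ character word that must sit directly
    // against both names, which ordinary text never satisfies.
    static final Pattern ADJACENT = Pattern.compile(
        "(?<=james)(\\b\\w{4,}\\b)(?=jack)|(?<=jack)(\\b\\w{4,}\\b)(?=james)");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "james was playing with jack yesterday (line 1)";
        System.out.println("[" + firstMatch(GREEDY, line) + "]");   // [ was playing with ]
        System.out.println(firstMatch(ADJACENT, line));             // null
    }
}
```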

Any guidance would be greatly appreciated.

Sincerely

Answer 1

Score: 1

This seems to work as required.

  • (?<=) positive lookbehind for the two names
  • (?=) positive lookahead for the two names
  • \\w{4,} a word of four or more characters (\w{4,} with the backslash doubled for a Java string literal)
  • .* used inside each assertion to gobble up the characters between the name and the word being matched

// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String[] lines = {
    "james was playing with jack yesterday (line 1)",
    "jack was playing with james yersterday (line 2)",
    "jack and james are best friends (line 3)",
    "james will be helping jack with his homework (line 4)",
    "yesterday, james come over jack's house (line 5)"
};

Pattern p = Pattern.compile("(?<=(?:jack|james).*)(\\w{4,})(?=.*(?:jack|james))");

for (String line : lines) {
    Matcher m = p.matcher(line);
    // Tracks whether anything was printed, so a newline is emitted only for lines with matches.
    boolean flag = false;
    while (m.find()) {
        flag = true;
        System.out.print(m.group(1) + " ");
    }
    if (flag) {
        System.out.println();
    }
}

Prints

playing with 
playing with 
will helping 
come over 
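
If lookbehind is not wanted at all, the same result can be approximated in two steps: first capture the text between one name and the next occurrence of the other name, then pick the long words out of that text. The following is only a sketch under that assumption, not part of the answer above. The class `BetweenNames` and the method `longWordsBetween` are hypothetical names, and the sketch only looks at the first such span in a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenNames {
    // Group 1: the first name found. Group 2: the text up to the next
    // occurrence of the *other* name ((?!\1\b) rejects the same name).
    private static final Pattern SPAN =
        Pattern.compile("\\b(jack|james)\\b(.*?)\\b(?!\\1\\b)(?:jack|james)\\b");

    // A whole word of four or more word characters.
    private static final Pattern LONG_WORD = Pattern.compile("\\b\\w{4,}\\b");

    static List<String> longWordsBetween(String line) {
        List<String> words = new ArrayList<>();
        Matcher span = SPAN.matcher(line);
        if (span.find()) {
            Matcher word = LONG_WORD.matcher(span.group(2));
            while (word.find()) {
                words.add(word.group());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(longWordsBetween("james was playing with jack yesterday (line 1)")); // [playing, with]
        System.out.println(longWordsBetween("jack and james are best friends (line 3)"));       // []
        System.out.println(longWordsBetween("yesterday, james come over jack's house (line 5)")); // [come, over]
    }
}
```

One limitation of this sketch: if the first name is repeated before the other name appears, the repeated name ends up inside the captured span and is itself reported as a long word.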





# Answer 2
**Score**: 0

  [1]: https://regex101.com/r/jURqnQ/1
  [2]: https://tio.run/##ZVHLTsMwEDw3X7H0ZJdiqcCJvsQBcQEJiQMHAtKSmNRpYlu20xBBvr04TsrTl/XOzsyu1znu8ERpLvN0u9@LUivjIPcgq5wo2GQe/cMMz/jb30qBMvuLCdUhUVKgtXBnVGawjN6jka5eCpGAdeh82CmRQolCArl3Rsjs8QnQZJaC2xhV2x/@V28J104oGY28zainQ5gHljAm64s4vibrxdEz/SA5JtuPHEtuKQ2Vh@M4rt9n07OWTrqMdOn5tPXVpSeQ9VEczyijk/Vw/WUxnn93tH3wLUMRarSgC2w6sBZuA50QGm4dNyk2QXqHzmcS9BCXMCAsUaUWBSfhHbTj3qJLNtz4rfRxeVCxASH9BIF8I6xb9IOt/C5sVTjrFZLXcGkMNqG@IoFbb3wjIIMLexUyJZRCt0zwZ1AzTNMvTmZUpckpDfq220Hjn1UyVTmmfVNXyOHfWK6EJGMYTw9GQdRG7X7/CQ

Use

(?:\G(?<!^)|(jack|james))(?:\W+\w{1,3})*\W+(\w{4,})(?=(?:(?!\1).)*?(?!\1)(jack|james))

See [proof][1]. You need the values captured in Group 2.

Explanation

                         EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \G                       where the last m//g left off
--------------------------------------------------------------------------------
    (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
      ^                        the beginning of the string
--------------------------------------------------------------------------------
    )                        end of look-behind
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      jack                     'jack'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      james                    'james'
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \W+                      non-word characters (all but a-z, A-Z, 0-
                             9, _) (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \w{1,3}                  word characters (a-z, A-Z, 0-9, _)
                             (between 1 and 3 times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  \W+                      non-word characters (all but a-z, A-Z, 0-
                           9, _) (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \w{4,}                   word characters (a-z, A-Z, 0-9, _) (at
                             least 4 times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
                               what was matched by capture \1
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
                             what was matched by capture \1
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    (                        group and capture to \3:
--------------------------------------------------------------------------------
      jack                     'jack'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      james                    'james'
--------------------------------------------------------------------------------
    )                        end of \3
--------------------------------------------------------------------------------
  )                        end of look-ahead
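
The `\G` anchor is what lets this pattern pick up one word per match while staying inside the same run of text. As a minimal illustration (using a much simpler, made-up pattern and the hypothetical helper `findAll`), `\G` forces each match to begin exactly where the previous one ended, so matching stops at the first gap:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Collects every match of the given regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // With \G, each digit must start where the previous match ended,
        // so the chain breaks at "abc" and never reaches "456".
        System.out.println(findAll("\\G\\d", "123abc456")); // [1, 2, 3]
        // Without \G, the search simply resumes after the gap.
        System.out.println(findAll("\\d", "123abc456"));    // [1, 2, 3, 4, 5, 6]
    }
}
```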

[Java][2]:

import java.util.*;
import java.util.regex.*;

class Ideone
{
    public static void main(String[] args)
    {
        // Backslashes are doubled for the Java string literal; \\1 is the backreference to group 1.
        String regex = "(?:\\G(?<!^)|(jack|james))(?:\\W+\\w{1,3})*\\W+(\\w{4,})(?=(?:(?!\\1).)*?(?!\\1)(jack|james))";
        String string = "james was playing with jack yesterday";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            // Group 2 holds the word of four or more characters.
            results.add(matcher.group(2));
        }
        System.out.println(String.join(" ", results));
    }
}

Result: playing with
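
To check the pattern against all five test lines from the question, a sketch along these lines could be used. It assumes the backreference inside the lookahead is `\1` (the name captured by the first group). The class `ChainedMatchDemo` and the method `longWords` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChainedMatchDemo {
    // Assumption: the lookahead uses \1, i.e. it looks for the name
    // that differs from the one captured in group 1.
    static final Pattern WORDS = Pattern.compile(
        "(?:\\G(?<!^)|(jack|james))(?:\\W+\\w{1,3})*\\W+(\\w{4,})"
        + "(?=(?:(?!\\1).)*?(?!\\1)(jack|james))");

    // Joins every group-2 capture (the 4+ character words) with spaces.
    static String longWords(String line) {
        List<String> results = new ArrayList<>();
        Matcher m = WORDS.matcher(line);
        while (m.find()) {
            results.add(m.group(2));
        }
        return String.join(" ", results);
    }

    public static void main(String[] args) {
        String[] lines = {
            "james was playing with jack yesterday (line 1)",
            "jack was playing with james yersterday (line 2)",
            "jack and james are best friends (line 3)",
            "james will be helping jack with his homework (line 4)",
            "yesterday, james come over jack's house (line 5)"
        };
        for (String line : lines) {
            // Brackets make the empty result for line 3 visible.
            System.out.println("[" + longWords(line) + "]");
        }
    }
}
```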

huangapple
  • Posted on September 17, 2020, 00:17:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63924167.html