Regex to ignore tokens that do not start with a letter

Question

How can I write a regex that ignores any token that does not start with a letter? It should be used in Java.

Example: it 's super cool --> the regex should match [it, super, cool] and ignore ['s].

Answer 1

Score: 0

You can use (?<!\\p{Punct})(\\p{L}+), which matches a run of letters that is not immediately preceded by a punctuation mark. Here (?<! opens a negative lookbehind. See the documentation of Pattern to learn more about it.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "it 's super cool";
        // Negative lookbehind: skip letters that directly follow punctuation.
        Pattern pattern = Pattern.compile("(?<!\\p{Punct})(\\p{L}+)");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

Output:

it
super
cool
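
One caveat worth checking: the lookbehind only inspects the single character before each candidate match, so for a longer token such as 'tis the engine rejects the t but can resume at the next letter. The sketch below compares the pattern above with a stricter variant, (?<!\\S), which requires that no non-whitespace character precedes the match. The class name LookbehindDemo, the helper findAll, the sample input and the stricter variant are all assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Hypothetical helper: collects every full match of the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "it 'tis super cool";
        // Pattern from the answer: 't' is rejected, but matching resumes at 'i'.
        System.out.println(findAll("(?<!\\p{Punct})\\p{L}+", input)); // [it, is, super, cool]
        // Assumed stricter variant: a match may only start after whitespace or at the start.
        System.out.println(findAll("(?<!\\S)\\p{L}+", input));        // [it, super, cool]
    }
}
```

For the single-letter token in the question ('s) both patterns give the same result; the difference only shows up when the ignored token has more than one letter.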

Answer 2

Score: 0

Alternative regex:

"(?:^|\\s)([A-Za-z]+)"

Regex in context:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "it 's super cool";

        // A token must begin at the start of the input or right after whitespace.
        Matcher matcher = Pattern.compile("(?:^|\\s)([A-Za-z]+)").matcher(input);

        while (matcher.find()) {
            // Group 1 holds the letters without the leading whitespace.
            String result = matcher.group(1);
            System.out.println(result);
        }
    }
}

Output:

it
super
cool

Note: To match letters in any language (e.g. Hindi, German, Chinese, English), use the following regex instead:

"(?:^|\\s)(\\p{L}+)"

More about the Pattern class and its classes for Unicode scripts, blocks, categories and binary properties can be found in the Javadoc for java.util.regex.Pattern.
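
To make the difference between the two character classes concrete, here is a minimal sketch that runs both regexes over an input containing a non-ASCII letter. The class name UnicodeTokenDemo, the helper leadingLetterTokens and the sample input are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeTokenDemo {
    // Hypothetical helper: returns group 1 of every match, i.e. the letters only.
    static List<String> leadingLetterTokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The escape below is the letter u with diaeresis, kept as an escape so the file stays ASCII.
        String input = "f\u00fcr 's Java 9lives";
        // ASCII class: the match stops at the first non-ASCII letter, yielding [f, Java].
        System.out.println(leadingLetterTokens("(?:^|\\s)([A-Za-z]+)", input));
        // Unicode letter category: the whole first word is kept, yielding [für, Java].
        System.out.println(leadingLetterTokens("(?:^|\\s)(\\p{L}+)", input));
    }
}
```

In both cases 's and 9lives are skipped, because neither starts with a letter directly after whitespace. How the non-ASCII letter is printed depends on the console encoding.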

huangapple
  • Published on September 27, 2020, 21:23:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64088919.html