Regex to ignore tokens that do not start with a letter

Question

How can I write a regex that ignores any token that does not start with a letter? It should be used in Java.

Example: for the input "it 's super cool", the regex should match [it, super, cool] and ignore ['s].

Answer 1

Score: 0

You can use (?<!\\p{Punct})(\\p{L}+) (written here as a Java string literal), which matches a run of letters that is not immediately preceded by a punctuation mark. The (?<! prefix introduces a negative lookbehind. See the documentation of java.util.regex.Pattern to learn more about it.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String str = "it 's super cool";
            // Letters (\p{L}+) that are not directly preceded by punctuation.
            Pattern pattern = Pattern.compile("(?<!\\p{Punct})(\\p{L}+)");
            Matcher matcher = pattern.matcher(str);
            // Print each match on its own line.
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }

Output:

    it
    super
    cool
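One thing worth checking with the lookbehind pattern is that it only inspects the single character before each candidate match. The sketch below runs it on a token with more than one letter after the apostrophe and compares it with a whitespace-anchored variant. The class name LookbehindDemo, the helper findAll, and the TOKEN_START pattern are assumptions made for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // The pattern from the answer: letters not directly preceded by punctuation.
    static final Pattern NOT_AFTER_PUNCT = Pattern.compile("(?<!\\p{Punct})(\\p{L}+)");

    // Hypothetical variant: the leading letters of a whitespace-delimited token.
    // (?<!\S) succeeds only at the start of the input or right after whitespace.
    static final Pattern TOKEN_START = Pattern.compile("(?<!\\S)\\p{L}+");

    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll(NOT_AFTER_PUNCT, "it 's super cool")); // [it, super, cool]
        System.out.println(findAll(NOT_AFTER_PUNCT, "it 'tis cool"));     // [it, is, cool]
        System.out.println(findAll(TOKEN_START, "it 'tis cool"));         // [it, cool]
    }
}
```

In the second call the match resumes at the "i" of 'tis, because that letter is preceded by "t" rather than by punctuation. If whole tokens such as 'tis should be skipped, anchoring on whitespace as in TOKEN_START may be closer to what the question asks for.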

Answer 2

Score: 0

Alternative regex:

    "(?:^|\\s)([A-Za-z]+)"

Here (?:^|\\s) requires the token to begin at the start of the input or right after whitespace, and group 1 captures the letters that follow.

Regex in context:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String input = "it 's super cool";
            // A token must begin at the start of the input or right after whitespace.
            Matcher matcher = Pattern.compile("(?:^|\\s)([A-Za-z]+)").matcher(input);
            while (matcher.find()) {
                // Group 1 holds the letters without the leading whitespace.
                String result = matcher.group(1);
                System.out.println(result);
            }
        }
    }

Output:

    it
    super
    cool

Note: To match letters in any language (e.g. Hindi, German, Chinese, English), use the following regex instead:

    "(?:^|\\s)(\\p{L}+)"

Scripts that rely on combining marks, such as Devanagari, may also need \\p{M} in the character class to capture whole words.

More about the Pattern class and its classes for Unicode scripts, blocks, categories and binary properties can be found in the Javadoc for java.util.regex.Pattern.
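To make the difference between the two character classes concrete, here is a minimal sketch that runs both patterns on an input containing accented letters. The class name UnicodeLetterDemo, the helper tokens, and the sample input are assumptions chosen for illustration; the non-ASCII letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeLetterDemo {
    // ASCII letters only, as in the first version of the regex.
    static final Pattern ASCII_ONLY = Pattern.compile("(?:^|\\s)([A-Za-z]+)");

    // Any Unicode letter, as in the variant from the note.
    static final Pattern ANY_LETTER = Pattern.compile("(?:^|\\s)(\\p{L}+)");

    // Collects group 1 (the letters without the leading whitespace) of every match.
    static List<String> tokens(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two words with accented letters around the 's token.
        String input = "na\u00efve 's \u00fcber cool";
        System.out.println(tokens(ASCII_ONLY, input));
        System.out.println(tokens(ANY_LETTER, input));
    }
}
```

With this input, the ASCII-only pattern should return only "na" and "cool": it stops at the first accented letter and skips the word that begins with one. The \p{L} variant returns all three words that start with a letter and still ignores 's.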

huangapple
  • Posted on 2020-09-27 21:23:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64088919.html