Regex: ignore tokens that do not start with a letter
Question
How can I write a regex that ignores any token that does not start with a letter? It should be used in Java.
Example: it 's super cool
--> the regex should match [it, super, cool] and ignore ['s].
Answer 1
Score: 0
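You can use (?<!\\p{Punct})(\\p{L}+) (written here as a Java string literal), which matches a run of letters that is not immediately preceded by a punctuation character. The (?<! ... ) construct is a negative lookbehind; check the documentation of java.util.regex.Pattern to learn more about it. Note that the lookbehind inspects only the single character before the match, so this works for a one-letter token such as 's but would still match the tail of a longer one (for example, "uper" in 'super).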
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "it 's super cool";
        Pattern pattern = Pattern.compile("(?<!\\p{Punct})(\\p{L}+)");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
Output:
it
super
cool
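The lookbehind above only looks one character back, so it may help to see how it behaves on a token with more than one letter after the punctuation. The following is a minimal sketch, not part of the original answer: the class name LookbehindDemo and the helper findAll are made up for illustration, and the second pattern, (?<!\\S)\\p{L}+, is one possible stricter variant that only allows a match to begin at the start of the input or right after whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Hypothetical helper: collects every match of the regex in the input.
    public static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "it 'super cool";
        // Original pattern: only the 's' right after the quote is rejected,
        // so matching resumes at 'u'.
        System.out.println(findAll("(?<!\\p{Punct})\\p{L}+", input)); // [it, uper, cool]
        // Stricter variant (an assumption, not from the answer): the match must
        // start at the beginning of the input or right after whitespace.
        System.out.println(findAll("(?<!\\S)\\p{L}+", input));        // [it, cool]
    }
}
```

On the question's own input, it 's super cool, both patterns give the same three matches; they only differ once the ignored token has more than one letter.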
Answer 2
Score: 0
Alternative regex:
"(?:^|\\s)([A-Za-z]+)"
Regex in context:
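import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "it 's super cool";
        // A token must begin at the start of the input or right after whitespace.
        Matcher matcher = Pattern.compile("(?:^|\\s)([A-Za-z]+)").matcher(input);
        while (matcher.find()) {
            // Group 1 holds the letters without the leading whitespace.
            String result = matcher.group(1);
            System.out.println(result);
        }
    }
}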
Output:
it
super
cool
Note: To match letters in any language (e.g. Hindi, German, Chinese, English), use the following regex instead:
"(?:^|\\s)(\\p{L}+)"
More about the Pattern class and its classes for Unicode scripts, blocks, categories and binary properties can be found in the Javadoc for java.util.regex.Pattern.
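To show the Unicode-aware pattern on non-ASCII input, here is a small sketch that wraps it in a reusable method. The class name UnicodeTokens and the method letterTokens are hypothetical names chosen for this example; only the regex itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeTokens {
    // Compiled once and reused; the answer's pattern with \p{L} for letters of any script.
    private static final Pattern LETTER_TOKEN = Pattern.compile("(?:^|\\s)(\\p{L}+)");

    // Hypothetical helper: returns the leading run of letters of every
    // whitespace-delimited token that starts with a letter.
    public static List<String> letterTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = LETTER_TOKEN.matcher(input);
        while (matcher.find()) {
            // Group 1 excludes the whitespace consumed by the non-capturing group.
            tokens.add(matcher.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The third token starts with a non-ASCII letter (u with diaeresis);
        // "'s" and "9lives" do not start with a letter and are skipped.
        System.out.println(letterTokens("it 's \u00fcber cool 9lives"));
    }
}
```

Because the pattern captures only the leading letters, a token such as cool! would contribute cool; whether trailing punctuation should disqualify a token is left open by the question.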