Java regex returns only single match

Question

I have a file with the following content:

~LayerData
type="waypointlist"
type="waypointlistend"
type="track" name="Track1" color=#695cbb
type="trackpoint" latitude="43.5032064" longitude="16.4266248"
type="trackpoint" latitude="43.5071074767561" longitude="16.48329290000057"
type="trackend"
~EndLayerData
~LayerData
type="waypointlist"
type="waypointlistend"
type="track" name="Track2" color=#000000
type="trackpoint" latitude="43.51037193515589" longitude="16.491883500895977"
type="trackpoint" latitude="43.521582832754135" longitude="16.473187288140295"
type="trackend"
~EndLayerData

I'm extracting the ~LayerData ... ~EndLayerData blocks using:

Pattern p = Pattern.compile("(~LayerData(.|\n)*~EndLayerData)");
Matcher m = p.matcher(s);

As a result, m.group() gives me three items: the first two are identical and contain the complete file, and the last one is "\n". I expected to get Track1 and Track2 as separate matches.

Answer 1

Score: 1

You could match ~LayerData followed by all lines that do not start with either ~LayerData or ~EndLayerData, using a negative lookahead.

^~LayerData(?:\R(?!~(?:End)?LayerData).*)*\R~EndLayerData

Explanation

  • ^~LayerData Match ~LayerData at the start of a line (the example code enables Pattern.MULTILINE; without that flag, ^ matches only at the start of the input)
  • (?: Non-capturing group
    • \R(?!~(?:End)?LayerData) Match a line break and assert that what is directly to the right is not ~EndLayerData or ~LayerData
    • .* Match the rest of the line
  • )* Close the group and repeat it 0 or more times to consume all lines of the block
  • \R~EndLayerData Match a line break and ~EndLayerData

In Java, with the backslashes doubled inside the string literal:

String regex = "^~LayerData(?:\\R(?!~(?:End)?LayerData).*)*\\R~EndLayerData";

Example code

String regex = "^~LayerData(?:\\R(?!~(?:End)?LayerData).*)*\\R~EndLayerData";
String string = "...";

Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    System.out.println(matcher.group(0));
}
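
To make the loop above concrete, here is a minimal self-contained sketch that runs the same pattern against an abbreviated copy of the sample data from the question. The class name LayerBlockExtractor, the helper methods, and the TRACK_NAME pattern are assumptions introduced for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LayerBlockExtractor {
    // Same pattern as above; MULTILINE lets ^ match at the start of every line.
    private static final Pattern BLOCK = Pattern.compile(
            "^~LayerData(?:\\R(?!~(?:End)?LayerData).*)*\\R~EndLayerData",
            Pattern.MULTILINE);

    // Hypothetical helper pattern: captures the value of name="..." inside a block.
    private static final Pattern TRACK_NAME = Pattern.compile("name=\"([^\"]*)\"");

    // Collects every ~LayerData ... ~EndLayerData block as its own string.
    static List<String> extractBlocks(String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    // Abbreviated version of the file content shown in the question.
    static String sample() {
        return String.join("\n",
                "~LayerData",
                "type=\"waypointlist\"",
                "type=\"waypointlistend\"",
                "type=\"track\" name=\"Track1\" color=#695cbb",
                "type=\"trackpoint\" latitude=\"43.5032064\" longitude=\"16.4266248\"",
                "type=\"trackend\"",
                "~EndLayerData",
                "~LayerData",
                "type=\"track\" name=\"Track2\" color=#000000",
                "type=\"trackend\"",
                "~EndLayerData");
    }

    public static void main(String[] args) {
        // Print the track name found in each separately matched block.
        for (String block : extractBlocks(sample())) {
            Matcher name = TRACK_NAME.matcher(block);
            System.out.println(name.find() ? name.group(1) : "(no track name)");
        }
    }
}
```

Each call to find() advances to the next block, so the two tracks come back as two separate matches rather than one match spanning the whole input.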

Answer 2

Score: 0

Try this pattern, which makes the quantifier lazy (*?) so that each match stops at the first ~EndLayerData:

(~LayerData(.|\n)*?~EndLayerData)
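
As a hedged variation on the lazy pattern above, the sketch below swaps (.|\n)*? for .*? combined with Pattern.DOTALL, which lets the dot match line terminators directly. This is a substitution, not what the answer wrote; it is chosen because an alternation repeated once per character can cause a StackOverflowError on large inputs in Java's regex engine. The class name LazyDotAllExample and the shortened input are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDotAllExample {
    // DOTALL makes '.' match line terminators too, so (.|\n) is not needed.
    // The lazy quantifier *? stops at the first ~EndLayerData it can reach.
    private static final Pattern BLOCK =
            Pattern.compile("~LayerData.*?~EndLayerData", Pattern.DOTALL);

    static List<String> findBlocks(String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the file content in the question.
        String input = "~LayerData\ntype=\"track\" name=\"Track1\"\n~EndLayerData\n"
                + "~LayerData\ntype=\"track\" name=\"Track2\"\n~EndLayerData\n";
        for (String block : findBlocks(input)) {
            System.out.println(block);
            System.out.println("-----");
        }
    }
}
```

One trade-off to keep in mind: a lazy match simply runs to the next ~EndLayerData, so a block that is missing its end marker would be merged with the block that follows it. The lookahead pattern in Answer 1 refuses to cross another ~LayerData line and avoids that case.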

Answer 3

Score: 0

Update:
Use the Code Generator under Tools on regex101 to get a language-specific version of the regex.

String regex = "\\~LayerData(.|\\n)*?\\~EndLayerData";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}

Earlier Answer:
You are not getting separate matches because the quantifier in your regex is greedy: it matches everything from the first "~LayerData" to the last "~EndLayerData", so the whole file comes back as a single match. Building and testing the regex on regex101.com (which helps visualize the matches) and then using the result should fix the issue.
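
The explanation above can be checked with a small experiment. The sketch below counts the matches of the greedy pattern from the question and of its lazy counterpart on a shortened input, and also prints the capture groups of the greedy match. It suggests that the "three items" in the question are group 0, group 1, and group 2 of one match, not three matches. The class name GreedyVsLazy and the input string are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    static final String GREEDY = "(~LayerData(.|\\n)*~EndLayerData)";
    static final String LAZY = "(~LayerData(.|\\n)*?~EndLayerData)";

    // Counts how many times find() succeeds for the given regex.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two short blocks standing in for the file in the question.
        String input = "~LayerData\nA\n~EndLayerData\n~LayerData\nB\n~EndLayerData\n";

        System.out.println("greedy matches: " + countMatches(GREEDY, input));
        System.out.println("lazy matches:   " + countMatches(LAZY, input));

        // Inspect the capture groups of the single greedy match.
        Matcher m = Pattern.compile(GREEDY).matcher(input);
        if (m.find()) {
            System.out.println("groupCount: " + m.groupCount());
            // group(0) is the whole match; group(1) is the outer parentheses.
            System.out.println("group 0 equals group 1: " + m.group(0).equals(m.group(1)));
            // group(2) holds only the last character consumed by (.|\n).
            System.out.println("group 2 is a newline: " + "\n".equals(m.group(2)));
        }
    }
}
```

Because the outer parentheses wrap the entire pattern, group 1 always duplicates group 0, and the inner group keeps only its final iteration. Dropping both capture groups, or making the inner one non-capturing, removes the extra items.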

huangapple
  • Published on July 28, 2020, 13:57:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63127906.html