Regular expressions: multiline matching in Java

Question


I have an arbitrary string, e.g.

// Java 15+ text block; each LaTeX backslash has to be doubled
String testString = """
This is my "test" case
with lines
\\section{new section}
Another incorrect test"
\\section{next section}
With some more "text"
\\subsection{next section}
With some more "text1"
""";

I use LaTeX and want to replace the straight quotes with the typographic quotes used in books, similar to ,, and ´´. For this I need to replace each opening quote with \glqq and each closing quote with \grqq, within each group that starts with \.?section.

If I try the following

String pattern1 = "(^\\\\.?section\\{.+\\})[\\s\\S]*(\\\"(.+)\\\")";
Pattern p = Pattern.compile(pattern1, Pattern.MULTILINE);
Matcher m = p.matcher(testString);
System.out.println(p.matcher(testString).find()); //true

while (m.find()) {
  for (int i = 0; i < 4; i++) {
    System.out.println("Index: " + i);
    System.out.println(m.group(i).replaceAll("\"([\\w]+)\"", "\u00AB$1\u00BB"));
  }
}

I get as a result on the console

true
Index: 0
\section{new section}
Another incorrect test"
\section{next section}
With some more «text1»
Index: 1
\section{new section}
Index: 2
«text1»
Index: 3
text1

I have some problems with the current approach:

  1. The first valid match ("text") isn't found. I guess it has to do with multiline mode and incorrect grouping around \section{. The quote matching should be restricted to a block that starts with \.?section and ends at the next \.?section. How do I do this correctly?
  2. Even when the text is found properly, how do I get the complete string with the replacements applied?

Answer 1

Score: 1


You can match all the text between one section and the next section (or the end of the string), and replace every "..." string inside it with «...».

Here is the Java snippet:

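import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "This is my \"test\" case\nwith lines\n\\section{new section}\nAnother incorrect test\"\n\\section{next section}\nWith some more \"text\"\n\\subsection{next section}\nWith some more \"text1\"";
StringBuffer result = new StringBuffer();
// Each match runs from one "section" to just before the next one (or the end of the string)
Matcher m = Pattern.compile("(?s)section.*?(?=section|$)").matcher(s);
while (m.find()) {
    // Replace every "..." pair inside the current chunk
    String out = m.group(0).replaceAll("\"([^\"]*)\"", "«$1»");
    // quoteReplacement keeps backslashes and $ in the chunk literal
    m.appendReplacement(result, Matcher.quoteReplacement(out));
}
// Copy whatever follows the last match
m.appendTail(result);
System.out.println(result.toString());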

Output:

This is my "test" case
with lines
\section{new section}
Another incorrect test"
\section{next section}
With some more «text»
\subsection{next section}
With some more «text1»

The pattern means:

  • (?s) - Pattern.DOTALL embedded flag option
  • section - a section substring
  • .*? - any 0+ chars, as few as possible
  • (?=section|$) - a positive lookahead that requires a section substring or end of string to appear immediately to the right of the current location.
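
The snippet above splits on any occurrence of the word section and writes «...». As a rough sketch of a stricter variant, the version below only starts a block at a \section{ or \subsection{ command at the beginning of a line, and writes the LaTeX commands asked for in the question. The class name LatexQuotes and the method convert are made up for illustration, the {} after each command is my own assumption (it stops LaTeX from swallowing a following space), and \glqq / \grqq are assumed to come from a package such as babel with a German option. It relies on Matcher.replaceAll(Function), which needs Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class LatexQuotes {
    // A \section{ or \subsection{ command at the start of a line, followed by
    // everything up to the next such command or the end of the input.
    // (?m) lets ^ match at line starts, (?s) lets . match line breaks.
    private static final Pattern SECTION_BLOCK = Pattern.compile(
            "(?ms)^\\\\(?:sub)*section\\{.*?(?=^\\\\(?:sub)*section\\{|\\z)");

    // A pair of straight double quotes on a single line.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"\\n]*)\"");

    static String convert(String input) {
        // For each section block, rewrite the quote pairs inside it.
        // quoteReplacement is needed because the block contains backslashes.
        return SECTION_BLOCK.matcher(input).replaceAll(block ->
                Matcher.quoteReplacement(
                        QUOTED.matcher(block.group())
                              .replaceAll("\\\\glqq{}$1\\\\grqq{}")));
    }

    public static void main(String[] args) {
        String s = "This is my \"test\" case\nwith lines\n"
                + "\\section{new section}\nAnother incorrect test\"\n"
                + "\\section{next section}\nWith some more \"text\"\n"
                + "\\subsection{next section}\nWith some more \"text1\"";
        System.out.println(convert(s));
    }
}
```

With the sample string, the text before the first \section and the unbalanced quote after "Another incorrect test" stay untouched, while "text" and "text1" become \glqq{}text\grqq{} and \glqq{}text1\grqq{}. Excluding line breaks inside a quote pair is a design choice so that a stray quote cannot pair up with one on a later line; drop the \n from the character class if quotations may span lines.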

huangapple
  • Published on March 16, 2020, 17:40:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/60703527.html