Regular expression - multiline in Java
Question
I have an arbitrary string, e.g.
String multiline=`
This is my "test" case
with lines
\section{new section}
Another incorrect test"
\section{next section}
With some more "text"
\subsection{next section}
With some more "text1"
`
I use LaTeX and want to replace the quotes with the ones used in books, similar to ,, and ´´. For this I need to replace each opening quote with \glqq and each closing quote with \qrqq, for each group which starts with \.?section.
If I try the following
String pattern1 = "(^\\\\.?section\\{.+\\})[\\s\\S]*(\\\"(.+)\\\")";
Pattern p = Pattern.compile(pattern1, Pattern.MULTILINE);
Matcher m = p.matcher(testString);
System.out.println(p.matcher(testString).find()); //true
while (m.find()) {
for (int i = 0; i < 4; i++) {
System.out.println("Index: " + i);
System.out.println(m.group(i).replaceAll("\"([\\w]+)\"", "\u00AB$1\u00BB"));
}
}
I get as a result on the console
true
Index: 0
\section{new section}
Another incorrect test"
\section{next section}
With some more «text1»
Index: 1
\section{new section}
Index: 2
«text1»
Index: 3
text1
My problems with the current approach:
- The first valid match ("text") isn't found. I guess it has to do with the multiline flag and the incorrect grouping of \section{. The grouping for the quotes should be restricted to a group which starts with \section and ends with \?.section - how to do this correctly?
- Even when the text is found properly - how to get a complete string with the replacements?
Answer 1
Score: 1
You may match all text between section and the next section or the end of the string, and replace all "..." strings inside it with «...».
Here is the Java snippet (see demo):
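import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "This is my \"test\" case\nwith lines\n\\section{new section}\nAnother incorrect test\"\n\\section{next section}\nWith some more \"text\"\n\\subsection{next section}\nWith some more \"text1\"";
StringBuffer result = new StringBuffer();
// (?s) lets . match line breaks; each match runs from one "section" up to the next "section" or the end of the string
Matcher m = Pattern.compile("(?s)section.*?(?=section|$)").matcher(s);
while (m.find()) {
    // Replace every complete "..." pair inside the current chunk
    String out = m.group(0).replaceAll("\"([^\"]*)\"", "«$1»");
    // quoteReplacement keeps backslashes and $ in the chunk literal
    m.appendReplacement(result, Matcher.quoteReplacement(out));
}
m.appendTail(result); // copy whatever follows the last match
System.out.println(result.toString());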
Output:
This is my "test" case
with lines
\section{new section}
Another incorrect test"
\section{next section}
With some more «text»
\subsection{next section}
With some more «text1»
The pattern means:
- (?s) - Pattern.DOTALL embedded flag option
- section - a section substring
- .*? - any 0+ chars, as few as possible
- (?=section|$) - a positive lookahead that requires a section substring or end of string to appear immediately to the right of the current location.
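Since the question asks for LaTeX quote commands rather than « and », here is a minimal sketch of the same appendReplacement/appendTail loop adapted for that. It rests on a few assumptions that are not in the original post: the class name SectionQuotes, the method replaceQuotes and the constants BLOCK and QUOTED are made up for illustration; the block pattern is anchored on a sectioning command at the start of a line (\section{, \subsection{, ...) instead of the bare word section; and the closing command is taken to be \grqq (the question writes \qrqq), with {} appended so the command does not run into the following word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionQuotes {
    // Assumption: a block starts at a \section{, \subsection{, ... command at the
    // beginning of a line and runs up to the next such command or the end of input.
    private static final Pattern BLOCK = Pattern.compile(
            "(?ms)^\\\\(?:sub)*section\\{.*?(?=^\\\\(?:sub)*section\\{|\\z)");

    // A complete pair of straight double quotes with no quote in between.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static String replaceQuotes(String text, String open, String close) {
        // Quote the delimiters so their backslashes are taken literally.
        String replacement = Matcher.quoteReplacement(open) + "$1"
                + Matcher.quoteReplacement(close);
        StringBuffer result = new StringBuffer();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            String block = QUOTED.matcher(m.group()).replaceAll(replacement);
            m.appendReplacement(result, Matcher.quoteReplacement(block));
        }
        m.appendTail(result); // text after the last block, if any
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "This is my \"test\" case\nwith lines\n\\section{new section}\n"
                + "Another incorrect test\"\n\\section{next section}\n"
                + "With some more \"text\"\n\\subsection{next section}\n"
                + "With some more \"text1\"";
        System.out.println(replaceQuotes(s, "\\glqq{}", "\\grqq{}"));
    }
}
```

Text before the first sectioning command is copied through unchanged by appendReplacement, and the unpaired quote after the first \section stays as it is because QUOTED only matches complete pairs within one block. Anchoring on the command at the start of a line also means the word section inside a title, as in \section{new section}, no longer ends a block.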