September 11, 2020, 01:47:07

Regex to find XML tag in multiline string

Question

Here is a simple function I wrote to get the value from a tag.

public static String getTagAValue(String xmlAsString) {
    Pattern pattern = Pattern.compile("<TagA>(.+)</TagA>");
    Matcher matcher = pattern.matcher(xmlAsString);
    if (matcher.find()) {
        return matcher.group(1);
    } else {
        return null;
    }
}

It does not find a match and returns null.

XML Sample

<xml>
    <sample>
        <TagA>result</TagA>
    </sample>
</xml>

Note, here I used 4 spaces for tabs, but the real string would contain tabs.

Answer 1

Score: 3

Don't use regular expressions to parse XML: it's the wrong tool for the job.

Classic answer here: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

The answer you have accepted gives wrong answers, for example:

- It doesn't accept whitespace in places where whitespace is allowed, such as before ">".
- It will match a commented-out element, or one that appears in a CDATA section.
- It does a greedy match, so it will find the LAST matching end tag, not the first one.
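
The three failure modes listed above can be reproduced with a small sketch. The class name `GreedyMatchDemo`, the helper `firstGroup`, and the input strings are made up for illustration; the pattern is the `Pattern.DOTALL` variant from the accepted answer, plus a reluctant `.+?` variant for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyMatchDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String twoTags = "<TagA>one</TagA><TagA>two</TagA>";
        // Greedy: the capture runs to the LAST end tag.
        System.out.println(firstGroup("<TagA>(.+)</TagA>", twoTags));   // one</TagA><TagA>two
        // Reluctant: the capture stops at the first end tag.
        System.out.println(firstGroup("<TagA>(.+?)</TagA>", twoTags));  // one

        // The regex cannot tell that the first element is commented out.
        String commented = "<!-- <TagA>old</TagA> --><TagA>new</TagA>";
        System.out.println(firstGroup("<TagA>(.+?)</TagA>", commented)); // old

        // Legal whitespace before '>' defeats the literal tag match.
        String spaced = "<TagA >value</TagA >";
        System.out.println(firstGroup("<TagA>(.+?)</TagA>", spaced));    // null
    }
}
```

Switching to a reluctant quantifier fixes only the first of these cases; the comment and whitespace cases still go wrong.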

However hard you try, you will never get it 100% right.

And in case you care more about performance than correctness, it's also grossly inefficient because of the need for backtracking.

To do the job properly and professionally, use an XML parser.
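
As one possible illustration of the parser-based approach, here is a minimal sketch using the DOM parser that ships with the JDK. The class name `TagAReader` is an assumption, as is the choice of DOM over StAX, SAX, or XPath, and wrapping parse failures in `IllegalArgumentException` is just one way to report malformed input.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagAReader {
    // Returns the text of the first <TagA> element, or null if there is none.
    static String getTagAValue(String xmlAsString) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Ask the parser to apply its secure-processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlAsString)));
            NodeList nodes = doc.getElementsByTagName("TagA");
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        } catch (Exception e) {
            // Malformed XML ends up here instead of silently yielding a wrong value.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xml>\n\t<sample>\n\t\t<TagA>result</TagA>\n\t</sample>\n</xml>";
        System.out.println(getTagAValue(xml)); // prints "result"
    }
}
```

Because the parser understands the document structure, commented-out elements are skipped and whitespace inside tags is handled without any special cases.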

Answer 2

Score: 2

You probably want the regular expression to match across multiple lines:

Pattern.compile("<TagA>(.+)</TagA>", Pattern.DOTALL);

Documentation explains the parameter Pattern.DOTALL:

> Enables dotall mode. In dotall mode, the expression . matches any
> character, including a line terminator. By default this expression
> does not match line terminators.
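
To see the effect of the flag, here is a small sketch that runs the same pattern with and without `Pattern.DOTALL`. The class name `DotallDemo`, the helper `find`, and the input string (element content spanning two lines) are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Hypothetical helper: applies the question's pattern with the given flags.
    static String find(String xml, int flags) {
        Matcher m = Pattern.compile("<TagA>(.+)</TagA>", flags).matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The element content contains a line terminator.
        String xml = "<TagA>line one\nline two</TagA>";

        // Default mode: '.' stops at '\n', so no match is found.
        System.out.println(find(xml, 0) == null);                                      // true
        // DOTALL: '.' also matches '\n', so the whole content is captured.
        System.out.println("line one\nline two".equals(find(xml, Pattern.DOTALL)));    // true
    }
}
```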

Edit: While this works in this particular case, please refer to Michael Kay's answer if you want to solve such problems professionally, efficiently, and correctly.
