2020-10-01 14:47:11

Regex - Unformat XML

Question

I am trying to unformat an XML document into a single line (using Java).

I tried the following regex replacement:

input.replaceAll(">\\s+", ">").replaceAll("\\s+<", "<");

However, it also removes the whitespace at the start and end of element content, which is not what I want.

For example:

Scenario 01

Before: <AAA>{space}{space}{space}</AAA>

After: <AAA></AAA>

Scenario 02

Before: <AAA>{space}{space}123{space}{space}</AAA>

After: <AAA>123</AAA>

Scenario 03

Before: <AAA>{space}A{space}B{space}C{space}</AAA>

After: <AAA>A{space}B{space}C</AAA>

Is there any way to unformat the XML while avoiding the scenarios above?

Answer 1

Score: 1

A Saxon solution:

import java.io.File;
import net.sf.saxon.s9api.*;

Processor p = new Processor(false);
DocumentBuilder db = p.newDocumentBuilder();
// Strip whitespace-only text nodes while the tree is built
db.setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy.ALL);
XdmNode doc = db.build(new File(...));
Serializer s = p.newSerializer(new File(...));
s.serialize(doc.asSource());

You can control the output format in some detail by setting properties on the Serializer object.
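For readers who do not have Saxon on the classpath, the same idea (parse the document, drop layout whitespace in the tree, serialize without indentation) can be sketched with the JDK's built-in DOM and Transformer APIs. This is not the answerer's method, only a minimal illustration. The class name XmlUnformatter and the helpers unformat and stripLayoutWhitespace are hypothetical. It assumes that a whitespace-only text node is "layout" exactly when its parent also has at least one child element, so whitespace inside leaf elements (the three scenarios in the question) is left alone.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlUnformatter {

    // Hypothetical helper: parse, remove layout whitespace, serialize on one line.
    static String unformat(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripLayoutWhitespace(doc.getDocumentElement());

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "no");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not unformat XML", e);
        }
    }

    // Removes whitespace-only text nodes, but only under elements that also
    // contain child elements (assumption: such whitespace is indentation).
    static void stripLayoutWhitespace(Node parent) {
        NodeList children = parent.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                break;
            }
        }
        // Collect first, remove afterwards: NodeList is live.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripLayoutWhitespace(child);
            } else if (hasElementChild
                    && child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                toRemove.add(child);
            }
        }
        for (Node n : toRemove) {
            parent.removeChild(n);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <AAA>   </AAA>\n  <BBB>  123  </BBB>\n</root>";
        System.out.println(unformat(xml));
    }
}
```

One caveat with this heuristic: in mixed content such as `<p><b>a</b> <i>b</i></p>`, the single space between the two inline elements would also be removed, so the sketch only suits data-oriented XML.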

Answer 2

Score: 0

This replaces only vertical whitespace (for example "\n", "\r", or combinations of them) that follows the end of a tag or precedes the start of a tag.

input.replaceAll(">\\v+", ">").replaceAll("\\v+<", "<");

An excerpt from https://www.regular-expressions.info/shorthand.html says:

> \v matches “vertical whitespace”, which includes all characters treated as line breaks in the Unicode standard. It is the same as [\n\cK\f\r\x85\x{2028}\x{2029}].
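To make the behavior of this replacement concrete, here is a small sketch that applies it to two sample inputs. The class name VerticalWhitespaceDemo, the helper stripLineBreaks, and the sample strings are made up for illustration. It relies on Java 8 or later, where `\v` in a regex matches vertical whitespace.

```java
public class VerticalWhitespaceDemo {

    // The replacement from the answer, wrapped in a hypothetical helper.
    static String stripLineBreaks(String input) {
        return input.replaceAll(">\\v+", ">").replaceAll("\\v+<", "<");
    }

    public static void main(String[] args) {
        // No indentation: line breaks go away, spaces inside elements stay.
        String flat = "<root>\n<AAA>  123  </AAA>\n<BBB> A B C </BBB>\n</root>";
        System.out.println(stripLineBreaks(flat));
        // prints: <root><AAA>  123  </AAA><BBB> A B C </BBB></root>

        // Indented input: the spaces after each line break are horizontal
        // whitespace, so \v does not match them and they remain.
        String indented = "<root>\n  <AAA>   </AAA>\n</root>";
        System.out.println(stripLineBreaks(indented));
        // prints: <root>  <AAA>   </AAA></root>
    }
}
```

As the second case suggests, this pattern alone puts the document on one line but does not remove indentation made of spaces or tabs.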
