XPath: only append text if there is a match

Question

In Java 17 I'm using XPath to extract data from XML by joining all the <bar>s under <foo>. I'm using Saxon 12, but I'm doing it through the JAXP API. I create an XPathExpression and then invoke it like this:

(String)xpathExpression.evaluate(context, XPathConstants.STRING)

I was hoping that this would give me a null if there was no match, but apparently this is not the case. Let's start with this XPath expression (simplified from what I'm using):

/foo/string-join(bar, codepoints-to-string(10))

I wanted this to join all the /foo/bar strings together, separated by newlines, which it does if there is a /foo. But if there is no /foo, then instead of returning null it seems to return an empty string.

My first question would be how to detect that this XPath expression did not match /foo/. I had assumed that XPathExpression.evaluate() would return null if there was no match. (Reading the API now I guess that was just an assumption I made.)

But let's say that I'm OK with returning an empty string, and I can detect if the returned string is empty and consider that a non-match (even though semantically that is not ideal). The problem is that I want the value to end with a newline as well, so my expression looks like this:

concat(/foo/string-join(bar, codepoints-to-string(10)), codepoints-to-string(10))

This is worse: now if there is no /foo, it returns a string consisting of a single newline \n, because it appends the newline to the result of the non-matching expression, which it treats as the empty string.

I would prefer to find a way for this expression to return null in JAXP if /foo does not exist. But if that can't easily be done, I'd prefer to still at least get an empty string if /foo does not exist, i.e. concat() only appends text if the inner match is successful. I have a feeling I'll have to construct some elaborate workaround, but maybe an XPath expert knows a trick or two.

Answer 1

Score: 2

When you use the JAXP interface with XPath 2.0, you run into the problem that the JAXP specification doesn't say what happens when the expression returns values outside the XPath 1.0 type system. So Saxon does its best to interpret the intent.

If there is no foo element then the XPath expression returns an empty sequence. JAXP says that the raw result is converted to the required return type using XPath conversion rules. Now, in XPath 2.0 the string() function applied to an empty sequence returns a zero-length string, while the xs:string() constructor returns an empty sequence, which one might (perhaps) interpret as equivalent to a Java null. But Saxon chooses the string() conversion and returns a zero-length string.
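To see the string() conversion in action without any extra dependencies, here is a minimal sketch. It assumes the JDK's built-in JAXP XPath 1.0 engine rather than Saxon, so it only illustrates the "no match becomes a zero-length string, not null" behaviour described above. The class name EmptyMatchDemo and the helper evaluateAsString are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses whatever XPathFactory the JDK provides by default.
public class EmptyMatchDemo {

    // Parses the XML text and evaluates the expression with STRING as the return type.
    public static String evaluateAsString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            // Checked parser/XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The document has no /foo, so the node-set is empty.
        String result = evaluateAsString("<other/>", "/foo/bar");
        System.out.println(result == null);   // false
        System.out.println(result.isEmpty()); // true
    }
}
```

So with the STRING return type there is no way to tell "no /foo" apart from "a /foo whose text is empty" from the returned value alone.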

My advice would be to switch to the s9api interface which gives you full access to the XPath 2.0 type system. I would probably write an expression that returns a sequence of strings, and write the code to convert this into a single string in Java rather than in XPath.
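The "select in XPath, join in Java" idea can also be sketched without s9api. The example below is not the s9api code the advice refers to; it is a stand-in that assumes plain JAXP with the NODESET return type, which works even with the JDK's built-in XPath 1.0 engine. The class name JoinBarsDemo and the method joinBars are hypothetical. It returns null when there is no /foo, and otherwise mirrors the question's concat(string-join(...), newline) result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: XPath only selects nodes, Java does the joining.
public class JoinBarsDemo {

    // Returns null if there is no /foo; otherwise the /foo/bar values
    // separated by newlines, with one trailing newline.
    public static String joinBars(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // An empty node list is an unambiguous "no match" signal.
            NodeList foos = (NodeList) xpath.evaluate("/foo", doc, XPathConstants.NODESET);
            if (foos.getLength() == 0) {
                return null;
            }

            NodeList bars = (NodeList) xpath.evaluate("/foo/bar", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < bars.getLength(); i++) {
                values.add(bars.item(i).getTextContent());
            }
            return String.join("\n", values) + "\n";
        } catch (Exception e) {
            // Checked parser/XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(joinBars("<other/>") == null); // true
        System.out.print(joinBars("<foo><bar>a</bar><bar>b</bar></foo>"));
    }
}
```

The trade-off is that the newline handling moves out of the XPath expression and into Java, which may or may not suit a setup where the expressions are meant to be configurable.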

But if you want to stick with JAXP, you could use the XPath expression

string-join(/foo/(bar || '\n'))

(Note: the \n is converted to a newline by the Java compiler, not by the XPath engine.)
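The point about \n can be checked in plain Java. The sketch below does not evaluate the expression (that would need an XPath 3.x engine such as Saxon on the classpath); it only inspects the Java string that would be handed to the engine. The class name NewlineLiteralDemo and the helper containsRawNewline are made up for this illustration.

```java
// Hypothetical demo class: shows what the XPath engine actually receives.
public class NewlineLiteralDemo {

    // Written as in the answer: the Java compiler turns \n into a line feed (U+000A).
    public static final String EXPRESSION = "string-join(/foo/(bar || '\n'))";

    // With a doubled backslash the engine would receive the two characters '\' and 'n'.
    public static final String ESCAPED = "string-join(/foo/(bar || '\\n'))";

    // True if the string contains an actual line feed character.
    public static boolean containsRawNewline(String s) {
        return s.indexOf('\n') >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsRawNewline(EXPRESSION)); // true
        System.out.println(containsRawNewline(ESCAPED));    // false
    }
}
```

XPath string literals have no backslash escapes, so if the expression comes from a file or configuration value rather than a Java literal, codepoints-to-string(10) from the question is the portable way to get a newline.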

Posted by huangapple on 2023-06-22 03:17:32
Source: https://go.coder-hub.com/76526474.html