Support #5687

closed

Expected exception with variable resolver returning null

Added by Isaac Rivera Rivas over 1 year ago. Updated over 1 year ago.

Status:
Won't fix
Priority:
Normal
Category:
-
Sprint/Milestone:
-
Start date:
2022-09-14
Due date:
% Done:

0%

Estimated time:
1:00 h
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Hey there!

I'm new to the community, so I hope this is the right way to create an issue. I noticed what seems to be a behavior difference between the JDK's default XPathFactory (specifically openjdk version "1.8.0_345") and Saxon (specifically 11.4) when the factory is created with net.sf.saxon.xpath.XPathFactoryImpl.

When I do a lookup for a variable that doesn't exist, I would expect an exception to be thrown, but in Saxon I get an empty result back. If I'm not mistaken, Saxon follows the JAXP 1.3 spec for this implementation, and according to the spec, when evaluating expressions for variables, "... An javax.xml.xpath.XPathExpressionException is raised if the variable resolver is undefined or the resolver returns null for the variable." The fix looks pretty easy based on the latest 11.4 code.
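The attached StandaloneSaxonTest.java is not reproduced in this thread, so the following is only a minimal sketch of the kind of check described above. The class and method names are hypothetical. It installs a resolver that always returns null and reports whether evaluating `$missing` raises an XPathExpressionException. It assumes `XPathFactory.newInstance()` picks up the JDK's built-in implementation; passing `new net.sf.saxon.xpath.XPathFactoryImpl()` instead (with Saxon on the classpath) would exercise the Saxon behavior discussed here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not the reporter's attached test.
public class NullResolverDemo {

    /** Returns true if evaluating "$missing" raised XPathExpressionException. */
    static boolean throwsOnNullVariable(XPathFactory factory) throws Exception {
        // Build a trivial DOM document to serve as the context item.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        doc.appendChild(doc.createElement("root"));

        XPath xpath = factory.newXPath();
        // A resolver that knows no variables: it returns null for every name.
        xpath.setXPathVariableResolver(name -> null);

        try {
            xpath.evaluate("$missing", doc);
            return false; // no exception: null was accepted as a value
        } catch (XPathExpressionException e) {
            return true;  // the implementation rejected the null result
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: this resolves to the JDK's built-in XPath 1.0 engine.
        System.out.println(throwsOnNullVariable(XPathFactory.newInstance()));
    }
}
```

Taking the factory as a parameter keeps the comparison to a one-line change, so the same check can be run against both implementations.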

Wanted to start a discussion for this to see what the community thought about this and if possible, I can provide a fix for it as well. I'm attaching a sample java class which hopefully provides more insights.

Thanks in advance!


Files

StandaloneSaxonTest.java (3.13 KB): Sample code for reproducing the issue. Isaac Rivera Rivas, 2022-09-14 22:46
Actions #1

Updated by Michael Kay over 1 year ago

The JAXP spec (a) doesn't say anything about how an XPath 2.0 (or later) processor should behave, and (b) doesn't define any mapping from Java values to XPath values.

I think it's useful for the mapping rules to be the same as the rules for invoking "reflexive" Java extension functions. In those rules, the Java value null generally maps to the XPath value () (that is, an empty sequence).

The general rules are defined at https://www.saxonica.com/documentation11/index.html#!extensibility/java-to-xdm-conversion -- though they do say that the treatment of null is not entirely consistent between different interfaces.

I think that mapping null to () is the logical way to extend the JAXP spec to handle XPath 2.0, and I don't see a strong reason to change the way we've chosen to do it.

Actions #2

Updated by Michael Kay over 1 year ago

For interest (though it's not directly relevant), see also bug #2554 for some history of our difficulties with the JAXP specification in this area.

Actions #3

Updated by Isaac Rivera Rivas over 1 year ago

Thanks for the quick reply and the details! I’m still a novice trying to understand XPath and JAXP mainly coming from my (basic) knowledge of Xalan.

I’m looking at Saxon HE as an alternative to Xalan and am still a bit stumped by this behavior difference. Comparing the two, I can see the exception being thrown by Xalan here https://github.com/apache/xalan-java/blob/master/src/org/apache/xpath/jaxp/JAXPVariableStack.java#L63-L69 which I believe follows from the XPath version it implements: the javadoc at https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/xpath/package-summary.html says it supports XPath 1.0.

I looked at the JAXP 1.3 spec and the XPath 1.0 and 2.0 specs, and all of them contain similar statements to the effect that accessing an undeclared variable must raise an error. Specifically for XPath 2.0, I can see this: "It is a static error [err:XPST0008] to reference a variable that is not in scope", so I would assume an error should still be thrown in Saxon as well. I also tried a couple of XPath 2.0 and 3.0 evaluators, for example http://videlibri.sourceforge.net/cgi-bin/xidelcgi with an input of $doc//a, and they still throw an error when parsing the undeclared variable $doc, similar to Xalan’s behavior.

Would you mind clarifying a bit more as to why the decision was made to not throw the exception on the undeclared variable?

Actions #4

Updated by Michael Kay over 1 year ago

Because the JAXP specifications have not been updated to handle the much richer type system of XPath 2.0, we had to make our own decisions. And even for 1.0, the JAXP specifications don't define a mapping from Java types to XPath, so we had to make our own decisions there as well.

For XPath 1.0, the only types are string, double, boolean, node, and node-set, so the only real question arises for node and node-set - JAXP sometimes tries to be independent of the tree model (DOM, JDOM etc) and sometimes it seems to assume DOM.

We've generally tried to follow the JAXP specifications closely wherever possible, though we haven't always changed our implementation when the spec was "clarified" - partly because we never did detailed comparisons of successive versions of the spec to discover the clarifications. Many of the clarifications, sadly, are simply documenting Xalan behaviour -- there's no process whereby third-party implementors are consulted about changes.

But for 2.0 we had no choice but to do the mapping from Java types to XPath 2.0 types ourselves, in our own way, and mapping null to empty sequence seemed to us the right choice.

JAXP is intended, of course, to allow applications to be written that work across multiple implementations. But it leaves so much undefined -- including the version of XPath you're using -- that it doesn't achieve that aim particularly well.

Many years ago the Saxon JAR file included a manifest that meant if Saxon was on the classpath, applications would pick up Saxon by default. That caused a lot of problems with applications that assumed the Xalan behaviour, even where it wasn't written into the JAXP spec.

Yes, it's an error in XPath 2.0 to reference a variable that isn't present in the static context. Unfortunately JAXP provides no way of asking what variables are present in the static context; we can only ask about variables at execution time. At expression compile time, we therefore have to assume that all variable names are available. Interpreting a "null" from the variable resolver as meaning the variable is "undeclared" doesn't work, because we can only call the variable resolver at execution time.
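The compile-time versus execution-time split can be observed through the JAXP API itself. The sketch below uses hypothetical class and method names and a counting resolver to record how often the resolver has been consulted after `compile` and again after `evaluate`. On the JDK's built-in implementation the first count is expected to be zero, since the API only offers a by-name lookup and no way to enumerate declared variables. What other implementations report is not asserted here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class illustrating when the resolver is called.
public class ResolverTimingDemo {

    /** Returns {resolver calls after compile, resolver calls after evaluate}. */
    static int[] countResolverCalls(XPathFactory factory) throws Exception {
        AtomicInteger calls = new AtomicInteger();

        XPath xpath = factory.newXPath();
        // Count every lookup and hand back a plain string value.
        xpath.setXPathVariableResolver(name -> {
            calls.incrementAndGet();
            return "value";
        });

        // Compilation sees only the variable's name, not whether it exists.
        XPathExpression expr = xpath.compile("$anyName");
        int afterCompile = calls.get();

        // Evaluation is where the variable's value is actually needed.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        expr.evaluate(doc);

        return new int[] { afterCompile, calls.get() };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countResolverCalls(XPathFactory.newInstance());
        System.out.println("after compile: " + counts[0]
                + ", after evaluate: " + counts[1]);
    }
}
```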

Basically, there are serious mismatches between XPath 2.0 and JAXP, and we've had to find our own solutions to these conflicts.

Actions #5

Updated by Isaac Rivera Rivas over 1 year ago

Thanks for the reply! I sincerely appreciate the level of detail in your response. I have a better idea of the decision behind Saxon's implementation and the behavior difference now. I'll check if I can adapt the code to the behavior of Saxon for this. Thanks again for clearing this up!
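One possible way to adapt application code, offered only as a sketch and not as anything proposed in this thread, is to make the application's own resolver strict, so that an unknown variable fails inside the resolver rather than relying on the XPath engine to reject a null. The class name `StrictVariableResolver` is hypothetical, and how a given engine surfaces an unchecked exception thrown from a resolver (wrapped or unwrapped) is an assumption that would need checking against that engine.

```java
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathVariableResolver;

// Hypothetical wrapper: turns a null lookup result into an exception.
public class StrictVariableResolver implements XPathVariableResolver {

    private final XPathVariableResolver delegate;

    public StrictVariableResolver(XPathVariableResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object resolveVariable(QName name) {
        Object value = delegate.resolveVariable(name);
        if (value == null) {
            // Fail at lookup time instead of letting null reach the engine.
            throw new IllegalStateException("Unresolved XPath variable: $" + name);
        }
        return value;
    }
}
```

It would be installed with `xpath.setXPathVariableResolver(new StrictVariableResolver(existingResolver))`, where `existingResolver` stands for whatever resolver the application already uses. The trade-off is that a variable can no longer be deliberately bound to null to mean an empty sequence.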

Actions #6

Updated by Michael Kay over 1 year ago

  • Tracker changed from Bug to Support
  • Status changed from New to Won't fix

Closing this; we decided to stick with the current design. But thanks for raising it.
