Bug #2130

closed

XPath referencing attribute with namespace fails when using DOM

Added by Dan Jones over 9 years ago. Updated over 9 years ago.

Status:
Won't fix
Priority:
Normal
Assignee:
Category:
JAXP Java API
Sprint/Milestone:
-
Start date:
2014-08-14
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Using JAXP to create an XPath with the Saxon-HE implementation fails if you run it on an org.w3c.dom.Document (built with Xerces 2.7.1, though that shouldn't matter) when the XPath references an attribute with a namespace prefix, unless that namespace is declared on the same Element as the prefixed attribute.

For example, using the below code on Saxon-HE 9.5.1.6 (I can't find 9.5.1.7 in Maven but I have confirmed that the issue should still exist by analysing the code):

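import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// The class name is only a placeholder so that the snippet compiles.
public class SaxonDomXPathBug {
	public static void main(String[] args) throws Exception {
		// Select Saxon for XPath and Xerces for DOM parsing.
		System.setProperty("javax.xml.xpath.XPathFactory", "net.sf.saxon.xpath.XPathFactoryImpl");
		System.setProperty("javax.xml.parsers.DocumentBuilderFactory", "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
		XPathFactory xpf = XPathFactory.newInstance();
		XPath xp = xpf.newXPath();

		// Note the reference to attribute @xsi:type!!!
		XPathExpression xpe = xp.compile("/root/child/item[@xsi:type=\"typeA\"]/info");

		// The factory is left at its default, which is NOT namespace-aware.
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		DocumentBuilder db = dbf.newDocumentBuilder();
		Document doc = db.parse(ClassLoader.getSystemResourceAsStream("test.xml"));

		System.out.println("XPath: " + xpe.evaluate(doc));
	}
}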

If you run this on the document (named test.xml)

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<child>
		<item name="item1" xsi:type="typeA" >
			<info>1234</info>
			<more>abdc</more>
		</item>
	</child>
</root>

The output is "XPath: "

If you run the above code on this altered XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
	<child>
		<item name="item1" xsi:type="typeA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
			<info>1234</info>
			<more>abdc</more>
		</item>
	</child>
</root>

The output is "XPath: 1234"

The culprit is net.sf.saxon.dom.DOMNodeWrapper line 501 (503 in 9.5.1.7)

Node node = attr.getOwnerElement();
do {
	String attVal = ((Element) node).getAttribute(attName); // this is the line!!!
	if (attVal != null) {
		return attVal;
	}
	node = node.getParentNode();
} while (node != null && node.getNodeType() == Node.ELEMENT_NODE);

According to the DOM specification (the org.w3c.dom API used through JAXP), Element#getAttribute(String name) returns the attribute value, "or the empty string if that attribute does not have a specified or default value". It never returns null.

This means that attVal is never null: when the owner element does not itself carry the namespace declaration, attVal is an empty String, and the method returns it immediately without traversing the rest of the tree to find the URI.

What this should do is first call Element#hasAttribute(String attrName) and return the attribute if this returns true (similar to DOMNodeWrapper#getElementURI(Element element)) i.e.:

if (((Element)node).hasAttribute(attName)) {
	return ((Element) node).getAttribute(attName);
}
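
To illustrate the point outside Saxon, here is a minimal sketch that uses only the JDK's built-in DOM parser. The class PrefixLookupSketch and its helper lookupPrefixUri are hypothetical names made up for this example and are not Saxon code; the helper only mirrors the shape of the loop quoted above, with the hasAttribute check added.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Saxon.
public class PrefixLookupSketch {

    static final String XML =
        "<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
      + "<child><item name=\"item1\" xsi:type=\"typeA\"><info>1234</info></item></child>"
      + "</root>";

    /** Parses the XML with a builder that is explicitly non-namespace-aware. */
    static Document parseNonNamespaceAware(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(false);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Element firstItem(Document doc) {
        return (Element) doc.getElementsByTagName("item").item(0);
    }

    /**
     * Walks up from the attribute's owner element until an xmlns:prefix
     * declaration is found. Returns null if the prefix is never declared.
     */
    static String lookupPrefixUri(Attr attr) {
        String name = attr.getName();
        int colon = name.indexOf(':');
        if (colon < 0) {
            return ""; // unprefixed attributes are in no namespace
        }
        String attName = "xmlns:" + name.substring(0, colon);
        Node node = attr.getOwnerElement();
        while (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            // hasAttribute distinguishes "absent" from "present but empty".
            if (element.hasAttribute(attName)) {
                return element.getAttribute(attName);
            }
            node = node.getParentNode();
        }
        return null;
    }

    public static void main(String[] args) {
        Element item = firstItem(parseNonNamespaceAware(XML));
        System.out.println("getAttribute on item: '" + item.getAttribute("xmlns:xsi") + "'");
        System.out.println("hasAttribute on item: " + item.hasAttribute("xmlns:xsi"));
        System.out.println("resolved URI: " + lookupPrefixUri(item.getAttributeNode("xsi:type")));
    }
}
```

The declaration sits on root, so getAttribute on item yields an empty String rather than null, which is why a null check alone stops the walk at the first element.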

This is a simple fix, however I have a NamespaceContext declared that defines some namespaces that aren't declared in some XML documents I need to process. I notice DOMNodeWrapper traverses the tree to find the namespace declaration, and doesn't query the NamespaceContext. I have literally learnt the internals of Saxon today to solve this problem and I am by no means an expert on the API. Is there a reason why the NamespaceContext isn't queried, or is it just because the DOMNodeWrapper doesn't have access to the JAXPXPathStaticContext stored in XPathEvaluator? If it's the latter, what would be the best way to open this up?

I don't mind implementing the fix, but I would like some feedback from someone on that last paragraph, as it's an important issue that I need to fix.

Thanks, Dan


Files

saxon-bug-danj.tar.gz (1.15 KB) - Simple Maven project demonstrating the bug. Dan Jones, 2014-08-15 00:35
Actions #1

Updated by Dan Jones over 9 years ago

Just thought I'd add that the snippet of code I've identified as incorrect (DOMNodeWrapper line 501) is within the getAttributeURI(Attr attr) method.

Actions #2

Updated by Michael Kay over 9 years ago

Does it work if you set the documentBuilder to be namespaceAware?

I don't think we support access to a DOM that isn't namespace aware.

Having said that, the code change you point out may well need to be made anyway.

Actions #3

Updated by Michael Kay over 9 years ago

Confirmed that this problem occurs when using a non-namespace-aware DOM as input to Saxon.

I'm not sure whether there are other potential problems in handling a non-namespace-aware DOM, but this code is attempting to handle this case and is clearly incorrect.

Patch raised for the 9.5 and 9.6 branches; regression tested only.

Actions #4

Updated by Dan Jones over 9 years ago

Oops, sorry. I wrote a reply earlier and left it open without submitting :)

Just to confirm that this isn't an issue if the Document is namespace-aware, it's just when it is non-namespace-aware.
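
For anyone following along, the difference between the two kinds of DOM can be seen with the JDK's own parser. This is only a sketch: NamespaceAwareSketch and typeAttribute are hypothetical names for this example, and the XML is a trimmed copy of test.xml from the description.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing the two parser settings.
public class NamespaceAwareSketch {

    static final String XML =
        "<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
      + "<child><item name=\"item1\" xsi:type=\"typeA\"><info>1234</info></item></child>"
      + "</root>";

    /** Parses XML with the given setting and returns the xsi:type attribute node. */
    static Attr typeAttribute(boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // JAXP factories default to false; this is the switch asked about above.
            dbf.setNamespaceAware(namespaceAware);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            Element item = (Element) doc.getElementsByTagName("item").item(0);
            return item.getAttributeNode("xsi:type");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean aware : new boolean[] {false, true}) {
            Attr attr = typeAttribute(aware);
            System.out.println("namespaceAware=" + aware
                + " namespaceURI=" + attr.getNamespaceURI()
                + " localName=" + attr.getLocalName());
        }
    }
}
```

With a namespace-aware builder the parser itself records the namespace URI on the attribute node, so no tree walk for xmlns declarations is needed.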

Actions #5

Updated by Michael Kay over 9 years ago

Regarding this question:

I notice DOMNodeWrapper traverses the tree to find the namespace declaration, and doesn't query the NamespaceContext. I have literally learnt the internals of Saxon today to solve this problem and I am by no means an expert on the API. Is there a reason why the NamespaceContext isn't queried, or is it just because the DOMNodeWrapper doesn't have access to the JAXPXPathStaticContext stored in XPathEvaluator? If it's the latter, what would be the best way to open this up?

I believe Saxon is behaving correctly: namespace prefixes appearing within the DOM must be declared within the DOM; namespace prefixes used within the XPath expression must be declared in the NamespaceContext.
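
A small sketch of that separation, run with the JDK's default XPath implementation rather than Saxon, so it shows only the general JAXP contract. XPathPrefixSketch and SinglePrefixContext are hypothetical names, and the NamespaceContext is deliberately simplified (it does not handle the reserved xml and xmlns prefixes). The document declares the prefix xsi, while the expression uses a different prefix, i, bound through the NamespaceContext to the same URI.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not Saxon-specific.
public class XPathPrefixSketch {

    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    static final String XML =
        "<root xmlns:xsi=\"" + XSI + "\">"
      + "<child><item name=\"item1\" xsi:type=\"typeA\"><info>1234</info></item></child>"
      + "</root>";

    /** Simplified NamespaceContext that binds exactly one prefix to one URI. */
    static final class SinglePrefixContext implements NamespaceContext {
        private final String prefix;
        private final String uri;

        SinglePrefixContext(String prefix, String uri) {
            this.prefix = prefix;
            this.uri = uri;
        }

        @Override
        public String getNamespaceURI(String p) {
            if (p == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String u) {
            return uri.equals(u) ? prefix : null;
        }

        @Override
        public Iterator<String> getPrefixes(String u) {
            return uri.equals(u)
                ? Collections.singletonList(prefix).iterator()
                : Collections.<String>emptyIterator();
        }
    }

    /** Evaluates the expression against a namespace-aware DOM built from XML. */
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Prefixes in the expression are resolved here, not from the document.
            xp.setNamespaceContext(new SinglePrefixContext("i", XSI));
            return xp.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XPath: " + evaluate("/root/child/item[@i:type='typeA']/info"));
    }
}
```

The match depends on the namespace URI, so the prefix in the expression does not have to be the one used in the document.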

Actions #6

Updated by Dan Jones over 9 years ago

I was just writing another comment about this.

Would it be possible to add a feature similar to the way a NamespaceContext works with XPaths? A NamespaceContext (the same interface as the one in javax.xml.namespace, or a different one if required) could be added to the Configuration of the XPathExpression, so that XPaths referencing a prefix still work on a DOM that isn't namespace-aware and doesn't contain the declaration.

An example of what I mean is as follows (may not be the actual implementation solution):

  1. Take the XML document in the example above and remove the namespace declaration for 'xsi'.

  2. Load the document to DOM (or to any other structure that's non-namespace-aware)

  3. Create XPathExpression with the NamespaceContext (which defines 'xsi') as normal

  4. Compile the XPath to return XPathExpression - give the XPathExpression visibility of the NamespaceContext somehow

  5. When the XPathExpression is run on the DOM, it tries to resolve the 'xsi' prefix. If it can't find it by searching the DOM (as it currently should), it then queries the NamespaceContext and gets the URI it's looking for

I understand that by default, Saxon doesn't support DOM that isn't namespace-aware, but I thought it would be worth suggesting and getting your comments :)

The alternative is that I parse the XML document, add the namespaces by manipulating the structure, and recreate a namespace-aware DOM from the manipulated one. But I'm wary of the performance implications of doing this.

Actions #7

Updated by Dan Jones over 9 years ago

I also don't mind having a go at implementing it if someone agrees it's a good idea.

However I understand you may not want this feature, in which case I'll see if my alternative isn't too performance-intensive

Actions #8

Updated by Michael Kay over 9 years ago

The XDM specification allows you to construct an XDM instance any way you like, so it's hard to argue that what you are proposing would be non-conformant; on the other hand, it's certainly an invention of your own and not an implementation of any known spec. It would therefore need very careful specification work and a large number of test cases, all of which would be a lot more work than implementing the code. I don't think it's something we would want to support in Saxon: we have quite enough problems with DOM interfaces as it is.

Since you mention performance, I do hope you are aware that running Saxon against a DOM is up to 10 times slower than using Saxon's native tree implementation, so doing it with a DOM that is created directly by an XML parser (as in this example) is nearly always a bad idea. It's also not thread-safe. The only valid reason for doing it is if your application needs a DOM tree for some other reason, e.g. if it is the output of some previous processing phase.

Actions #9

Updated by Dan Jones over 9 years ago

I'll have a chat with people from work and see what I should do now. I'm validating the contents of some XML which is communicated between two sub-systems, and I've been told the documents may not contain all namespace declarations (so this is one of my test cases in upgrading to Saxon 9.5, which led me to find this bug). I might double-check this information though, because we never had any issues using Saxon-B 9.0, and testing 9.0 at home also seems to have the same issue.

Hopefully I'll be told this information is incorrect so I can use Saxon's internal API :-)

Actions #10

Updated by Michael Kay over 9 years ago

  • Status changed from New to Won't fix
  • Assignee set to Michael Kay

Closing this with no action as the conversation has gone quiet. Feel free to reopen the discussion if you want to take it further.

Actions #11

Updated by O'Neil Delpratt over 9 years ago

  • Found in version changed from 9.5.1.7 to 9.5
  • Fixed in version set to 9.5.1.8

Partial fix applied to the Saxon maintenance release 9.5.1.8.
