Does Saxon-CE have its own XML parser ?

Added by David Lee over 7 years ago

Does Saxon-CE have its own XML parser and serializer, or does it rely on the browser's parser? The reason I ask is that I want to work with some complex documents that have multiple namespaces, and I need both to query them and to produce results in the same schema. Many browsers do not handle namespaces, or do so poorly. Does Saxon-CE solve this problem, or is it limited by the browser's support for namespaces? Thanks -David


Replies (6)

RE: Does Saxon-CE have its own XML parser ? - Added by Michael Kay over 7 years ago

Saxon-CE relies on the XML parser made available through GWT, which I believe is generally a wrapper around the browser's native XML parser. I'm aware that browsers have restrictions on the use of entities and DTDs, but as far as I'm aware the XML parser (as distinct from HTML) generally handles namespaces correctly. If you know of any problems in this area, please share them explicitly.

Saxon-CE does not include a serializer; it never produces serialized XML or HTML.

RE: Does Saxon-CE have its own XML parser ? - Added by David Lee over 7 years ago

My intended app is also GWT based. I have not yet tried this myself, so I am still in the research phase, but what I have found so far is users reporting problems with namespaces in various browsers.

Examples:

https://code.google.com/p/google-web-toolkit/issues/detail?id=4070

http://stackoverflow.com/questions/2857131/gwt-xml-documents-with-namespaces

http://www.sencha.com/forum/showthread.php?249236-Use-XmlReader-AutoBeans-to-parse-XML-that-has-attributes-in-the-element&langid=4

"However, browsers have inconsistent support (I'm looking at you, IE) for namespaces, and so the XML module in GWT doesn't know how to talk about namespaces,"

I will try some more experiments on my own, but so far reading suggests that browsers do not universally support XML with namespaces.

-David

RE: Does Saxon-CE have its own XML parser ? - Added by Michael Kay over 7 years ago

Saxon does of course have problems because DOM handles namespaces in such a clumsy way, but this is the same in Saxon-CE as on the server. Generally we hope that Saxon gets it right according to the XDM rules, even if the representation of namespaces at the DOM level can be very odd.

RE: Does Saxon-CE have its own XML parser ? - Added by David Lee over 7 years ago

I have narrowed the issue down to a conceptual misunderstanding on my part. I doubt Saxon-CE is affected, but I would have to try it on multiple browsers and versions to be sure. What led me down the rathole was that the Google GWT XML class library has no explicit support for namespaces. It exposes only non-namespace methods. This, and various internet postings, led me to believe the problem was universal across browsers.

I experimented with the GWT XML API by giving it a UBL 2.1 document (which has 3 namespaces). It parsed fine, and I was able to locate elements by tag name by specifying only the local name. Converting the DOM back to a string did produce proper XML with namespaces (I tried this in the latest versions of IE, FF and Chrome).

My conclusion from this tiny experiment is that the parsers in these browsers handled namespaces properly, and the DOM kept them properly, but that the APIs for manipulating the DOM are lacking and divergent. For example, IE only supports creating an element with a namespace by using the createNode() method (not createElementNS()). From this I am guessing that GWT simply chose not to deal with the problem and exposed only the universally supported non-namespace methods. (This is mostly conjecture...)
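The experiment described above (parse a multi-namespace document, find elements by local name only, serialize back to a string) can be sketched outside the browser. This is a minimal illustration that assumes Python's standard-library `xml.dom.minidom` as a stand-in DOM; it is not the GWT XML API or a browser parser, and the sample document is a hypothetical, heavily trimmed fragment rather than a valid UBL 2.1 invoice.

```python
from xml.dom.minidom import parseString

# Namespace URIs in the style used by UBL 2.x documents.
INVOICE = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Hypothetical sample: three namespaces, two cbc:ID elements.
SAMPLE = (
    '<Invoice xmlns="' + INVOICE + '"'
    ' xmlns:cac="' + CAC + '"'
    ' xmlns:cbc="' + CBC + '">'
    "<cbc:ID>42</cbc:ID>"
    "<cac:InvoiceLine><cbc:ID>1</cbc:ID></cac:InvoiceLine>"
    "</Invoice>"
)


def find_by_local_name(doc, local_name):
    """Return all elements with this local name, whatever their namespace."""
    return list(doc.getElementsByTagNameNS("*", local_name))


doc = parseString(SAMPLE)

# Locate elements by local name only, as in the experiment.
ids = find_by_local_name(doc, "ID")
id_values = [element.firstChild.data for element in ids]
id_namespaces = {element.namespaceURI for element in ids}

# Convert the DOM back to a string and check the namespaces survive.
round_trip = doc.documentElement.toxml()
reparsed_root_ns = parseString(round_trip).documentElement.namespaceURI
```

Here the parser records a namespace URI on every element, and the serialized string still carries the `xmlns` declarations, which matches the observation that parsing and storage are sound even when the manipulation API is thin.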

So my guess would be that if Saxon-CE doesn't rely on the helper methods and instead uses the lower-level methods on the DOM (like getChildNodes() etc.), then it should work. For construction of results with namespaces, I am guessing createNode() is used, which allows for namespace creation (at least on IE).

XML 1.1 + Namespaces was better and consistently supported in JavaScript!!!

-David

RE: Does Saxon-CE have its own XML parser ? - Added by Michael Kay over 7 years ago

Saxon is well accustomed to dealing with the poor support for namespaces in DOM APIs. I forget the details, but there is one API where what you get back depends on whether the node was created using level-1 or level-2 methods, and there is no way we can discover how the node was created, so we have to try both. It's tortuous and inefficient, and this is why we advise people so strongly against using DOM on the server. Unfortunately on the client there is no choice. Bringing in a SAX parser as part of Saxon-CE might well be a good move.
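The post does not name the API in question, so the following is only a guess at the kind of fallback involved. DOM Level 2 Core specifies that `localName` is null for nodes created with a Level 1 method such as `createElement()`, and such nodes carry no namespace URI either. The sketch below, again assuming Python's `xml.dom.minidom` as a stand-in DOM, tries the Level 2 properties first and otherwise splits the qualified name and searches ancestors for a matching `xmlns` attribute. The helper names `lookup_namespace` and `expanded_name` and the namespace URI are hypothetical; this is not Saxon's code.

```python
from xml.dom import Node
from xml.dom.minidom import getDOMImplementation

CBC = "urn:example:cbc"  # hypothetical namespace URI for illustration


def lookup_namespace(node, prefix):
    """Walk up the ancestors looking for an xmlns declaration for prefix."""
    attr_name = "xmlns:" + prefix if prefix else "xmlns"
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        attr = node.getAttributeNode(attr_name)
        if attr is not None:
            return attr.value or None
        node = node.parentNode
    return None


def expanded_name(node):
    """Return (namespace URI, local name), trying Level 2 data first."""
    uri = node.namespaceURI
    local = getattr(node, "localName", None)
    if uri is not None and local is not None:
        return (uri, local)
    # Fallback: treat nodeName as a lexical QName and split it ourselves.
    prefix, _, local = node.nodeName.rpartition(":")
    if uri is None:
        # Ambiguous case: a Level 2 node in no namespace looks the same as
        # a Level 1 node here, so this lookup is only a best effort.
        uri = lookup_namespace(node, prefix)
    return (uri, local)


doc = getDOMImplementation().createDocument(None, "root", None)
root = doc.documentElement
root.setAttribute("xmlns:cbc", CBC)

level1 = doc.createElement("cbc:ID")          # Level 1: no namespace recorded
level2 = doc.createElementNS(CBC, "cbc:ID")   # Level 2: namespace recorded
root.appendChild(level1)
root.appendChild(level2)

names = [expanded_name(level1), expanded_name(level2)]
```

Both nodes end up with the same expanded name, but only after a second code path and an ancestor walk for the Level 1 node, which gives a feel for why this is described as tortuous and inefficient.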

RE: Does Saxon-CE have its own XML parser ? - Added by David Lee over 7 years ago

Yes, having your own parser would be awesome. Especially if it were a "pluggable" parser, so users could substitute, say, micro-xml or even (gasp) JSON. While you're at it (supplying free products for the needy), could you add XPath 3.0 and XQuery 3.0 to Saxon-CE? Please? I will buy you dinner and beverages at Balisage 2013.

Thanks -David
