Is type of nodes returned by parse-xml/parse-xml-fragment/parse-html dependent on the tree model worked with?

Added by Martin Honnen about 1 year ago

Saxon has several tree models of its own and can also wrap other tree models such as DOM and JDOM. If I start with a particular tree model, I assume the tree I build with DocumentBuilder consists of XdmNodes wrapping the underlying original nodes. But what happens if part of my XPath expression uses parse-xml, parse-xml-fragment or parse-html? Will the results of those function calls always be XdmNodes backed by TinyTree nodes?


Replies (3)

RE: Is type of nodes returned by parse-xml/parse-xml-fragment/parse-html dependent on the tree model worked with? - Added by Michael Kay about 1 year ago

From looking at the SaxonJ code it appears, rather oddly, that parse-xml() always builds a TinyTree, whereas parse-xml-fragment and parse-html use whatever has been configured using Feature.TREE_MODEL.

On SaxonCS, parse-html puts a wrapper around the HTML tree constructed by AngleSharp.

RE: Is type of nodes returned by parse-xml/parse-xml-fragment/parse-html dependent on the tree model worked with? - Added by Martin Honnen about 1 year ago

I have tried some sample code with Saxon 12. If I use XPath over DOM via net.sf.saxon.xpath.XPathFactoryImpl, then, interestingly enough, expressions that construct nodes with parse-xml-fragment or saxon:parse-html do indeed return DOM nodes (i.e. net.sf.saxon.dom.DocumentOverNodeInfo).
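
A minimal sketch of why the JAXP route yields DOM objects, using only the JDK: the javax.xml.xpath interface itself fixes the result type, so asking for XPathConstants.NODE returns an org.w3c.dom.Node. The code uses the JDK's built-in XPathFactory, which only supports XPath 1.0, so it evaluates a plain path rather than parse-xml-fragment; the names JaxpXPathDemo and selectNode are illustrative only. Plugging in net.sf.saxon.xpath.XPathFactoryImpl keeps the same interface and the same Node return type, which is consistent with Saxon handing back DocumentOverNodeInfo objects for constructed nodes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class JaxpXPathDemo {
    // Parses the XML into a DOM, evaluates the path through the JAXP XPath API,
    // and returns the single-node result. XPathConstants.NODE tells the API to
    // hand back an org.w3c.dom.Node, whichever XPathFactory implementation is used.
    static Node selectNode(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(path, doc, XPathConstants.NODE);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Node item = selectNode("<root><item>a</item><item>b</item></root>", "/root/item[2]");
        System.out.println(item.getNodeName() + " = " + item.getTextContent()); // prints "item = b"
        // The concrete class depends on the DOM/XPath implementation in use.
        System.out.println(item.getClass().getName());
    }
}
```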

Then I wondered: if I use s9api with e.g. saxonDocBuilder.setTreeModel(DOMObjectModel.getInstance()) and evaluate such expressions with parse-xml-fragment or saxon:parse-html, could I expect such a net.sf.saxon.dom.DocumentOverNodeInfo to be available via getExternalNode()? That seems to return null.

In the end, I am looking at writing extension methods on the SaxonCS API and e.g. System.Xml.XmlNode so that I can use Saxon's XPath implementation, and I wonder whether there is a way I can expect, or configure, the API to return an XmlNode from GetUnderlyingXmlNode() for the XdmNodes produced by something like parse-xml-fragment. So far that always seems to return null.

Any thoughts on whether and how that would be possible?

RE: Is type of nodes returned by parse-xml/parse-xml-fragment/parse-html dependent on the tree model worked with? - Added by Michael Kay about 1 year ago

DocumentOverNodeInfo is an "inverse wrapper": it wraps Saxon's XDM nodes as (read-only) DOM nodes. I think we use this only where we have to support an interface that mandates DOM nodes, which is the case in XQJ and in the JAXP XPath API. There's no equivalent in SaxonCS.
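
To illustrate the inverse-wrapper idea, here is a sketch in plain Java. It relies on the target type being an interface, as org.w3c.dom.Node is in Java: a wrapper class can implement it and delegate every call to the wrapped node. Implementing the full DOM interface would take dozens of methods, so the sketch swaps in two small hypothetical interfaces, InternalNode (standing in for the engine's own node model) and ForeignNode (standing in for the mandated DOM interface); none of these names come from Saxon.

```java
import java.util.List;

class InverseWrapperDemo {
    public static void main(String[] args) {
        InternalNode item = new SimpleNode("item", "b", List.of());
        InternalNode root = new SimpleNode("root", "b", List.of(item));
        ForeignNode view = new ForeignOverInternal(root);
        System.out.println(view.getFirstChild().getNodeName()); // prints "item"
        try {
            view.setTextContent("x");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage()); // prints "rejected: read-only wrapper"
        }
    }
}

// Hypothetical stand-in for an engine's own read-only node model (not Saxon's NodeInfo).
interface InternalNode {
    String name();
    String stringValue();
    List<InternalNode> children();
}

// Hypothetical stand-in for an interface that an external API mandates (think org.w3c.dom.Node).
interface ForeignNode {
    String getNodeName();
    String getTextContent();
    ForeignNode getFirstChild();
    void setTextContent(String text);
}

// Inverse wrapper: exposes an InternalNode through the ForeignNode interface
// without copying; each call is delegated to the wrapped node.
final class ForeignOverInternal implements ForeignNode {
    private final InternalNode node;

    ForeignOverInternal(InternalNode node) {
        this.node = node;
    }

    @Override public String getNodeName() { return node.name(); }

    @Override public String getTextContent() { return node.stringValue(); }

    @Override public ForeignNode getFirstChild() {
        List<InternalNode> kids = node.children();
        // Children are wrapped on demand, one wrapper object per access.
        return kids.isEmpty() ? null : new ForeignOverInternal(kids.get(0));
    }

    // The wrapped model is immutable, so mutators are rejected.
    @Override public void setTextContent(String text) {
        throw new UnsupportedOperationException("read-only wrapper");
    }
}

// Hypothetical immutable implementation of the internal model, for the demo only.
record SimpleNode(String name, String stringValue, List<InternalNode> children)
        implements InternalNode {}
```

Because nothing is copied, the wrapper stays cheap, but every DOM-style mutator has to be refused, which matches the read-only nature of such wrappers.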

In SaxonCS, I think the only time we create DOM nodes (System.Xml.XmlNode) is when you send output to a DomDestination, and there it will be a true DOM node, not a wrapper. Indeed, it has to be, because System.Xml.XmlNode is a class, not an interface.
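
When the target type is a concrete class such as System.Xml.XmlNode, the alternative to wrapping is to construct genuine nodes and copy the content across. The sketch below shows that copying approach with the Java DOM rather than System.Xml, starting from a hypothetical Elem record instead of an XdmNode; it only illustrates copying versus wrapping and is not how DomDestination is actually implemented.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CopyToDomDemo {
    // Hypothetical minimal source tree (a stand-in for an XDM tree, not a Saxon type):
    // an element name, optional text content, and child elements.
    record Elem(String name, String text, List<Elem> children) {}

    // Builds a real DOM Document by creating new DOM nodes, rather than wrapping.
    static Document toDom(Elem root) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(copy(doc, root));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recursively copies one element and its descendants into the target document.
    private static Element copy(Document doc, Elem source) {
        Element target = doc.createElement(source.name());
        if (source.text() != null && !source.text().isEmpty()) {
            target.appendChild(doc.createTextNode(source.text()));
        }
        for (Elem child : source.children()) {
            target.appendChild(copy(doc, child));
        }
        return target;
    }

    public static void main(String[] args) {
        Elem tree = new Elem("root", null, List.of(
                new Elem("item", "a", List.of()),
                new Elem("item", "b", List.of())));
        Document doc = toDom(tree);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "root"
        System.out.println(doc.getDocumentElement().getTextContent()); // prints "ab"
    }
}
```

The resulting nodes belong to the DOM implementation and are fully mutable, but they are snapshots: later changes to the source tree are not reflected in them.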

I guess the next step in integrating with the .NET XML infrastructure would be to implement IXPathNavigable, or to accept an IXPathNavigable as input. That's certainly feasible; I don't know how valuable it would be.