How to obtain all namespaces with prefixes from XML

Added by Michael Staal-Olsen almost 4 years ago

Let's say I have an XML document represented as an XdmNode object. Is there a straightforward or canonical way of taking the XdmNode and generating the mapping of all prefixes to namespaces introduced in the XML structure? My motivation is the following: I have some XML structure, and I want to run XPath queries against it using the namespaces (and their prefixes) declared in the document itself, picked up dynamically (without altering my application). If Saxon can do this directly with an XPathCompiler, I would prefer that, but it seems I have to declare the namespaces to the compiler manually.


Replies (3)

RE: How to obtain all namespaces with prefixes from XML - Added by Michael Kay almost 4 years ago

The complication with this is how to handle the case where the same prefix is bound to different namespaces at different points in the document. That's one of those annoying situations where an edge case that hardly ever occurs in practice can fundamentally affect the design: and it's this edge case that makes it difficult to offer a simple solution at the API level.

The other complication is doing it efficiently: the XDM exposes the in-scope namespace map for each element in the document, and you therefore (at least in theory) have to merge all these maps. So if you've only got two namespace declarations, and they're both on the outermost element, you still have to scan 100,000 elements, build 100,000 identical namespace maps, and merge them. This encourages use of short-cuts that are specific to the tree implementation model: for example (in 10.0) the TinyTree has a method getNamespaceMaps() which returns all distinct namespace maps found in the document (without duplicates): in the example above there would be exactly one namespace map returned, which is the one you want.
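To make the two complications concrete, here is a minimal sketch that walks every element once and merges the declarations it finds, keeping a set of URIs per prefix so that a prefix bound to more than one namespace shows up as ambiguous. It deliberately uses the JDK's StAX parser rather than any Saxon API, so it works on the serialized XML, not on an XdmNode; the class and method names (NamespaceScan, collect, isAmbiguous) are invented for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Saxon: scans serialized XML with StAX.
class NamespaceScan {

    /** Maps each declared prefix ("" for the default namespace) to every URI it is bound to. */
    static Map<String, Set<String>> collect(String xml) {
        Map<String, Set<String>> bindings = new LinkedHashMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Only the declarations written on this element, not the whole in-scope map.
                    for (int i = 0; i < reader.getNamespaceCount(); i++) {
                        String prefix = reader.getNamespacePrefix(i); // null for xmlns="..."
                        String uri = reader.getNamespaceURI(i);
                        bindings.computeIfAbsent(prefix == null ? "" : prefix,
                                        k -> new LinkedHashSet<>())
                                .add(uri == null ? "" : uri);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return bindings;
    }

    /** True if any prefix is bound to more than one namespace URI somewhere in the document. */
    static boolean isAmbiguous(Map<String, Set<String>> bindings) {
        for (Set<String> uris : bindings.values()) {
            if (uris.size() > 1) {
                return true;
            }
        }
        return false;
    }
}
```

If isAmbiguous returns false, each prefix has exactly one URI and the map can be fed to the compiler prefix by prefix; if it returns true, the application has to decide which binding wins, which is the design problem described above.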

If you only want to handle prefixes that are declared in the outermost element of the document, it becomes much simpler. You can then simply do

for (NamespaceBinding binding : root.getAllNamespaces()) {
  compiler.declareNamespace(binding.getPrefix(), binding.getURI());
}

(You may also need to handle the default namespace differently, depending on your requirements)
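As an illustration of the "outermost element only" approach and of why the default namespace needs separate treatment, here is a rough sketch using the JDK's own DOM and XPath 1.0 classes instead of Saxon. In XPath 1.0 an unprefixed name never matches an element in a default namespace, so the sketch binds the default namespace to an invented alias prefix. The names RootNamespaces, fromRoot, evaluate and the alias "d" are all assumptions made for the example; a real application would need to pick an alias that does not collide with a prefix already declared in the document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical helper built on JAXP, not on the Saxon s9api.
class RootNamespaces {

    // Invented alias under which the default namespace is exposed to XPath.
    static final String DEFAULT_ALIAS = "d";

    /** Reads only the xmlns declarations written on the outermost element. */
    static Map<String, String> fromRoot(Document doc) {
        Map<String, String> map = new HashMap<>();
        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (!XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                continue; // an ordinary attribute, not a namespace declaration
            }
            // xmlns="..." has local name "xmlns"; xmlns:p="..." has local name "p".
            String prefix = "xmlns".equals(attr.getLocalName())
                    ? DEFAULT_ALIAS : attr.getLocalName();
            map.put(prefix, attr.getValue());
        }
        return map;
    }

    /** Parses the XML, declares the root's namespaces, and returns the string value of expr. */
    static String evaluate(String xml, String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            final Map<String, String> map = fromRoot(doc);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    String uri = map.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return null; // reverse lookup is not needed for this sketch
                }
                public Iterator<String> getPrefixes(String uri) {
                    return Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expr, e);
        }
    }
}
```

With a document whose root declares xmlns="urn:default" and xmlns:p="urn:p", the path /d:root/d:item reaches elements in the default namespace, while the unprefixed /root/item selects nothing.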

RE: How to obtain all namespaces with prefixes from XML - Added by Michael Staal-Olsen almost 4 years ago

Hi Michael!

Thank you so much for yet another great and thorough answer. And thank you in particular for not only giving an answer, but also adding details as to why my question was in fact premature: you are indeed right that a "canonical" solution cannot exist, because the same prefix can be bound to different namespaces in different parts of the same document. I think for my purpose the right thing will be either to use your simple approach (and collect from the outermost element) or to hope that I can soon upgrade to Saxon 10 and obtain the new ability to query seamlessly without namespaces. My own current approach was to modify the compiler via this method:

    private void declareNamespaces(XPathCompiler compiler, XdmNode node) {
        XdmSequenceIterator<XdmNode> children = node.axisIterator(Axis.CHILD)
        while (children.hasNext()) {
            XdmNode child = children.next()
            XdmSequenceIterator<XdmNode> iterator = child.axisIterator(Axis.NAMESPACE)
            while (iterator.hasNext()) {
                XdmNode xdm = iterator.next()
                compiler.declareNamespace(xdm.nodeName.toString(), xdm.stringValue.toString())
            }
        }
    }

Another thing: Is there a way to do XPath queries using namespaces without ever referring to a prefix? So for instance, if I have the namespace given by the (prefix, namespace) = (pr, test:ns:space) pair, can I then query XML by explicitly writing 'test:ns:space' in my query? And what is the syntax?

RE: How to obtain all namespaces with prefixes from XML - Added by Michael Kay almost 4 years ago

XPath 3.0 introduced the syntax Q{uri}local (it is also available in 3.1), so (especially for software-generated paths) you can write, for example,

/Q{http://www.w3.org/2001/XMLSchema}schema/Q{http://www.w3.org/2001/XMLSchema}element

to find all the top-level element declarations in a schema document.
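Since this syntax is mostly useful for software-generated paths, here is a small sketch of how such a path might be assembled from a namespace URI and a list of local names. It is plain string building and does not call Saxon; the class and method names (BracedNames, step, path) are made up for the example. The only check it makes is that the URI contains no curly braces, which the Q{...} form does not allow.

```java
// Hypothetical helper for generating Q{uri}local steps; not a Saxon class.
class BracedNames {

    /** Builds one name test of the form Q{uri}local. */
    static String step(String uri, String local) {
        if (uri.indexOf('{') >= 0 || uri.indexOf('}') >= 0) {
            throw new IllegalArgumentException("Namespace URI must not contain braces: " + uri);
        }
        return "Q{" + uri + "}" + local;
    }

    /** Builds an absolute child path in which every step is in the same namespace. */
    static String path(String uri, String... locals) {
        StringBuilder sb = new StringBuilder();
        for (String local : locals) {
            sb.append('/').append(step(uri, local));
        }
        return sb.toString();
    }
}
```

For the namespace from the question, BracedNames.step("test:ns:space", "item") gives Q{test:ns:space}item, which can be used in a path without declaring any prefix.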
