resolve-uri() function also normalises consecutive spaces
Hi, I'm not sure whether this is a bug. I have an application that builds URLs from file system paths using the resolve-uri() function. If a filename contains two consecutive spaces, resolve-uri() collapses them to a single space, so the resulting URL does not work. I could not see anything in the documentation for this function that suggests this should happen. I have tested this using the XSLT processor bundled with Oxygen XML 21, which is Saxon-PE 126.96.36.199, but have seen it in other versions too. I have attached a sample XSL file that demonstrates the problem. Note that the problem goes away if the file name is percent-encoded first, so maybe my issue comes from the fact that the space character isn't valid in a URI anyway.
#1 Updated by Michael Kay 9 months ago
I've reproduced this effect. Saxon is carefully escaping the spaces as %20 so that the Java JDK URI class doesn't barf on the URIs, and is then unescaping each %20 back to a space. The final step is to turn the resulting string into an instance of xs:anyURI, and it's at this stage that multiple spaces are being collapsed: the xs:anyURI type in XSD has the facet whiteSpace=collapse, which means that the value space does not allow strings with consecutive spaces.
(XSD 1.1 Part 2 is actually a bit inconsistent on this. It claims that the value space allows all sequences of characters, but by imposing the whiteSpace=collapse facet, it effectively constrains the value space. I've heard some XSD gurus say this is OK, because the value space can include "ineffable" values that have no string representation, but this is hardly practical.)
Given the fact that resolve-uri() is defined to return an xs:anyURI and that casting a string to xs:anyURI collapses whitespace, I don't think we can treat this as a bug, however inconvenient it might be.
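The sequence described in this comment (escape spaces, resolve with the JDK URI class, unescape, then apply whitespace collapsing) can be illustrated with a small Java sketch. This is not Saxon's code: the class name, the method names, and the sample base URI `file:/data/docs/` are all hypothetical, and `collapse` only approximates the XSD whiteSpace=collapse facet.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; none of these names come from Saxon.
class AnyUriWhitespaceDemo {

    // True if java.net.URI parses the string as given.
    static boolean jdkAcceptsRaw(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false; // raw spaces are illegal characters for java.net.URI
        }
    }

    // Escape spaces as %20, then resolve against the base with java.net.URI.
    // The result still carries one %20 per original space.
    static String resolveEscaped(String relative, String base) {
        URI rel = URI.create(relative.replace(" ", "%20"));
        return URI.create(base).resolve(rel).toString();
    }

    // Approximation of whiteSpace=collapse: tab, LF and CR become spaces,
    // runs of spaces shrink to one, and a leading or trailing space is dropped.
    static String collapse(String s) {
        return s.replaceAll("[\\t\\n\\r]", " ")
                .replaceAll(" +", " ")
                .replaceAll("^ | $", "");
    }

    // Turn each %20 back into a space, then collapse as for an xs:anyURI value.
    static String unescapeAndCollapse(String escapedUri) {
        return collapse(escapedUri.replace("%20", " "));
    }

    public static void main(String[] args) {
        String name = "my  file.txt";       // two consecutive spaces
        String base = "file:/data/docs/";   // assumed sample base URI

        System.out.println(jdkAcceptsRaw(name));          // false
        String escaped = resolveEscaped(name, base);
        System.out.println(escaped);                      // file:/data/docs/my%20%20file.txt
        System.out.println(unescapeAndCollapse(escaped)); // file:/data/docs/my file.txt
    }
}
```

In this sketch the escaped form keeps both spaces as `%20%20`, and it is only the final collapsing step that loses one of them, which is consistent with the reporter's observation that percent-encoding the file name first avoids the problem.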