Difficulty with resolve-uri() with Saxon

Added by Anonymous over 15 years ago

Legacy ID: #7407359 Legacy Poster: Lea Hayes (numberkruncher)

Hi,

I have an unusual problem with resolve-uri(), and I am not sure whether this is a bug in Saxon or an error in my usage. The following line:

<xsl:value-of select="resolve-uri('test.xml','file://C:\test\something')"/>

renders the following, as expected:

file:///C:/test/test.xml

However, the following line:

<xsl:value-of select="resolve-uri('test.xml','file://C:\test%20test\something')"/>

causes the following error:

Base URI {file://C:\test%20test\somethin...} is invalid: Illegal character in path at index 15: file:///C:/test test/test.xml

Is this a problem with Saxon, or am I doing something wrong?

Many thanks,
Lea Hayes


Replies (10)

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7407365 Legacy Poster: Michael Kay (mhkay)

Is this on Java or .NET? Saxon in both cases uses the URI library of the underlying platform. I'm surprised it works with backslashes - though .NET in particular is very tolerant of things that aren't legal according to the RFC. But the combination of backslash and %-escaping is asking a bit much. Legal URIs only use forwards slash as a path separator.
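Kay's point about forward slashes can be checked against java.net.URI directly, since (as noted later in the thread) Saxon on Java resolves URIs with that class. The minimal sketch below is not Saxon code: it simply tries to parse a few candidate base URIs with the single-argument URI constructor and reports which ones are rejected. The class name UriCheck and the sample strings are illustrative only.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriCheck {
    // Returns null if the string parses as a java.net.URI,
    // otherwise the reason reported by the parser.
    static String problem(String candidate) {
        try {
            new URI(candidate);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "file:///C:/test/something",        // forward slashes: legal
            "file:///C:/test%20test/something", // escaped space: legal
            "file:///C:\\test\\something",      // backslashes: illegal
            "file:///C:/test test/something"    // raw space: illegal
        };
        for (String c : candidates) {
            String p = problem(c);
            System.out.println((p == null ? "OK       " : "REJECTED ") + c
                + (p == null ? "" : "  (" + p + ")"));
        }
    }
}
```

Only the first two strings are accepted; the backslash and raw-space forms are rejected by the parser itself, before any resolution is attempted.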

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7407379 Legacy Poster: Lea Hayes (numberkruncher)

I am using the .NET library, but this problem also seems to occur within the <oxygen/> XML editor.

In my actual transform, the troublesome URI is not specified in the XML or the XSL. It is the default base URI:

<xsl:value-of select="resolve-uri('test.xml',base-uri(.))"/>

The default base URI uses forward slashes, but it uses %20 for spaces. Forward versus backward slashes do not appear to make much difference in my previous test. The following URI works, but it does not help me because I need to use the base-uri(.) function:

<xsl:value-of select="resolve-uri('test.xml','file://C:/testte~1/something')"/>

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7407430 Legacy Poster: Michael Kay (mhkay)

How do you invoke the transformation? The value returned by base-uri(.) comes from somewhere - it might be set explicitly via the API or be obtained from a filename, etc.

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7407474 Legacy Poster: Lea Hayes (numberkruncher)

The base-uri() function is used in two different contexts:

#1 - From the input document, which is transformed using the API.
#2 - From within the context of another document, which is opened using the doc() function on xlink:href attributes.

Here is the code that I am using to invoke the API:

XdmNode input = processor.NewDocumentBuilder().Build(new Uri(sourceUri));
XsltTransformer transformer = processor.NewXsltCompiler().Compile(new Uri(xsltUri)).Load();
transformer.InitialContextNode = input;
transformer.BaseOutputUri = new Uri(sourceUri);

Is URI escaping completely incompatible with the resolve-uri() function? The following quote (http://www.xsltfunctions.com/xsl/fn_doc.html) suggests that XSLT should support it:

"If you are accessing documents on a file system, your implementation may require you to precede the file name with file:///, use forward slashes to separate directory names, and escape each space in the file name with %20."

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7407486 Legacy Poster: Michael Kay (mhkay)

So what are the values of sourceUri and xsltUri, respectively?

I think that resolve-uri() should accept a URI that has been percent-encoded. If it isn't accepting it, then I need to investigate why. It's currently doing it via a call on XmlUrlResolver.ResolveUri(). There's a comment in the code that suggests there's no really good reason for doing it differently on the .NET and Java platforms; on Java it's done using the resolve() method of class java.net.URI.
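As a rough illustration of the Java side described here, the following sketch calls java.net.URI.resolve() on a base that is already legal URI syntax, with the space escaped as %20. It is a standalone example, not Saxon's implementation; ResolveDemo is a hypothetical name. One detail worth knowing: java.net.URI drops the empty authority when it rebuilds the resolved URI, so the result typically starts with file:/ rather than file:///, which matches the single-slash form in the <oxygen/> output quoted later in the thread.

```java
import java.net.URI;

class ResolveDemo {
    // Resolves a relative reference against a base with java.net.URI.resolve().
    // Both inputs must already be legal URI syntax; the %20 escape is kept as-is.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("file:///C:/test%20test/something", "test.xml"));
    }
}
```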

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7407510 Legacy Poster: Lea Hayes (numberkruncher)

String sourceUri = @"C:\Users\Administrator\Documents\doc\test.xml";
String xsltUri = @"C:\Users\Administrator\Documents\doc\test.xsl";

If the following line is added:

Uri testSourceUri = new Uri(sourceUri);

then in debug mode the following is true:

testSourceUri.AbsoluteUri == "file:///C:/Users/Administrator/Documents/doc/test.xml"

If I set sourceUri to another test document:

String sourceUri = @"C:\Users\Administrator\Documents\doc\another folder\test.xml";

then in debug mode the following is true:

testSourceUri.AbsoluteUri == "file:///C:/Users/Administrator/Documents/doc/another%20folder/test.xml"

But the error is reported even when the base URI is specified explicitly:

resolve-uri('test.xml','file://C:/test%20test/something')
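For comparison on the Java side, java.io.File.toURI() plays a role similar to .NET's Uri.AbsoluteUri in the debug output above: it produces an absolute file URI with spaces percent-encoded. The sketch below is illustrative only (FileToUri and the path are hypothetical); it uses a forward-slash path so that it runs on any platform, and on Windows that path is resolved against the current drive. Unlike .NET, Java renders the result in the single-slash file:/ form.

```java
import java.io.File;
import java.net.URI;

class FileToUri {
    // Converts a file-system path to an absolute file URI.
    // Characters that are illegal in a URI path, such as spaces, come out as %20.
    static URI toUri(String path) {
        return new File(path).toURI();
    }

    public static void main(String[] args) {
        // Hypothetical path; the file does not need to exist.
        System.out.println(toUri("/Users/Administrator/Documents/doc/another folder/test.xml"));
    }
}
```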

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7439723 Legacy Poster: Lea Hayes (numberkruncher)

I understand that you are very busy. I was wondering whether you had found the source of this issue. Given the following transform:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
  <xsl:template match="/">
    [<xsl:value-of select="resolve-uri('test.xml','file:/c:/test folder with spaces/')"/>]
  </xsl:template>
</xsl:transform>

Saxon generates the following error from the .NET API:

Base URI {file:/c:/test folder with spac...} is invalid: Illegal character in path at index 15: file:///c:/test folder with spaces/test.xml

But Saxon generates the expected output from within the <oxygen/> XML Editor:

[file:/c:/test%20folder%20with%20spaces/test.xml]

From my understanding, <oxygen/> uses the Java version of the API. So, as you previously thought, perhaps this issue is confined to the .NET version of the Saxon API. I don't know if this is of any help.

Many thanks,
Lea Hayes
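Both the error and the expected output above can be reproduced with plain java.net.URI, which is where the "Illegal character in path at index 15" wording comes from. In the sketch below (SpaceInBase, errorIndex and fileUri are hypothetical names, and this is not how Saxon itself escapes URIs), the single-argument constructor rejects the unescaped string at index 15, while the multi-argument constructor quotes the spaces, loosely like XPath's iri-to-uri(), so that resolve() yields the same string <oxygen/> printed.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SpaceInBase {
    // Index of the first illegal character reported by java.net.URI, or -1 if the string parses.
    static int errorIndex(String candidate) {
        try {
            new URI(candidate);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    // Builds a file URI from an absolute path. The multi-argument constructor
    // percent-encodes characters that are not legal in a URI path, such as spaces.
    static URI fileUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The unescaped "wannabe-URI" from the error message above.
        System.out.println(errorIndex("file:///c:/test folder with spaces/test.xml"));

        URI base = fileUri("/c:/test folder with spaces/");
        System.out.println(base);                     // escaped base URI
        System.out.println(base.resolve("test.xml")); // resolved against the escaped base
    }
}
```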

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7447841 Legacy Poster: Michael Kay (mhkay)

Sorry about the delay in responding to this.

It turns out that the difference in behaviour is due to something Saxon does, not to the underlying platform: although Saxon uses different library routines on the two platforms, it escapes spaces as %20 before calling the relevant method on the Java platform, and fails to do the same on .NET. The code for Java has the rather unsatisfactory comment:

// It's not entirely clear why we have to escape spaces by hand, and not other special characters;
// it's just that tests with a variety of filenames show that this approach seems to work.

The specification itself is a little unhelpful here, even as amended in erratum FO.E1 (http://www.w3.org/XML/2007/qt-errata/xpath-functions-errata.html#E1). The relevant rule is:

"If $base is not a valid URI according to the rules of the xs:anyURI data type, if it is not a suitable URI to use as input to the chosen resolution algorithm (for example, if it is a relative URI reference, if it is a non-hierarchic URI, or if it contains a fragment identifier), then an error is raised [err:FORG0002]."

I think there's a missing "or" after the first comma.

Now, your input is what I call a "wannabe-URI": a string that becomes a valid URI after special characters are escaped. As such, it's a valid URI according to the rules of the xs:anyURI data type, but it is not a valid URI according to the RFCs that define the URI resolution algorithm; and there appears to be no license in the spec to do what Saxon on Java is doing, namely escaping the URI to make it valid. So, regretfully, I think Saxon on Java has it wrong, and it's right to throw an error on .NET. You should be escaping the URI before attempting to resolve it, using the iri-to-uri() function.
Surprisingly, however, if I try this, I get a new problem:

FORG0002: Base URI {file:/c:/test%20dir/} is invalid: Illegal character in path at index 15: file:///c:/test dir/test.xml

This seems to be because Saxon is taking the .NET System.Uri returned by the XmlUrlResolver, applying ToString() on it, and then passing the result to the Java java.net.URI constructor; it appears that the ToString() method unescapes the %20, making a wannabe-URI that is unacceptable to the Java constructor. So the .NET code is wrong too.

I'm going to experiment with using the same (Java) code on both platforms. I suspect the reason it diverged was due to bugs in the GNU ClassPath library that are no longer present.
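The ToString() effect described here has a close Java analogue: java.net.URI.getPath() returns the decoded path, while getRawPath() and toString() keep the %20 escapes. The sketch below (DecodedRoundTrip and reparses are hypothetical names) rebuilds a URI string from each form and checks whether it parses again. It only illustrates the pitfall; it does not reproduce Saxon's actual .NET code path.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DecodedRoundTrip {
    // True if the string is legal URI syntax for java.net.URI.
    static boolean reparses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("file:/c:/test%20dir/");

        // getPath() decodes %20 back to a space, so the rebuilt string is a "wannabe-URI".
        String fromDecoded = base.getScheme() + ":" + base.getPath();
        // getRawPath() keeps the escape, so the rebuilt string is still a legal URI.
        String fromRaw = base.getScheme() + ":" + base.getRawPath();

        System.out.println(fromDecoded + " -> parses again: " + reparses(fromDecoded));
        System.out.println(fromRaw + " -> parses again: " + reparses(fromRaw));
    }
}
```

The same reasoning underlies the suggestion in the next reply: use the form of the URI that preserves escaping rather than a decoded display string.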

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7448005 Legacy Poster: Lea Hayes (numberkruncher)

> Sorry about the delay in responding to this.

No problem.

> This seems to be because Saxon is taking the .NET System.Uri returned by the XmlUrlResolver, applying ToString() on it, and then passing the result to the Java java.net.URI constructor; it appears that the ToString() method unescapes the %20, making a wannabe-URI that is unacceptable to the Java constructor. So the .NET code is wrong too.

I don't understand the internal workings of Saxon, but would simply switching from "theUri.ToString()" to "theUri.AbsoluteUri" solve the problem? That form of the URI keeps the escaping.

RE: Difficulty with resolve-uri() with Saxon - Added by Anonymous over 15 years ago

Legacy ID: #7448098 Legacy Poster: Michael Kay (mhkay)

Thanks for the suggestion. That may be a less risky fix for a maintenance release.
