Bug #2314 (closed)
Stylesheet Base Uri with spaces causes issues when compiling stylesheet [Saxon.HE 9.6N]
Description
Apparently another change in behavior that looks a bit like a regression when switching from 9.5 to 9.6.
We frequently do in-memory transformations where both the input document and the XSLT are not actual files but strings/byte-buffers/XML fragments.
However, to run XsltCompiler.Compile correctly with an XmlReader (for example), it must have XsltCompiler.BaseUri set to a valid Uri. The path pointed to by the Uri need not exist, and it does not make a difference what I set it to since all the XSLTs we run that way are standalone, without any includes.
Therefore, whenever we have to set that path to something, we simply use the assembly path, since it's not used anyway.
With 9.5, this worked just fine when the application is installed below C:\Program Files (with the space in there).
However, 9.6 throws an XPathException:
HResult=-2146233088
Message=Invalid URI for stylesheet: file:///c:/path with space/nonexistant.xsl
Source=saxon9he
StackTrace:
at net.sf.saxon.style.StylesheetModule.loadStylesheetModule(Source styleSource, Boolean topLevelModule, Compilation compilation, NestedIntegerValue precedence)
at net.sf.saxon.style.Compilation.compilePackage(Source source)
at net.sf.saxon.style.Compilation.compileSingletonPackage(Configuration config, CompilerInfo compilerInfo, Source source)
at Saxon.Api.XsltCompiler.Compile(XmlReader reader)
at ConsoleApplication1.Program.Main(String[] args) in c:\Users\emanuel.wlaschitz\AppData\Local\Temporary Projects\ConsoleApplication1\Program.cs:line 18
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:
Sample code:
static void Main(string[] args)
{
    Uri stylesheetBaseUri = new Uri(@"c:\path with space\nonexistant.xsl");
    var xsl = XDocument.Parse(@"<xsl:stylesheet version='2.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
        <xsl:template match='/'><foo/></xsl:template>
        </xsl:stylesheet>");
    var compiler = new Processor(false).NewXsltCompiler();
    compiler.BaseUri = stylesheetBaseUri;
    //throws an exception when:
    //- compiler.BaseUri is null
    //- compiler.BaseUri contains a space (although the Uri instance correctly escapes the space to %20)
    compiler.Compile(xsl.CreateReader());
}
This is on 9.6.0.4 running on .NET 4.5, in case it matters.
Updated by Michael Kay almost 10 years ago
- Priority changed from Low to Normal
My immediate reaction would be that if 9.5 failed to detect an invalid URI, and 9.6 detects it, then that's surely a bug in 9.5, not a regression in 9.6?
Updated by Emanuel Wlaschitz almost 10 years ago
Even if 9.5 is the bugged one, 9.6 seems to handle it incorrectly as well. The Uri instance does show an escaped AbsoluteUri value of file:///c:/path%20with%20space/nonexistant.xsl (or c:/path%20with%20space/nonexistant.xsl for AbsolutePath).
But ok, I just checked and tried new Uri(@"c:\path%20with%20space\nonexistant.xsl"), which technically works (but shows incorrect values for AbsoluteUri: file:///c:/path%2520with%2520space/nonexistant.xsl).
As far as I could see, you seem to use Uri.ToString() inside XsltCompiler.Compile. In this case, it apparently returns the unescaped version as it was passed to the ctor (from Uri.OriginalString).
But then again, this also happened in 9.5, so I'm not exactly sure what changed to cause this.
Updated by Emanuel Wlaschitz over 9 years ago
According to MSDN - Uri.ToString (https://msdn.microsoft.com/en-us/library/system.uri.tostring.aspx), this returns the canonical representation (which includes spaces), not an encoded one. Also, MSDN - Uri Class lists this in the remarks section: "You can transform the contents of the Uri class from an escape encoded URI reference to a readable URI reference by using the ToString method."
It might work to call MSDN - Uri.EscapeUriString on the result of uri.ToString. I'd guess this change in XsltCompiler.Compile(XmlReader) (Xslt.cs line 537) could fix this:
But since I cannot manage to compile Saxon-HE from source, I can't really try this out...
Updated by O'Neil Delpratt over 9 years ago
- Category set to .NET API
- Status changed from New to In Progress
- Assignee set to O'Neil Delpratt
- Found in version set to 9.6
Thank you for sending us your suggested fix for this problem. Just letting you know that we are currently investigating it.
Updated by O'Neil Delpratt over 9 years ago
- Category changed from .NET API to Internals
- Found in version changed from 9.6 to 9.5 9.6
Update on my investigation:
It has turned out that this is in fact a Java problem. You will get a failure whether you escape the path (the point raised in comment #3) or leave it unescaped. We managed to create a JUnit test case which reproduces the problem, and we will investigate further how we can resolve it.
This bug applies to 9.5 as well as 9.6.
Updated by O'Neil Delpratt over 9 years ago
- Category changed from Internals to .NET API
Update:
Having done some further investigation on this problem, I have realised that what was said in comment #5 is incorrect. This problem is actually specific to .NET, as the API is designed differently in how the source file is provided. The use of the XmlReader is not present in Java. Specifically, in Saxon .NET the compile method looks for the baseURI on the XmlReader. If this is not set, then we look for the baseURI on the compiler.
In addition, I have noticed differences between .NET and Java in what counts as a valid URI. Adding the escaping solution mentioned in comment #3 might fix this problem but might cause problems in other cases.
Updated by Emanuel Wlaschitz over 9 years ago
Still, it's a bit odd that it suddenly stopped working after going from 9.5 to 9.6, so something in there must have changed (IKVM changed, for example, but I do hope that they didn't change behavior in a way that breaks everything).
I just tried this out: the code also fails when I pass an XmlReader that reads a file containing spaces in the path, as well as when using the Uri overload of XsltCompiler.Compile.
As far as I could see, the .NET Uri class may accept non-well-formed Uris and just make them well-formed, which is what seems to happen in my case. IsWellFormedOriginalString returns false for a path that contains spaces that aren't escaped.
However, even if I pass a correctly escaped one (both IsWellFormedOriginalString and the static IsWellFormedUriString return true), the result is still the same, since Uri.ToString returns a readable instance, not a well-formed one.
Technically, .NET and Java do not seem to differ on what a valid Uri is, but rather on how to deal with Uris that are not actually valid (.NET tries to make them valid by escaping some parts, Java throws if things do not validate).
So either way, I'm somewhat convinced that the line in XsltCompiler.Compile that takes BaseUri.ToString has to change, as it looks like incorrect use of the API to me... but as you said, this might be a bigger task if it happens all over the place and now surfaces here.
I'm currently trying out a workaround to do exactly that from my code by deriving from Uri and overriding the ToString method, but longer term this does not seem like a useful fix either.
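As an illustration of the Java side of that difference, here is a minimal sketch using only java.net.URI from the standard library. The class and method names are hypothetical and this is not Saxon's code; it only shows that the string-parsing constructor rejects an unescaped space and accepts the %20 form.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Saxon.
class UriStrictness {
    // Returns true if java.net.URI accepts the string exactly as given.
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            // The single-argument constructor does no escaping on our behalf.
            return false;
        }
    }

    public static void main(String[] args) {
        // Readable form, as returned by .NET's Uri.ToString()
        System.out.println(parses("file:///c:/path with space/nonexistant.xsl"));
        // Escaped form, as returned by .NET's Uri.AbsoluteUri
        System.out.println(parses("file:///c:/path%20with%20space/nonexistant.xsl"));
    }
}
```

The first call reports false and the second true, which is consistent with the exception above being triggered by the readable form of the base URI.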
Updated by Michael Kay over 9 years ago
In investigating bug 2345, I found that what's changed between 9.5 and 9.6 is probably that the code for evaluating use-when expressions is now more rigorous in the way it computes the base URI for these expressions; as a result, the software is less tolerant of invalid URIs than it used to be.
Updated by Emanuel Wlaschitz over 9 years ago
Gave 9.6.0.7 a shot, no change so far. But I figured I might share my current workaround here (which I thought I already did earlier):
public sealed class SaxonUri : Uri
{
    public SaxonUri(Uri wrappedUri)
        : base(GetUriString(wrappedUri), GetUriKind(wrappedUri))
    {
    }

    private static string GetUriString(Uri wrappedUri)
    {
        if (wrappedUri == null)
            throw new ArgumentNullException("wrappedUri", "wrappedUri is null.");
        if (wrappedUri.IsAbsoluteUri)
            return wrappedUri.AbsoluteUri;
        return wrappedUri.OriginalString;
    }

    private static UriKind GetUriKind(Uri wrappedUri)
    {
        if (wrappedUri == null)
            throw new ArgumentNullException("wrappedUri", "wrappedUri is null.");
        if (wrappedUri.IsAbsoluteUri)
            return UriKind.Absolute;
        return UriKind.Relative;
    }

    public override string ToString()
    {
        if (IsWellFormedOriginalString())
            return OriginalString;
        else if (IsAbsoluteUri)
            return AbsoluteUri;
        return base.ToString();
    }
}
Simply wrap the URI being passed to compiler.BaseUri in that to get the modified ToString() behavior.
In case the originally passed URI string was already well-formed, we simply return that one as the result of ToString(); otherwise we try falling back to the absolute URI this particular URI instance produces. The final fallback is the same behavior as before: returning base.ToString(), which may trigger this issue again.
We had no failing reports with this so far, but YMMV.
Updated by Michael Kay over 9 years ago
I'll take another look. We're gradually making progress on this front: the W3C specs are gradually becoming more precise and we are getting more test cases. (But in the interests of precision, let's be clear: if it contains a space then it is not a URI. The fashionable term for things that aren't URIs but which many interfaces accept in lieu of a URI is a LEIRI - legacy internationalized resource identifier).
Updated by Emanuel Wlaschitz over 9 years ago
Michael Kay wrote:
(But in the interests of precision, let's be clear: if it contains a space then it is not a URI. The fashionable term for things that aren't URIs but which many interfaces accept in lieu of a URI is a LEIRI - legacy internationalized resource identifier).
Agreed, but I believe that's a point where .NET and Java handle things differently. If I pass in "path with spaces" to the .NET Uri ctor, the Uri class makes it "path%20with%20spaces", which I do think makes it a valid URI. Doing so is most likely for convenience reasons rather than anything else.
Java, as far as I could see, requires me to do this myself before creating a URI instance. But then again, I haven't really written any Java code in the past 10 years, so things might be different now.
Plus, there is the aforementioned distinction in .NET between "something readable" (returned by ToString()) and "something valid" (returned by the various properties, depending on whether the original string was valid, whether the Uri is absolute or relative, and possibly other factors), which still breaks this even when I explicitly pass in something I'd consider a valid URI (and not an IRI, LEIRI, IDN or whatever else).
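For comparison, java.net.URI has a similar readable/valid split when a URI is built from components rather than parsed from a single string. The sketch below is an illustration under that assumption, not something taken from Saxon; the class and helper names are hypothetical. The multi-argument constructor quotes illegal characters such as spaces, getRawPath() returns the escaped path, and getPath() returns the decoded one.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Saxon.
class UriQuoting {
    // Builds a file: URI from an absolute path; the multi-argument
    // constructor percent-encodes characters that are illegal in a path.
    static URI fileUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = fileUri("/c:/path with space/nonexistant.xsl");
        System.out.println(uri);              // escaped, valid form
        System.out.println(uri.getRawPath()); // escaped path
        System.out.println(uri.getPath());    // decoded, readable path
    }
}
```

Here toString() yields file:/c:/path%20with%20space/nonexistant.xsl, so unlike .NET's Uri.ToString() the default string form in Java is the escaped one, and the readable form has to be requested explicitly.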
Updated by Michael Kay almost 9 years ago
- Status changed from In Progress to Resolved
- Assignee changed from O'Neil Delpratt to Michael Kay
Sorry we dropped the ball on this one.
I have been running some tests and changing
baseu = baseUri.ToString()
to
baseu = Uri.EscapeUriString(baseUri.ToString())
in XsltCompiler.Compile
appears to work. It's difficult to be confident that the change won't have side-effects, or that it is a complete solution, but it seems to be an improvement.
Applying this as a patch on the 9.6 and 9.7 branches.
Updated by Emanuel Wlaschitz almost 9 years ago
Thanks for the follow-up, Mike! We're still waiting for the .NET release of 9.7 to be available so we can try this out.
Updated by O'Neil Delpratt almost 9 years ago
- % Done changed from 0 to 100
Bug fix applied in the Saxon 9.6.0.8 maintenance release.
Leaving it marked as resolved until fix applied in the Saxon 9.7 release.
Updated by O'Neil Delpratt over 8 years ago
- Applies to branch 9.6 added
- Fix Committed on Branch 9.6, 9.7 added
- Fixed in Maintenance Release 9.6.0.8 added
Updated by O'Neil Delpratt over 8 years ago
- Status changed from Resolved to Closed
- Fixed in Maintenance Release 9.7.0.4 added
- Fixed in Maintenance Release deleted (9.6.0.8)
Bug fix applied in the Saxon 9.7.0.4 maintenance release.