Bug #4280
Failure in file:base-dir(): "URI has an authority component"
Status: Closed
Description
There is an issue when calling:
var transformer = _xsltCompiler.Compile(new Uri(stylesheetPath)).Load();
These are the lines I have that deal with the file system:
<xsl:function name="pn:find-image-filepath">
<xsl:param name="filename"/>
<xsl:variable name="file-path"
select="file:list(file:parent(file:base-dir()), true(), concat($filename, '*'))[1]"/>
<xsl:choose>
<xsl:when test="$file-path">
<xsl:value-of select="file:path-to-uri(concat(file:parent(file:base-dir()), $file-path))"/>
</xsl:when>
<xsl:otherwise>
<xsl:message terminate="no">Could not find image file "<xsl:value-of select="$filename"/>"
</xsl:message>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
Googling this suggests it is a problem specific to Windows machines: when working with files I should prefix the file path with file:/// (three slashes). When I look at the path (when processing this via the command line and not .NET), it is rendered as "file:/C:/data/data/...". However, is there a way to tell it to omit or include the extra slashes without doing a string replace?
Updated by Kevon Hayes over 5 years ago
I'm beginning to think this also is tied into the licensing issue because I ran the following from the commandline:
<xsl:value-of select="replace(file:path-to-uri(concat(file:parent(file:base-dir()), $file-path)), 'file:/', 'file:///')"/>
and Saxon was able to parse it with no issues. However, from .NET I get the following exception:
Message: URI has an authority component
Exception:
 at java.io.File..ctor(URI uri)
 at com.saxonica.functions.extfn.EXPathFile.toFile(String )
 at com.saxonica.functions.extfn.EXPathFile._parent(String path)
 at com.saxonica.functions.extfn.EXPathFileFunctionSet.BaseDir.makeResult(String )
 at com.saxonica.functions.extfn.EXPathFileFunctionSet.BaseDir.makeFunctionCall(Expression[] arguments)
 at net.sf.saxon.functions.registry.BuiltInFunctionSet.bind(F symbolicName, Expression[] staticArgs, StaticContext env, List reasons)
 at net.sf.saxon.functions.FunctionLibraryList.bind(F functionName, Expression[] staticArgs, StaticContext env, List reasons)
 at net.sf.saxon.functions.FunctionLibraryList.bind(F functionName, Expression[] staticArgs, StaticContext env, List reasons)
 at net.sf.saxon.expr.parser.XPathParser.parseFunctionCall(Expression prefixArgument)
 at net.sf.saxon.expr.parser.XPathParser.parseBasicStep(Boolean firstInPattern)
 at net.sf.saxon.expr.parser.XPathParser.parseStepExpression(Boolean firstInPattern)
 at net.sf.saxon.expr.parser.XPathParser.parseRelativePath()
 at net.sf.saxon.expr.parser.XPathParser.parsePathExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseSimpleMappingExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseUnaryExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseExprSingle()
 at net.sf.saxon.expr.parser.XPathParser.parseFunctionArgument()
 at net.sf.saxon.expr.parser.XPathParser.parseFunctionCall(Expression prefixArgument)
 at net.sf.saxon.expr.parser.XPathParser.parseBasicStep(Boolean firstInPattern)
 at net.sf.saxon.expr.parser.XPathParser.parseStepExpression(Boolean firstInPattern)
 at net.sf.saxon.expr.parser.XPathParser.parseRelativePath()
 at net.sf.saxon.expr.parser.XPathParser.parsePathExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseSimpleMappingExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseUnaryExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseExprSingle()
 at net.sf.saxon.expr.parser.XPathParser.parseFunctionArgument()
 at net.sf.saxon.expr.parser.XPathParser.parseFunctionCall(Expression prefixArgument)
 at net.sf.saxon.expr.parser.XPathParser.parseBasicStep(Boolean firstInPattern)
 at net.sf.saxon.expr.parser.XPathParser.parseStepExpression(Boolean firstInPattern)
 at net.sf.saxon.expr.parser.XPathParser.parseRelativePath()
 at net.sf.saxon.expr.parser.XPathParser.parsePathExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseSimpleMappingExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseUnaryExpression()
 at net.sf.saxon.expr.parser.XPathParser.parseExprSingle()
 at net.sf.saxon.expr.parser.XPathParser.parseExpression()
 at net.sf.saxon.expr.parser.XPathParser.parse(String expression, Int32 start, Int32 terminator, StaticContext env)
 at net.sf.saxon.expr.parser.ExpressionTool.make(String expression, StaticContext env, Int32 start, Int32 terminator, CodeInjector codeInjector)
 at net.sf.saxon.style.StyleElement.makeExpression(String expression, Int32 attIndex)
 at net.sf.saxon.style.SourceBinding.prepareAttributes(Int32 permittedAttributes)
 at net.sf.saxon.style.XSLLocalVariable.prepareAttributes()
 at net.sf.saxon.style.StyleElement.processAttributes()
 at net.sf.saxon.style.StyleElement.processAllAttributes()
 at net.sf.saxon.style.StyleElement.lambda$processAllAttributes$1(NodeInfo )
 at net.sf.saxon.style.StyleElement.__<>Anon1.accept(Item )
 at net.sf.saxon.om.SequenceIterator.forEachOrFail(SequenceIterator , ItemConsumer )
 at net.sf.saxon.style.StyleElement.processAllAttributes()
 at net.sf.saxon.style.PrincipalStylesheetModule.processAllAttributes()
 at net.sf.saxon.style.PrincipalStylesheetModule.preprocess()
 at net.sf.saxon.style.Compilation.compilePackage(Source source)
 at net.sf.saxon.style.StylesheetModule.loadStylesheet(Source styleSource, Compilation compilation)
 at net.sf.saxon.style.Compilation.compileSingletonPackage(Configuration config, CompilerInfo compilerInfo, Source source)
 at net.sf.saxon.s9api.XsltCompiler.compile(Source source)
 at Saxon.Api.XsltCompiler.Compile(Stream input, String theBaseUri, Boolean closeStream)
 at Saxon.Api.XsltCompiler.Compile(Uri uri)
 at Comply365.Business.Objects.XAT.Export.XslFoTransformer.Transform(String inputPath, String stylesheetPath, String outputPath)
Updated by Kevon Hayes over 5 years ago
This is the XML from the above that ran with no issues via Saxon commandline:
<xsl:value-of select="replace(file:path-to-uri(concat(file:parent(file:base-dir()), $file-path)), 'file:/', 'file:///')"/>
Updated by Michael Kay over 5 years ago
I'm a little confused about exactly what the problem is.
Your problem description says "there is an issue" but it never actually says what the issue is. It's presumably the error message in the title of your post. But it's not clear what you are doing when you get this error message. You imply that you get the error when compiling the stylesheet, but unfortunately you don't tell us what's in the variable stylesheetPath.
The stack trace shows the error as occurring while compiling an XPath expression containing a call to file:base-dir(), which is evaluated statically (because the base URI is part of the static context), and the failure appears to be during some kind of normalization of the file name which involves converting the supplied string to a URI and then to a File object.
So I think the key bit of information that's missing is: what is the value of stylesheetPath?
Unfortunately the file: URI scheme is not particularly well standardized, and we know that Java and .NET have differences in interpretation of the rules. In this case we are using both the .NET Uri class (when you initially supply a URI to the compile() method), and the Java URI class (when we normalize/resolve the static base URI to return from file:base-dir()). It might well be that it's these differences of interpretation that are causing the problem.
As far as possible, we avoid manipulating filenames and URIs ourselves in Saxon, but rely on Java primitives to do it. There are a few exceptions, for example the EXPath file code sometimes treats backslashes in filenames as forwards slashes where the Java libraries would otherwise complain.
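The URI-to-File conversion mentioned above can be exercised in plain Java, without Saxon. The following is a minimal sketch, assuming only the standard `java.io.File(URI)` constructor that appears at the top of the stack trace; the class name, host name and paths are placeholders invented for illustration.

```java
import java.io.File;
import java.net.URI;

class AuthorityComponentDemo {

    /** Returns the exception message from new File(uri), or null if it succeeds. */
    static String tryToFile(URI uri) {
        try {
            new File(uri);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Two slashes followed by a name: "server" is parsed as the authority.
        URI withHost = URI.create("file://server/share/test.xsl");
        // Three slashes: the authority is empty, so the URI is treated as local.
        URI local = URI.create("file:///C:/data/test.xsl");

        System.out.println(withHost.getAuthority() + " -> " + tryToFile(withHost));
        System.out.println(local.getAuthority() + " -> " + tryToFile(local));
    }
}
```

The first URI is rejected by `java.io.File` with the message from the issue title, while the second is accepted. This only shows the JDK behaviour; where exactly Saxon makes this call is as described in the comment above.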
Updated by Michael Kay over 5 years ago
- Subject changed from URI has an authority component to Failure in file:base-dir(): "URI has an authority component"
- Category set to Saxon extensions
Updated by Michael Kay over 5 years ago
Note, many of the threads discussing this error seem to resolve to some issue with UNC filenames, which are sometimes (incorrectly) converted to invalid URIs in which "file:" is followed by either two slashes or four. In a legal "file:" URI, there must be either one slash or three following the "file:" prefix. There appears to be no correct or universally-accepted way of representing UNC filenames as URIs.
Updated by Kevon Hayes over 5 years ago
Thanks Michael,
The issue is that the _xsltCompiler throws the above exception when attempting to compile source XML into an FO for PDF conversion. Yet when I do the same conversion via the Saxon CLI using the following, it parses the file without error.
C:\KH...>Transform -t -s:..\XML_NOC.xml -xsl:QRH.xsl -o:FOFile.fo
To answer your question: the value of the stylesheet path is "\\somefilefolder\folder000\folder00\folder0\Temp\folder\folder\XML_NOC.xml", which I attempted to prefix with file:/, file:///, file:\ and file:\\\
When using "file:\" I still get the error: "URI has an authority component"
When using "file:\\\" I get "Could not find a part of the path 'c:'" (so it thinks I'm referring to the C: drive for whatever reason)
Updated by Kevon Hayes over 5 years ago
Not sure why the bold markup escaped one backslash, but when trying 1 slash (file:\) I get the "URI has an authority component" error, and with 3 slashes (file:\\\) I get "Could not find a part of the path 'c:\somefilefolder\folder000\folder00\folder0\Temp\folder\folder\XML_NOC.xml'", which of course does not exist.
Updated by Michael Kay over 5 years ago
If it starts with two slashes (or backslashes) then it's a UNC filename so we get into that issue that UNC filenames can't be accurately represented as URIs.
The XsltCompiler.Compile() method expects a URI, and I suspect (need a Windows machine to check...) that when you call new Uri(stylesheetPath) it's giving you a .NET Uri object that doesn't actually correspond to a valid W3C URI value, which is why Saxon subsequently has difficulty with it. It might be worth checking what this Uri object actually looks like.
The .NET spec for the Uri class says:
Uri can also be used to represent local file system paths. These paths can be represented explicitly in URIs that begin with the file:// scheme, and implicitly in URIs that do not have the file:// scheme... These implicit file paths are not compliant with the URI specification and so should be avoided when possible.
So I strongly suspect you've passed a Uri that isn't compliant with the URI specification and this is why Java (and hence Saxon) have difficulty with it.
A couple of other points:
- URIs always use forwards slashes, not backslashes. So file:\\\ is a nonsense.
- It's best to avoid trying to convert filenames to URIs by simple prefixing. It may sometimes work, but you're very unlikely to get it completely right. For example special characters such as spaces and percent-signs in filenames need special treatment. Use .NET or Java methods to do the conversion, not string manipulation.
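To illustrate the second point on the Java side, here is a small sketch comparing naive prefixing with the JDK's own conversion via `java.nio.file.Path.toUri()`. The class name and the filename "my file 100%.xml" are made up for the example; the equivalent .NET calls are not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Path;
import java.nio.file.Paths;

class FilenameToUriDemo {

    /** Naive approach: glue "file:///" onto the filename and hope it parses. */
    static boolean naivePrefixParses(String filename) {
        try {
            new URI("file:///" + filename);
            return true;
        } catch (URISyntaxException e) {
            return false; // e.g. an unescaped space or a stray '%'
        }
    }

    /** Library approach: let the JDK absolutize and percent-encode the path. */
    static URI convert(String filename) {
        return Paths.get(filename).toAbsolutePath().toUri();
    }

    public static void main(String[] args) {
        String name = "my file 100%.xml"; // hypothetical filename
        System.out.println("naive prefix parses: " + naivePrefixParses(name));

        URI uri = convert(name);
        System.out.println("converted: " + uri);

        // The conversion is reversible: the URI maps back to the same path.
        Path back = Paths.get(uri);
        System.out.println("round trip ok: " + back.equals(Paths.get(name).toAbsolutePath()));
    }
}
```

The space and the percent sign come out as `%20` and `%25` in the converted URI, which is exactly the special treatment that string prefixing skips.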
Updated by Kevon Hayes over 5 years ago
Thanks for that Michael,
I will look into it. I completely understand the point about string manipulation. I was troubleshooting, attempting to see what would and would not work.
Updated by Kevon Hayes over 5 years ago
- File UriFormat_001.png added
Hello Michael,
I realize I never gave you the complete file path value of the stylesheet when I called the compile method as shown above. It is: file://dev-pri-files1/Root1/PTPDocs/devrelease/Temp/c006fa2b-baef-48ce-8dc6-75c14c73e185/B747-8 QRH SB 20190621/xsl-fo/QRH.xsl
If you view: https://www.ietf.org/rfc/rfc1738.txt the file resource URI matches what I have.
According to the rfc spec the authority is correct and allowed. Also my stylesheet format is correct. So can you tell me why am I getting this error? Are you telling me I cannot load a stylesheet from a network resource? Are you saying that I must use a local file system? Also that article explains that the only time systems can use: "file:///" is when you are on localhost. You can omit localhost and just use the third slash.
Please provide an example or advise on the procedure for how this should work for us on .NET when trying to compile a stylesheet that's located on a network resource. The cs samples don't go into the needed detail.
C# example:
var transformer = _xsltCompiler.Compile(new Uri(stylesheetPath)).Load();
// stylesheetPath is file://dev-pri-files1/Root1/PTPDocs/devrelease/Temp/c006fa2b-baef-48ce-8dc6-75c14c73e185/B747-8 QRH SB 20190621/xsl-fo/QRH.xsl
Updated by Michael Kay over 5 years ago
Given a URI of the form file://dev-pri-files/x/y/z, this matches the syntax given in the RFC. But it's worth reading the rest of the section:
A file URL takes the form:
file://<host>/<path>
where <host> is the fully qualified domain name of the system on
which the <path> is accessible, and <path> is a hierarchical
directory path of the form <directory>/<directory>/.../<name>.
...
As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as `the machine from which the URL is
being interpreted'.
The file URL scheme is unusual in that it does not specify an
Internet protocol or access method for such files; as such, its
utility in network protocols between hosts is limited.
So what you have is a syntactically-valid URI, but (a) the value of <host> is not a "fully qualified domain name", and (b) even if it were, Java wouldn't know what to do with it, because it hasn't been told whether the file is accessible using http, ftp, webdav, Microsoft SMB, or something else. The phrase "its utility in network protocols between hosts is limited" is a polite way of saying "in general, the file: URI scheme isn't useful for accessing remote files on a different machine".
There is in fact a more recent (2017) RFC on the file URI scheme: https://tools.ietf.org/html/rfc8089 (which I hadn't come across until today). Section 3 says:
A file URI can be dependably dereferenced or translated to a local
file path only if it is local. A file URI is considered "local" if
it has no "file-auth", or the "file-auth" is the special string
"localhost", or a fully qualified domain name that resolves to the
machine from which the URI is being interpreted (Section 2).
(file-auth corresponds to <host> in the RFC 1738 syntax).
Appendix E.3 discusses representation of UNC filenames under the general heading of "Non-standard syntax variations". To the best of my knowledge, Java doesn't implement or recognize this variation.
Worse still, there seems to be a different convention on Java (also not universally supported), which is to represent the entire UNC filename in the path component of the URI, that is: file:////dev-pri-files/x/y/z (with four forwards-slashes).
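How `java.net.URI` splits these forms into components can be checked directly. The sketch below is illustrative only: the class name is invented, and the host and path values are the placeholders used in this thread.

```java
import java.net.URI;

class FileUriForms {

    /** Summarises how java.net.URI parses a file: URI string. */
    static String describe(String s) {
        URI u = URI.create(s);
        return "authority=" + u.getAuthority() + " path=" + u.getPath();
    }

    public static void main(String[] args) {
        // Two slashes: the server name lands in the authority component.
        System.out.println(describe("file://dev-pri-files/x/y/z"));
        // Four slashes: empty authority, the whole UNC name stays in the path.
        System.out.println(describe("file:////dev-pri-files/x/y/z"));
        // One slash and three slashes both mean "no authority, local path".
        System.out.println(describe("file:/C:/data/x"));
        System.out.println(describe("file:///C:/data/x"));
    }
}
```

In the two-slash form the authority is `dev-pri-files` and the path is `/x/y/z`; in the four-slash form the authority is null and the path is `//dev-pri-files/x/y/z`.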
Updated by Michael Kay over 5 years ago
So, what's the way forward?
Compiling a stylesheet from a remote machine isn't too difficult, for example you could dereference the filename yourself and supply a Stream.
What's more difficult is supplying a base URI for that stylesheet that can be used to locate other resources using relative references: for example in xsl:include and xsl:import, and in calls to the doc() function. The only way you can do this reliably is to access the remote resources over HTTP. Alternatively, you could try setting a custom XmlResolver on the XsltCompiler. However, the XmlResolver is only going to be used when getting XML resources, and the stylesheet base URI is also used in some cases for non-XML resources, of which your use of EXPath File extensions is one example.
From our point of view, I guess we could try enhancing Saxon on .NET (or perhaps even Saxon on Java on Windows...) to be able to dereference file URIs that use the non-standard extensions in RFC 8089 for referencing UNC files. That's a fairly substantial project, I would think.
Updated by Kevon Hayes over 5 years ago
Michael,
I went down the filestream route yesterday and indeed the issue was with the xsl:include dependencies not being found. So I quickly understood that if I streamed a stylesheet I would need to stream all of the dependencies used by that stylesheet. This is not ideal since we reference dozens of files.
Updated by Michael Kay over 5 years ago
Indeed. I think it's probably going to be best if you try to access remote stylesheets using HTTP rather than UNC (Microsoft SMB). I think the whole XML ecosystem is built around using HTTP for doing this kind of thing.
Updated by Kevon Hayes over 5 years ago
Michael,
I'm still not clear on the course of action we are to follow to get past this issue. Can you clearly specify?
Updated by Michael Kay over 5 years ago
If you need to access stylesheet code over a network then I think you should implement an HTTP server that can deliver the stylesheet code in response to a request using an HTTP URI, rather than accessing it over Microsoft SMB using UNC filenames.
Updated by Vladimir Nesterovsky over 5 years ago
You may consider a couple of workarounds depending on your setup:
- Register and use a custom java.net.URL protocol.
- Map the network path to a local drive.
Updated by Kevon Hayes over 5 years ago
- I'll try this
- Tried this already to no avail.
FYI. We are using SaxonPE in .NET solution.
Updated by Kevon Hayes over 5 years ago
Michael/Vladimir,
According to: https://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u it appears that Java erroneously reports an authority when parsing UNC paths. This is apparently fixed in Java 7 via the java.nio.file.Paths implementation.
Any plans to patch this so all your .NET paying customers could benefit from it?
Updated by Kevon Hayes over 5 years ago
More insight to eternally fix this for .NETers.
Updated by Kevon Hayes over 5 years ago
According to: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=5086147
On line 258 of the XsltCompiler in your Saxon PE codebase, one slash too many is removed. This can be fixed if you wrap the uri.ToString() call with java.nio.file.Paths.get().
I attached a screenshot of the offending line of code that causes the bad URI.
Updated by Michael Kay over 5 years ago
Thanks for the citations; I had already looked at many of these when responding earlier on this thread. What this generally reveals is that Java handling of UNC filenames is a mess. One of my concerns is that a number of the attempts to solve the problem do it in a way that is incompatible with the way that RFC 8089 tackles it; we also need to establish whether it's consistent with the way the Microsoft .NET Uri class handles it.
Our ability to solve this is also restricted by the fact that there are a lot of third-party components involved, notably the XML parser and the OASIS catalog resolver (when used).
I've added a work item to our internal shopping list for future enhancements. It would be nice to offer improvements in this area and we will try to find room in the schedule for this work, but it's not a trivial bug-fix. It's also quite likely to get complicated by any changes we make over the coming year to take advantage of Microsoft's promised developments on .NET which appear to include some kind of improved Java interoperability, though the details are still very murky.
Updated by Kevon Hayes over 5 years ago
Vladimir,
Explain this approach:
- Register and use a custom java.net.URL protocol.
Updated by Vladimir Nesterovsky over 5 years ago
I think the following StackOverflow post should lead you in the right direction:
https://stackoverflow.com/questions/26363573/registering-and-using-a-custom-java-net-url-protocol
Updated by Michael Kay over 5 years ago
We are discussing internally what the best way forward is to support UNC filenames on Windows (whether running Saxon on Java or .NET). The recent RFC 8089 is helpful: it describes two "non-standard" ways of representing UNC filenames as URIs, and we should try and support either or both of these on critical interfaces. We suspect that some of the problems are due to .NET and Java disagreeing on which of the two representations to use, and we may have to do some mediation when we pass URIs between the two environments. We will need to do a significant amount of testing to get this working, and we can't guarantee that this can all be done in maintenance releases; some of it may need to wait until a major release.
The specific problems you have reported appear to be (and we need to confirm this) that .NET, when constructing a Uri from a UNC filename, uses the representation defined in E.3.1 of RFC 8089, and Java does not accept this form. Whether it fully accepts the alternative form described in E.3.2 remains to be seen. If this proves to be the case, then it might be possible to work around the problem by supplying (to the XsltCompiler.Compile method, or in XsltCompiler.BaseUri) a Uri in the format that Java expects: that is, one with no Authority component and with a path component of the form "//server/x/y/z/". It may be necessary to use the .NET UriBuilder class to construct such a URI.
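For reference, the target format (no authority, UNC name in the path) can be produced on the Java side as sketched below. This is not Saxon code and not the .NET UriBuilder approach suggested above; `UncUriRewriter` and `toPathOnlyForm` are hypothetical names, and whether Saxon accepts the result is, as noted, still to be confirmed.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UncUriRewriter {

    /**
     * Rewrites file://server/share/x (server in the authority) into
     * file:////server/share/x (no authority, path "//server/share/x").
     * URIs that are not file: URIs, or that have no authority, are returned unchanged.
     */
    static URI toPathOnlyForm(URI uri) {
        if (!"file".equalsIgnoreCase(uri.getScheme()) || uri.getAuthority() == null) {
            return uri;
        }
        String uncPath = "//" + uri.getAuthority() + uri.getPath();
        try {
            // The extra "//" in front becomes an empty authority when the URI is
            // parsed, which leaves uncPath intact as the path component.
            return new URI("file", null, "//" + uncPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI twoSlash = URI.create("file://server/unctest/test.xsl");
        URI rewritten = toPathOnlyForm(twoSlash);
        System.out.println(rewritten + " authority=" + rewritten.getAuthority()
                + " path=" + rewritten.getPath());
    }
}
```

Prepending the additional `//` is needed because a path that itself begins with `//` would otherwise be re-read as an authority when the URI string is parsed.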
Updated by Michael Kay over 5 years ago
We're back in the office today and able to do some systematic experiments in a variety of environments.
We're running on Windows, set up with a stylesheet //server/unctest/test.xsl that includes another stylesheet with <xsl:include href="test-inc.xsl"/>.
This runs successfully on four environments: Java and .NET from the command line, Java and .NET via the Saxon transformation API.
The result of static-base-uri() varies. When running on .NET from the API, the static base URI is file://server/unctest/test.xsl; in the other three environments it is file:////server/unctest/test.xsl.
This is telling us firstly, that .NET file-to-URI conversion is generating the 2-slash form (with an authority component), while Java file-to-URI conversion is generating the 4-slash form; and secondly, that Java is capable of doing URI resolution and URI dereferencing successfully with either form -- though Saxon is giving it some help, because we have special logic in ResolveURI.makeAbsolute() to handle 4-slash URIs (in fact, (a) we avoid calling URI.normalize(), and (b) we use the code in java.net.URL.resolve() in preference to java.net.URI.resolve()).
We don't get a failure until we attempt to do file:base-dir(). At this point we get different failures on different platforms.
With the 4-slash URI format, file:base-dir() gives us C:\server\unctest\test.xsl which is clearly useless. We can solve this within the code of file:base-dir() by changing a call of new File(uri.normalize()) to new File(uri.getPath()) -- it seems to be the call on normalize() that is doing the damage.
With the 2-slash URI format, we are getting the observed error "URI has an authority component". It is crashing on the call to normalize(), and the same fix seems to work here.
The significance of file:base-dir() is that it is one of the few places where we convert URIs back to filenames; and it's this conversion that seems to cause the trouble. There are a number of other places in the product where we have similar logic and we need to try and find them. Some of them already seem to have been addressed in the past: for example ResolveURI avoids calling uri.normalize() in the case of a "file:////" (4-slash) URI.
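The effect of normalize() on the 4-slash form can be seen at the URI level with the JDK alone. The sketch below is not the Saxon source; the class and method names are invented, and it deliberately stops short of constructing a File, since the resulting filename depends on the platform.

```java
import java.net.URI;

class NormalizeDamageDemo {

    /** Path component after URI.normalize(), as used before the fix. */
    static String pathAfterNormalize(URI uri) {
        return uri.normalize().getPath();
    }

    /** Path component taken directly from the URI, as used after the fix. */
    static String rawPath(URI uri) {
        return uri.getPath();
    }

    public static void main(String[] args) {
        URI fourSlash = URI.create("file:////server/unctest/test.xsl");

        // normalize() collapses the leading "//", so the server name can no
        // longer be told apart from an ordinary top-level directory.
        System.out.println("normalized: " + pathAfterNormalize(fourSlash));

        // getPath() keeps the leading "//" that marks a UNC name.
        System.out.println("getPath():  " + rawPath(fourSlash));
    }
}
```

Once the leading `//` is gone, a Windows JVM resolves the remaining path against the current drive, which is consistent with the C:\server\unctest\test.xsl result reported above.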
Updated by Michael Kay over 5 years ago
The incorrect behaviour of URI.normalize() is addressed in https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4723726 (closed as "won't fix"). This pushes the blame onto (a) the IETF RFCs, where the process of normalization is defined, and (b) the fact that File.toURI() creates a 4-slash URI rather than a 2-slash URI; it suggests replacing File.toURI() with File.toPath().toUri() if you want a 2-slash URI "as developers expect".
Note that file:path-to-uri() uses new File(path).toPath().toUri() to work around this JDK issue.
Updated by Michael Kay over 5 years ago
- Assignee set to Michael Kay
- Priority changed from High to Normal
- Applies to branch 9.9, trunk added
- Fix Committed on Branch 9.9, trunk added
Fixed this in the implementation of file:base-dir().
Updated by Michael Kay over 5 years ago
- Status changed from In Progress to Resolved
Updated by Michael Kay over 5 years ago
- Due date set to 2018-04-16
- Start date changed from 2019-08-13 to 2018-04-16
- Follows Bug #3745: file:path-to-uri() fails or works incorrectly with relative path and UNC added
Updated by Kevon Hayes over 5 years ago
Thanks for the pursuit Michael,
Will these fixes be in the form of a hotfix or next major revision?
Updated by Michael Kay over 5 years ago
We issue maintenance releases with bug fixes every 6-8 weeks depending on urgency. We're working towards the next maintenance release in a week or two, hoping to get some of the bug backlog from the holiday season out of the way before we ship it.
Updated by O'Neil Delpratt over 5 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 9.9.1.5 added
Bug fix applied in the Saxon 9.9.1.5 maintenance release.
Updated by Kevon Hayes over 5 years ago
Great! In hopeful expectation of maintenance release 9.9.1.5.