Type error evaluating (fn:collection(...))
I get a type error when I run a transformation using the following stylesheet and Saxon 188.8.131.52. It works with Saxon 184.108.40.206 and 220.127.116.11. I think it is related to: https://saxonica.plan.io/issues/2749
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:template match="/">
        <xsl:variable name="url" select="'file:///D:/projects/eXml/samples/docbook/v5/out'"/>
        <xsl:variable name="FILELIST" select="collection(concat($url, '?recurse=yes;select=*.indexterms'))"/>
        <xsl:variable name="terms" select="for $n in $FILELIST/*/* return $n"/>
        <xsl:value-of select="$terms"/>
    </xsl:template>
</xsl:stylesheet>
Type error evaluating (fn:collection(...)) in xsl:variable/@select on line 6 column 110 of Untitled.xsl:
  XPTY0019: The required item type of the first operand of '/' is node(); the supplied value xs:base64Binary("PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48aW5kZXggeG1sbnM9Imh0dHA6Ly93d3cub3h5Z2VueG1sLmNvbS9ucy93ZWJoZWxwL2luZGV4Ii8+") is an atomic value
In template rule with match="/" on line 4 of Untitled.xsl
The required item type of the first operand of '/' is node(); the supplied value xs:base64Binary("PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48aW5kZXggeG1sbnM9Imh0dHA6Ly93d3cub3h5Z2VueG1sLmNvbS9ucy93ZWJoZWxwL2luZGV4Ii8+") is an atomic value
#1 Updated by Michael Kay 10 months ago
I've reproduced this using repo/samples/scm?select=*.scm as the collection URI: basically a directory that contains XML files and expects them to be treated as XML, but doesn't use a known file extension or HTTP media type that identifies them as XML. I think they were previously being recognized as XML by virtue of sniffing the initial bytes of the file. This changed with bug #4382; we were leaving the stream connection open after doing the sniffing, which led to exhaustion of the limit on open streams, so the design approach had to change.
The failure is because we haven't recognized this as an XML resource; we're delivering it as an unparsed Base64Binary object, which obviously can't appear on the lhs of the "/" operator.
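The atomic value quoted in the error message can be decoded to confirm that the resource really is an XML document that was delivered unparsed. A minimal Java sketch follows; the class and method names are made up for illustration, and the payload is copied from the error message in the report.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DecodeBinaryResource {
    // Payload copied verbatim from the xs:base64Binary value in the error message.
    static final String PAYLOAD =
        "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48aW5kZXggeG1sbnM9"
        + "Imh0dHA6Ly93d3cub3h5Z2VueG1sLmNvbS9ucy93ZWJoZWxwL2luZGV4Ii8+";

    /** Decodes a Base64 string and interprets the bytes as UTF-8 text. */
    static String decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the XML document that collection() returned as an atomic value.
        System.out.println(decode(PAYLOAD));
    }
}
```

The decoded text begins with an XML declaration, which is consistent with the diagnosis that the file was XML but was not recognized as such.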
The problem with the new (post-#4382) approach is we either have to open the file twice (once to do the sniffing, once to actually read the content), or we have to be prepared to defer recognising the file type until the file is actually opened.
We're treating the file as binary because that's the default for an unrecognized file extension. One possibility is to change the default to XML; since collection() historically only returned XML files, that's the option that most people are likely to be using, at least with directory-based collections which are perhaps the most common kind.
#2 Updated by Michael Kay 10 months ago
- Category set to Internals
- Status changed from New to In Progress
- Assignee set to Michael Kay
- Priority changed from Low to Normal
- Applies to branch trunk added
I have implemented (and am testing) the following solution:
(a) the default media type registered in the configuration is changed to "application/unknown". This can be changed using a call such as
(b) the default media type is used if the URI scheme is "file" and a media type cannot be inferred from the file extension.
(c) if the media type inferred from examination of the URI is "application/unknown", we allocate a new kind of Resource called UnknownResource. The getItem() method on UnknownResource sniffs the content (using URLConnection.guessContentTypeFromStream()) and then delegates to a more specific resource type obtained by calling
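The flow in (a) to (c) can be sketched roughly as follows. This is not Saxon's code: the class name, the resolve method, and the fallback media type are assumptions made for illustration. Only URLConnection.guessContentTypeFromStream() and the "application/unknown" placeholder come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class MediaTypeResolver {
    // Placeholder media type described in step (a).
    static final String UNKNOWN = "application/unknown";
    // Assumed fallback when sniffing finds nothing; not taken from Saxon.
    static final String FALLBACK = "application/octet-stream";

    /**
     * Returns the media type inferred from the URI unless it is the
     * "unknown" placeholder, in which case the content itself is sniffed.
     */
    static String resolve(String inferredFromUri, byte[] content) {
        if (!UNKNOWN.equals(inferredFromUri)) {
            return inferredFromUri;
        }
        // ByteArrayInputStream supports mark/reset, which the sniffer needs.
        try (InputStream in = new ByteArrayInputStream(content)) {
            String sniffed = URLConnection.guessContentTypeFromStream(in);
            return sniffed != null ? sniffed : FALLBACK;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\"?><index/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(resolve(UNKNOWN, xml));      // type comes from sniffing
        System.out.println(resolve("text/plain", xml)); // type from the URI wins
    }
}
```

In this sketch the sniffing only happens when the resource is actually opened, which matches the deferred approach described in (c).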
#3 Updated by Michael Kay 10 months ago
In testing this, I have one unit test failing (testCollectionWithHttp, which is using a collection catalog accessed over HTTP). This does not appear to be a new failure; the same test is failing under 9.9, where the changes have not yet been applied.
The failure occurs because inferStreamEncoding fails with an IOException "mark/reset not supported" while doing obtainCharacterContent(). Since we're reading JSON here, we could really assume an encoding of UTF-8.
A further complication is that the error isn't cleanly reported, because of the multi-threaded execution.
Setting the JSONResource encoding to UTF-8, rather than attempting to infer it, solves the particular test case - though it leaves the more general issue that with HTTP resources, inferring the encoding when not given in the HTTP headers isn't working.
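As an illustration of why fixing the encoding sidesteps the problem: decoding with a fixed charset never needs to rewind the stream, so mark/reset support is irrelevant. The sketch below is not the JSONResource code; the class and method names are hypothetical, and PushbackInputStream is used only as a convenient stand-in for a stream that does not support mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8JsonReader {
    /** Reads the whole stream as UTF-8 text without using mark/reset. */
    static String readAsUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"caf\u00e9\"}";
        // PushbackInputStream reports markSupported() == false.
        InputStream in = new PushbackInputStream(
            new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8)));
        System.out.println(readAsUtf8(in).equals(json));
    }
}
```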
#6 Updated by Michael Kay 10 months ago
Yes, I think that before the fix for bug #4382 we were sniffing the file to detect content type if the file extension was unknown, but after the fix that stopped working, at least for file:// URIs. It's now reinstated. But you might like to consider registering additional file extensions (such as .dita) with the configuration.
#12 Updated by Radu Coravu 10 months ago
One remark about the usage of this utility: it works only if the input stream supports marking (java.io.InputStream.markSupported()). To be 100% sure the stream has mark support, it could be wrapped in a buffered input stream, stream = new BufferedInputStream(stream);, so that this works no matter what stream implementation is used.
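A small sketch of the wrapping suggested here, assuming the utility in question is URLConnection.guessContentTypeFromStream(), which returns null when the stream does not support marks. The class and helper names are hypothetical, and PushbackInputStream is used only because it reports no mark support.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class MarkSafeSniffer {
    /** Returns the stream itself if it supports mark/reset, else a buffered wrapper. */
    static InputStream ensureMarkable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Guesses the content type from the leading bytes; null if no guess is possible. */
    static String sniff(InputStream in) {
        try {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: a stream over the text that does not support mark/reset. */
    static InputStream unmarkable(String text) {
        return new PushbackInputStream(
            new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\"?><index/>";
        System.out.println(sniff(unmarkable(doc)));                 // no mark support
        System.out.println(sniff(ensureMarkable(unmarkable(doc)))); // buffered wrapper
    }
}
```

After sniffing, the caller has to keep reading from the wrapper returned by ensureMarkable rather than from the original stream, because the buffered wrapper may already have consumed bytes from it.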
#14 Updated by Michael Kay 10 months ago
For 10.0 only, I have added a query parameter content-type to the collection URI format recognised by directory and JAR collections; if present, this takes precedence over (and inhibits) any guessing of content type from the file name or file content.
This needs documenting at http://www.saxonica.com/documentation/index.html#!sourcedocs/collections
Also: there is a query parameter that appears to be implemented but undocumented, and has no effect: unparsed=yes|no. I'm going to get rid of it from the code. I think the idea was that you could retrieve unparsed XML if you wanted, but the implementation wasn't completed; you can now achieve this effect using