Saxonica Developer Community: Issues
https://saxonica.plan.io/
2024-03-27T11:18:36Z
Saxon - Bug #6379 (New): Default implementation of fn:deep-equal
https://saxonica.plan.io/issues/6379
2024-03-27T11:18:36Z
Norm Tovey-Walsh
<p>I happened to trace my way through a call to deep equal in Saxon HE 12.4 and I was a little bit surprised to find that it's using the <code>DeepEqual40</code> implementation. I wonder if that was intentional...</p>
SaxonJS - Feature #6375 (New): Support popular DOM implementations
https://saxonica.plan.io/issues/6375
2024-03-19T14:58:59Z
Michael Kay mike@saxonica.com
<p>The "barrier to entry" for SaxonJS would be reduced if we were able to promote it as an API for executing XPath 3.1 against popular DOM implementations -- xmldom, jsdom, slimdom, ...</p>
<p>I don't think there's any fundamental barrier to supporting external DOM implementations, it just needs testing.</p>
Saxon - Support #6369 (New): Serialization problem of XQuery result using Saxon 12.3
https://saxonica.plan.io/issues/6369
2024-03-06T10:36:34Z
Radu Coravu radu_coravu@sync.ro
<p>We run this XQuery as a transformation scenario in Oxygen:</p>
<pre><code>declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'text';
declare option output:item-separator ', ';
let $xml:=<xml><element>text1</element><element>text2</element></xml> return $xml/element/string()
</code></pre>
<p>and we get as result:</p>
<pre><code>text1, , text2
</code></pre>
<p>Notice the two commas ", " between the two items.</p>
<p>To serialize we create a tree receiver something like:</p>
<pre><code> SerializerFactory sf = this.queryTransformer.getConfiguration().getSerializerFactory();
PipelineConfiguration pipe = this.queryTransformer.getConfiguration().makePipelineConfiguration();
SerializationProperties props = new SerializationProperties(queryTransformer.getOutputProperties());
Receiver receiver = sf.getReceiver(new StreamResult(sw), props, pipe);
tr = new TreeReceiver(receiver)..
</code></pre>
<p>The "net.sf.saxon.event.SequenceReceiver#decompose" is called for each item "text1" and "text2". The stack trace is something like:</p>
<pre><code> at net.sf.saxon.str.UnicodeWriterToWriter.write(UnicodeWriterToWriter.java:36)
at net.sf.saxon.serialize.TEXTEmitter.characters(TEXTEmitter.java:104)
at net.sf.saxon.event.ProxyReceiver.characters(ProxyReceiver.java:158)
at net.sf.saxon.event.SequenceNormalizer.characters(SequenceNormalizer.java:99)
at net.sf.saxon.event.SequenceNormalizerWithItemSeparator.sep(SequenceNormalizerWithItemSeparator.java:135)
at net.sf.saxon.event.SequenceNormalizerWithItemSeparator.characters(SequenceNormalizerWithItemSeparator.java:75)
at net.sf.saxon.event.TreeReceiver.characters(TreeReceiver.java:176)
at net.sf.saxon.event.SequenceReceiver.decompose(SequenceReceiver.java:178)
</code></pre>
<p>For "text2" which is ATOMIC the code in "net.sf.saxon.event.SequenceReceiver.decompose(Item, Location, int)" does this:</p>
<pre><code> protected void decompose(Item item, Location locationId, int copyNamespaces) throws XPathException {
if (item != null) {
switch (item.getGenre()) {
case ATOMIC:
case EXTERNAL:
if (previousAtomic) {
characters(StringConstants.SINGLE_SPACE, locationId, ReceiverOption.NONE);
}
characters(item.getUnicodeStringValue(), locationId, ReceiverOption.NONE);
</code></pre>
<p>It first calls "characters(StringConstants.SINGLE_SPACE, locationId, ReceiverOption.NONE);". Because the method "net.sf.saxon.event.SequenceNormalizerWithItemSeparator.characters(UnicodeString, Location, int)" always calls sep(), this writes the item separator (a comma) before the space.
It then calls:</p>
<pre><code>characters(item.getUnicodeStringValue(), locationId, ReceiverOption.NONE);
</code></pre>
<p>which again adds a comma before the value. So we get two commas before the actual value is printed.</p>
SaxonC - Bug #6353 (New): Unicode characters in filenames are causing errors in Windows
https://saxonica.plan.io/issues/6353
2024-02-20T09:38:55Z
Matt Patterson
<p>As reported by a user in this SO post comment: <a href="https://stackoverflow.com/questions/77962974/saxon-xslt-processing-thousands-of-xml-files-in-a-complex-tree-structure/77963410?noredirect=1#comment137525632_77963410" class="external">https://stackoverflow.com/questions/77962974/saxon-xslt-processing-thousands-of-xml-files-in-a-complex-tree-structure/77963410?noredirect=1#comment137525632_77963410</a></p>
<p>Thanks to silfer1200 and Martin Honnen for reporting.</p>
<p>If you try to pass a filename containing a Unicode character with a multi-byte representation in UTF-8 into SaxonC's Python layer, some weird mangling happens: it looks like the string gets decomposed to a bytestream and then recomposed into a string with each byte considered a complete character.</p>
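<p>The kind of mangling suspected here can be reproduced in plain Python. This is only a sketch of the hypothesis (UTF-8 bytes reinterpreted one byte per character, modelled here with Latin-1); it does not call SaxonC, and the helper name <code>mangle_utf8_as_single_bytes</code> is made up for illustration.</p>

```python
def mangle_utf8_as_single_bytes(text: str) -> str:
    """Encode text as UTF-8, then misread every byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


original = "k\u00f6ln.xml"  # 'köln.xml', written with an escape to stay ASCII-safe
mangled = mangle_utf8_as_single_bytes(original)

# The two-byte UTF-8 sequence for 'ö' becomes two separate characters,
# so the mangled name is one character longer than the original.
print(ascii(original), len(original))
print(ascii(mangled), len(mangled))

# The damage is reversible, which is typical of this class of encoding bug.
restored = mangled.encode("latin-1").decode("utf-8")
print(restored == original)
```

<p>If the path reaching the file system looks like the mangled form, a lookup for the real file would fail in the way the report describes.</p>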
<p>Given a very simple test setup with the following XML file and python script, this will error out every time it's run on Windows. On macOS it's fine.</p>
<p><code>test.py</code>:</p>
<pre><code class="python syntaxhl" data-language="python">import os
import sys
from saxonche import PySaxonProcessor
dir_path = os.path.dirname(os.path.realpath(__file__))
print(sys.getdefaultencoding())
print(sys.getfilesystemencoding())
with PySaxonProcessor() as saxon_proc:
    xml = saxon_proc.parse_xml(xml_file_name=os.path.join(dir_path, 'köln.xml'))
    print(xml)
</code></pre>
<p><code>köln.xml</code>:</p>
<pre><code class="xml syntaxhl" data-language="xml"><?xml version="1.0" encoding="utf-8"?>
<hello>Köln</hello>
</code></pre>
<p>Windows:</p>
<pre><code>(test-venv) C:\Saxonica\unicode>python test.py
utf-8
utf-8
Traceback (most recent call last):
File "C:\Saxonica\unicode\test.py", line 11, in <module>
xml = saxon_proc.parse_xml(xml_file_name=os.path.join(dir_path, 'köln.xml'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "python_saxon\saxonc.pyx", line 868, in saxonche.PySaxonProcessor.parse_xml
saxonche.PySaxonApiError: Unable to resolve <C:\Saxonica\unicode\köln.xml> into a Source. Line number: -1
</code></pre>
<p>macOS:</p>
<pre><code>$ python test.py
utf-8
utf-8
<hello>Köln</hello>
<hello>Köln</hello>
</code></pre>
Saxon - Bug #6343 (New): Function Coercion is not always applied
https://saxonica.plan.io/issues/6343
2024-02-11T23:27:14Z
Michael Kay mike@saxonica.com
<p>I have created the following test case FunctionCall-058:</p>
<pre><code> declare function local:f($callback as function(xs:integer) as xs:boolean) as xs:boolean {
$callback(year-from-date(current-date()) div 1900)
};
local:f(function($d as xs:decimal) as xs:boolean { $d lt 0 })
</code></pre>
<p>This should fail because the <code>$callback</code> function requires an xs:integer but the supplied value (the result of dividing two integers with <code>div</code>) is an xs:decimal.</p>
<p>What is supposed to happen according to the spec is that the supplied function (which accepts an xs:decimal) is coerced to the required type (which does not). This means that the function actually supplied to the $callback parameter is effectively:</p>
<pre><code>function($d1 as xs:integer) as xs:boolean {
function($d2 as xs:decimal) as xs:boolean { $d2 lt 0 } ($d1)
}
</code></pre>
<p>which should fail with a type error when called supplying an <code>xs:decimal</code>.</p>
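<p>As a loose analogy only (Python has no XDM type hierarchy, and this is not how Saxon implements it), the coercion step can be pictured as wrapping the supplied function in one that checks each argument against the declared parameter type before delegating. The names <code>coerce_function</code> and <code>is_negative</code> are hypothetical.</p>

```python
from decimal import Decimal
from typing import Callable


def coerce_function(fn: Callable[[object], bool], required_type: type) -> Callable[[object], bool]:
    """Return a wrapper that enforces required_type on the argument, then calls fn."""
    def coerced(arg: object) -> bool:
        if not isinstance(arg, required_type):
            raise TypeError(
                f"expected {required_type.__name__}, got {type(arg).__name__}"
            )
        return fn(arg)
    return coerced


def is_negative(d) -> bool:
    """Stands in for function($d as xs:decimal) as xs:boolean { $d lt 0 }."""
    return d < 0


# The declared callback type accepts integers only, so the wrapper checks for int.
callback = coerce_function(is_negative, int)

print(callback(1))  # an integer argument passes the check

try:
    callback(Decimal(2024) / Decimal(1900))  # a decimal argument is rejected
except TypeError as err:
    print("type error:", err)
```

<p>Skipping the wrapper, and passing <code>is_negative</code> straight through, is the analogue of the behaviour reported next: the decimal argument would be accepted silently.</p>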
<p>However, because the value supplied to the <code>$callback</code> parameter is an instance of the required type, Saxon skips the process of function coercion, wrongly believing it to be unnecessary; this has the effect that the type error is not detected.</p>
SaxonJS - Feature #6330 (New): XSD Schema Validation in SaxonJS
https://saxonica.plan.io/issues/6330
2024-01-26T13:30:29Z
Eric Van Boxsom
<p>It would be amazing to have an XSD Schema Validation feature in SaxonJS.</p>
<p>Do you think it is something you could be working on any time soon?</p>
<p>Thank you</p>
SaxonC - Feature #6328 (New): Expose URI helpers to C/C++
https://saxonica.plan.io/issues/6328
2024-01-26T08:19:34Z
Matt Patterson
<p>Unlike Python, which has URL manipulation support in the standard library, C and C++ don’t make URL manipulation as convenient. While running SaxonC samples on Windows I came across a couple of places where filepath-to-URI conversion broke because handling Windows path separators is complex… we should at least provide a helper function for parsing filesystem paths into file URIs, given how important they are in daily use…</p>
SaxonC - Bug #6317 (In Progress): The cwd is limited to 256 characters
https://saxonica.plan.io/issues/6317
2024-01-12T13:05:40Z
Norm Tovey-Walsh
<ol>
<li>What lengths should we use for the various platforms?</li>
<li>Where is this documented?</li>
</ol>
SaxonC - Feature #6316 (New): Make PyXslt30Processor and pysaxonProcessor serializable/pickleable
https://saxonica.plan.io/issues/6316
2024-01-11T15:19:57Z
Youssef Bettayeb
<p>Hello, I am working with saxonche in a Databricks/PySpark environment. When applying transformations to XML documents with saxonche, I cannot make use of the parallel processing capabilities because PyXslt30Processor objects are not "serializable", so it is impossible to use UDFs etc. to mass-transform XML documents (I have multiple XML documents and one single XSLT to apply to all of them). The error I get when doing so is: <code>Python process TypeError: no default __reduce__ due to non-trivial __cinit__</code></p>
<p>Would it be possible to make those objects serializable?</p>
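<p>Until something like that exists, one common pattern for objects that wrap native state is to pickle only the inputs (for example the stylesheet path) and rebuild the heavy object lazily in each worker. The sketch below illustrates that pattern with a hypothetical <code>StandInEngine</code> in place of a real saxonche processor; none of these classes are part of saxonche.</p>

```python
import pickle
import threading


class StandInEngine:
    """Hypothetical stand-in for a compiled stylesheet.

    The lock plays the role of native state that cannot be pickled.
    """

    def __init__(self, stylesheet_path: str):
        self.stylesheet_path = stylesheet_path
        self._native_handle = threading.Lock()

    def run(self, xml_text: str) -> str:
        return f"{self.stylesheet_path}: {len(xml_text)} chars"


class LazyTransformer:
    """Picklable handle: ships the stylesheet path, rebuilds the engine on first use."""

    def __init__(self, stylesheet_path: str):
        self.stylesheet_path = stylesheet_path
        self._engine = None

    def __getstate__(self):
        # Never ship the engine itself; each process creates its own.
        return {"stylesheet_path": self.stylesheet_path, "_engine": None}

    def transform(self, xml_text: str) -> str:
        if self._engine is None:
            self._engine = StandInEngine(self.stylesheet_path)
        return self._engine.run(xml_text)


if __name__ == "__main__":
    handle = LazyTransformer("style.xsl")
    handle.transform("<doc/>")  # engine now exists in this process
    worker_copy = pickle.loads(pickle.dumps(handle))  # still picklable
    print(worker_copy.transform("<doc/>"))
```

<p>In a Spark job the same idea is usually expressed by creating the processor once per partition or per worker rather than on the driver; whether that suits a given cluster setup is something to verify locally.</p>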
<p>Thanks a lot for your time</p>
Saxon - Bug #6307 (New): Saxon (HE; Java) trace missing xsl:accumulator and xsl:accumulator-rule,...
https://saxonica.plan.io/issues/6307
2023-12-27T21:52:00Z
A Galtman
<ul>
<li>Is the trace supposed to mark xsl:accumulator as hit?</li>
<li>Is the trace supposed to mark xsl:accumulator-rule as hit?</li>
<li>I definitely expect descendant elements of xsl:accumulator-rule to be marked as hit, and I'm not seeing them in Saxon 12.4. Saxon 9.9.1.8 correctly lists the descendants of xsl:accumulator-rule.</li>
</ul>
<p>I'm attaching a file that includes the code, the Saxon arguments I used, and the traces I got from Saxon 9.9.1.8 and 12.4.</p>
Saxon - Bug #6305 (New): XPathException "The stylesheet module includes/imports itself directly o...
https://saxonica.plan.io/issues/6305
2023-12-22T15:02:48Z
Gerben Abbink gerben.abbink@gmail.com
<p>I use an ErrorReporter with XsltCompiler, like this:</p>
<pre><code>XsltCompiler compiler = processor.newXsltCompiler();
compiler.setErrorReporter(...);
</code></pre>
<p>Usually, XPathExceptions have a Location to identify the error in the file.</p>
<p>But, when I use this template, there is no location information:</p>
<pre><code><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href=""/>
</xsl:stylesheet>
</code></pre>
Saxon - Bug #6302 (New): Saxon (HE; Java) trace not well-formed when XSLT uses transform function
https://saxonica.plan.io/issues/6302
2023-12-21T19:56:45Z
A Galtman
<p>(I alluded to this in my 12/21/23 comment in <a href="https://saxonica.plan.io/issues/6295" class="external">https://saxonica.plan.io/issues/6295</a> but it seems different enough that I'm making a separate issue for it rather than assume that the fix for 6295 is enough.)</p>
<p>The <code>transform()</code> function seems to cause the Saxon trace not to be well-formed.</p>
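<p>A quick way to confirm whether a captured trace is well-formed is to run it through an XML parser. The sketch below uses Python's standard library; the <code>broken_trace</code> string is an abbreviated, hand-made stand-in shaped like the Trace Result further down (an inner <code>trace</code> element that is never closed), not the actual output.</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Abbreviated stand-in: the second <trace> start tag has no matching end tag,
# so </xsl:variable> arrives while <trace> is still open.
broken_trace = (
    '<trace xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:variable name="transform-options">'
    '<trace>'
    '<xsl:function name="my:fcn"></xsl:function>'
    '</xsl:variable>'
    '</trace>'
)

print(is_well_formed(broken_trace))
print(is_well_formed("<trace><step/></trace>"))
```

<p>To check a real trace, read the file written by the <code>2&gt;</code> redirection and pass its contents to <code>is_well_formed</code>.</p>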
<a name="Sample-XSLT-1-transform-functionxsl"></a>
<h3>Sample XSLT 1, transform-function.xsl</h3>
<pre><code><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:my="my-ns"
exclude-result-prefixes="#all"
version="3.0">
<xsl:template name="xsl:initial-template">
<xsl:variable name="transform-options" as="map(xs:string, item()*)">
<xsl:map>
<xsl:map-entry key="'delivery-format'" select="'raw'"/>
<xsl:map-entry key="'stylesheet-location'">target-stylesheet-small.xsl</xsl:map-entry>
<xsl:map-entry key="'function-params'" select="[()]"/>
<xsl:map-entry key="'initial-function'" select="QName('my-ns', 'my:fcn')"/>
</xsl:map>
</xsl:variable>
<xsl:sequence select="transform($transform-options)?output"/>
</xsl:template>
</xsl:stylesheet>
</code></pre>
<a name="Sample-XSLT-2-target-stylesheet-smallxsl"></a>
<h3>Sample XSLT 2, target-stylesheet-small.xsl</h3>
<pre><code><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:my="my-ns"
exclude-result-prefixes="#all"
version="3.0">
<xsl:function name="my:fcn" visibility="public" as="xs:string">
<xsl:param name="p" as="empty-sequence()"/>
<xsl:sequence select="'output'"/>
</xsl:function>
</xsl:stylesheet>
</code></pre>
<a name="Saxon-command"></a>
<h3>Saxon command</h3>
<pre><code>java -cp "%SAXON_CP%" net.sf.saxon.Transform -opt:0 -T -Tlevel:high -it -xsl:transform-function.xsl 2>transform-function-traceresult.xml
</code></pre>
<p>I'm using Saxon-HE 12.4.</p>
<a name="Trace-Result"></a>
<h3>Trace Result</h3>
<pre><code><trace saxon-version="12.4" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="xsl:initial-template" line="9" column="45" module="transform-function.xsl">
<xsl:variable name="transform-options" line="10" column="73" module="transform-function.xsl">
<trace text="target-stylesheet-small.xsl" line="13" column="52" module="transform-function.xsl">
</trace>
<trace saxon-version="12.4" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:function arity="1" name="my:fcn" line="9" column="66" module="target-stylesheet-small.xsl">
</xsl:function>
</xsl:variable>
</xsl:template>
</trace>
</code></pre>
SaxonC - Bug #6299 (New): Add a Makefile to the release for building of the SaxonC C/C++ API library
https://saxonica.plan.io/issues/6299
2023-12-20T13:02:23Z
O'Neil Delpratt oneil@saxonica.com
<p>Reported by user here <a href="https://stackoverflow.com/questions/77684816/undefined-reference-error-when-link-to-saxon-library-in-c/77690492#77690492" class="external">https://stackoverflow.com/questions/77684816/undefined-reference-error-when-link-to-saxon-library-in-c/77690492#77690492</a></p>
<p>The SaxonC release should contain a Makefile to build the C/C++ library. This is something we attempted to do a long time ago, but for some reason we did not proceed with its use. Since we are now using GraalVM, we should use a Makefile.</p>
SaxonC - Bug #6297 (New): 12.4.1 command build errors on CentOS 7
https://saxonica.plan.io/issues/6297
2023-12-19T11:50:46Z
Tony Graham
<p>Running <code>build-saxonc-commands.sh</code> on CentOS 7 generated errors about the C version, such as:</p>
<pre><code>Transform.c: In function ‘transform’:
Transform.c:33:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < argc; i++) {
^
Transform.c:33:3: note: use option -std=c99 or -std=gnu99 to compile your code
</code></pre>
<p>After adding <code>-std=c99</code> to the GCC command lines, there were errors about <code>int64_t</code>, such as:</p>
<pre><code>In file included from ../Saxon.C.API/SaxonCGlue.c:1:0:
../Saxon.C.API/SaxonCGlue.h:92:3: error: unknown type name ‘int64_t’
int64_t value; /*!< Internal use only. The value of the parameter points
^
</code></pre>
<p>That was worked around by adding <code>#include <stdint.h></code> for <code>__LINUX__</code> and <code>__APPLE__</code> in <code>SaxonCGlue.h</code>. I don't know whether <code>#include <stdint.h></code> would be necessary for <code>__APPLE__</code>.</p>
<p>Also, it's not clear why <code>#include <stdio.h></code> is included a second time for <code>__LINUX__</code> and <code>__APPLE__</code>.</p>
<pre><code class="diff syntaxhl" data-language="diff">--- build-saxonc-commands.sh~ 2023-12-01 12:17:46.000000000 +0000
+++ build-saxonc-commands.sh 2023-12-19 08:52:51.329869064 +0000
@@ -4,15 +4,15 @@
 parent_path=$( cd "$(dirname "$0")" ; pwd -P )
 library_dir=${1-$(pwd -P)}/../libs/nix
 #cd "$parent_path"
-gcc -m64 -fPIC -I../Saxon.C.API/graalvm -c ../Saxon.C.API/SaxonCGlue.c -o SaxonCGlue.o $1
+gcc -std=c99 -m64 -fPIC -I../Saxon.C.API/graalvm -c ../Saxon.C.API/SaxonCGlue.c -o SaxonCGlue.o $1
 
-gcc -m64 -fPIC -I../Saxon.C.API/graalvm Transform.c -o transform -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
+gcc -std=c99 -m64 -fPIC -I../Saxon.C.API/graalvm Transform.c -o transform -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
 
-gcc -m64 -fPIC -I../Saxon.C.API/graalvm Query.c -o query -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
+gcc -std=c99 -m64 -fPIC -I../Saxon.C.API/graalvm Query.c -o query -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
 
-gcc -m64 -fPIC -I../Saxon.C.API/graalvm Gizmo.c -o gizmo -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
+gcc -std=c99 -m64 -fPIC -I../Saxon.C.API/graalvm Gizmo.c -o gizmo -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
 
 if [ -f Validate.c ]; then
- gcc -m64 -fPIC -I../Saxon.C.API/graalvm Validate.c -o validate -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
+ gcc -std=c99 -m64 -fPIC -I../Saxon.C.API/graalvm Validate.c -o validate -ldl -lc SaxonCGlue.o -L$library_dir -lsaxon-hec-12.4.1 $1
 fi
</code></pre>
<pre><code class="diff syntaxhl" data-language="diff">--- SaxonCGlue.h~ 2023-12-01 12:17:48.000000000 +0000
+++ SaxonCGlue.h 2023-12-19 09:00:42.996931099 +0000
@@ -17,6 +17,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <stdint.h>
 #else
 #include <stdint.h>
 #include <windows.h>
</code></pre>
SaxonC - Bug #6291 (New): Warnings for invalid keywords from transform_to_string methods on PyXsl...
https://saxonica.plan.io/issues/6291
2023-12-15T15:06:24Z
Debbie Lockett debbie@saxonica.com
<p>The documentation for the <code>PyXslt30Processor</code> <code>transform_to_string</code> method says that the keywords <code>source_file</code> and <code>stylesheet_file</code> are required. The method currently raises a warning only when <code>len(kwds) == 0</code>. It would be better to raise a warning precisely when <code>source_file</code> and <code>stylesheet_file</code> are not both supplied.</p>
<p>Meanwhile for <code>transform_to_string</code> on <code>PyXsltExecutable</code>, there is some other confusion about supplied keywords. Here only one of <code>source_file</code> or <code>xdm_node</code> can be supplied, and a warning is raised if both are (I committed a change here yesterday to check for precisely both of these keywords, rather than <code>len(kwds) > 0</code>). (Note that it is OK to supply neither.) The other possible keywords are <code>base_output_uri</code> and <code>encoding</code>. It looks like you'll currently get an (unhelpfully worded) warning if you try to supply only <code>encoding</code>, but that should be permitted.</p>
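<p>The two checks described above could look roughly like this in plain Python. This is a sketch of the intended rules only, not the saxonche source: the function names, the warning texts, and the extra warning for unrecognised keywords are assumptions.</p>

```python
import warnings

# Keywords named in this report for PyXsltExecutable.transform_to_string.
EXECUTABLE_KEYWORDS = {"source_file", "xdm_node", "base_output_uri", "encoding"}


def check_processor_keywords(kwds: dict) -> None:
    """PyXslt30Processor rule: warn unless source_file and stylesheet_file are both given."""
    missing = [name for name in ("source_file", "stylesheet_file") if name not in kwds]
    if missing:
        warnings.warn("transform_to_string: missing required keyword(s): " + ", ".join(missing))


def check_executable_keywords(kwds: dict) -> None:
    """PyXsltExecutable rule: source_file and xdm_node are mutually exclusive."""
    if "source_file" in kwds and "xdm_node" in kwds:
        warnings.warn("transform_to_string: supply only one of source_file or xdm_node")
    unknown = sorted(set(kwds) - EXECUTABLE_KEYWORDS)
    if unknown:
        # Assumed extra check, not described in the report.
        warnings.warn("transform_to_string: unrecognised keyword(s): " + ", ".join(unknown))


def count_warnings(check, kwds: dict) -> int:
    """Run one check and return how many warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check(kwds)
    return len(caught)


print(count_warnings(check_processor_keywords, {"source_file": "in.xml"}))
print(count_warnings(check_executable_keywords, {"encoding": "utf-8"}))
```

<p>Under these rules, supplying only <code>encoding</code> to the executable check raises no warning, which matches the behaviour the report says should be permitted.</p>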