<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>bug demo</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8;"/>
<script type="application/javascript" src="SaxonJS.min.js">
<!-- HTML parsers are so easily confused -->
</script>
<script type="application/javascript">
  window.onload = function() {
    SaxonJS.transform({
      sourceLocation: "bug_demo.xml",
      stylesheetLocation: "bug_demo.sef"
    });
  };
</script>
</head>
<body style="padding: 2em;">
<h1>demo of what might be a bug in Saxon-JS 1.2.0</h1>
<h2>2020-04-18 by Syd Bauman</h2>
<p>The associated XSLT program (<tt>./bug_demo.xslt</tt>) is
supposed to return a single number (cast to a string for output).
It counts how many “words” there are in the input document, where
a word is defined as a whitespace-separated token in the content
selected by a particular XPath, excluding any content that is a
descendant of one of a handful of ignorable elements.</p>
<p>If it is being run in the browser, it is supposed to replace
the contents of the paragraph below with the number.</p>
<p id="bug_demo" style="padding: 0em 0em 0em 2em; font-size: x-large;">
<i>[The number goes here]</i>
</p>
<p>If it is being run on the command line, it writes the output to a
file in the /tmp/ directory. (I used &lt;result-document&gt; even
from the command line just to keep the two cases parallel. I would
probably get the same results if the output were just written to
STDOUT or to wherever the -o: switch says.) It always seems to count
correctly when run on the command line, whether using Saxon 9.9 or
10.0.</p>
<p>The browser, where I am using Saxon-EE 9.9.1.5 to generate a
SEF file from the program, is a different story. When I started
writing this, it did not seem to matter how big the input document
(<tt>../bug_demo.xml</tt>) was: I got the same <tt
style="color:red;">XError: looping???</tt> error from Saxon in the
browser whether it was over 11,000 words long or 11 words long. But
later the looping error started to appear only with large input
documents. (At ≤ 1002 counted words it works fine; at ≥ 1003 it
fails.) The only difference I can think of between “sooner” and
“later” is that I re-launched Firefox, and thus updates I received
over the last N days via Ubuntu Software Updater may have been
applied, idunno. (I also probably made some code tweaks that I
thought would not affect anything, but I don’t even remember whether
I did or not, let alone what they might have been.)</p>
<p>I don’t speak Java, but I am very suspicious of this snippet of
code I found on <a
href="https://gist.github.com/mhogerheijde/7051105890dad5b334d95ad671d2c7b8">GitHub</a>:</p>
<pre>  if (loops++ &gt; 1000) {
    throw XError("looping???");
  }</pre>
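<p>To see on which pass a guard of this shape first fires, here is a
minimal sketch. It is <em>not</em> Saxon-JS code: the names
<tt>runWithGuard</tt> and <tt>completes</tt> are hypothetical, and it
assumes the counter starts at zero and is checked once per pass.
Because of the post-increment, the comparison sees the value from
before the bump, so under those assumptions 1001 passes complete and
the 1002nd throws. That is in the same neighbourhood as the threshold
reported above, but how Saxon-JS maps input tokens to passes through
its own loop is not known here.</p>

```javascript
// Hypothetical illustration only; this is NOT taken from Saxon-JS.
// It mimics a guard shaped like `if (loops++ > 1000)` to show when it trips.
function runWithGuard(passes, limit = 1000) {
  let loops = 0; // assumption: the counter starts at zero
  for (let pass = 1; pass !== passes + 1; pass++) {
    // Post-increment: compare the old value, then add one.
    if (loops++ > limit) {
      throw new Error("looping??? (pass " + pass + ")");
    }
  }
  return passes;
}

// Returns true when the given number of passes finishes without the guard firing.
function completes(passes, limit = 1000) {
  try {
    runWithGuard(passes, limit);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(completes(1001)); // true
console.log(completes(1002)); // false
```

<p>A guard written this way allows one more pass than the literal
limit suggests, which is worth keeping in mind when comparing the
1000 in the snippet with an observed failure threshold.</p>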
<p>At first blush, the error seems to be flagged the first time the
<tt>$all_content</tt> variable is referenced, whether it is
processed step by step:</p>
<pre>  &lt;xsl:variable name="ac_normalized" select="normalize-space($all_content)"/&gt;
  &lt;xsl:variable name="all_tokens" select="tokenize( $ac_normalized,'&amp;#x20;')"/&gt;
  &lt;xsl:variable name="num_tokens" select="count( $all_tokens )"/&gt;</pre>
<p>or all at once: <tt>&lt;xsl:variable name="num_words" select="normalize-space( $all_content ) ! tokenize( ., '&amp;#x20;') =&gt; count()"/&gt;</tt>.</p>
<h2>Other issues</h2>
<p>It also does not seem to matter whether or not I use a
predicate. The predicate, <tt>[ matches( .,'\p{L}')]</tt>, was in
my original code just to get rid of tokens like “—” or “!!” that
should not be counted as words for my application. (It is a poor
approximation of what I really want, because “Sept. 11” should
count as two words, not one.) However, using the predicate seems
to make it more likely that I get the warning <tt>Synchronous
XMLHttpRequest on the main thread is deprecated because of its
detrimental effects to the end user’s experience. For more help
http://xhr.spec.whatwg.org/</tt>. I have no idea what that warning
means or what, if anything, to do about it, and I had even less of
an idea after going to the recommended web page.</p>
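<p>The counting rule described on this page (normalize the
whitespace, split on single spaces, and optionally keep only tokens
that contain a letter) can be cross-checked outside XSLT. What
follows is a minimal sketch in plain JavaScript, not a translation
of <tt>bug_demo.xslt</tt>: the function name <tt>countWords</tt> and
its <tt>lettersOnly</tt> flag are hypothetical, and it takes an
already-extracted string rather than selecting content by XPath or
skipping the ignorable elements.</p>

```javascript
// Hypothetical cross-check; not part of bug_demo.xslt.
// Counts space-separated tokens the way the XSLT variables above describe.
function countWords(text, lettersOnly = false) {
  // Rough equivalent of normalize-space(): collapse runs of XML whitespace
  // (space, tab, CR, LF) to one space, then strip a leading/trailing space.
  const normalized = text.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");

  // tokenize() on a zero-length string gives an empty sequence, whereas
  // "".split(" ") gives [""], so handle the empty case explicitly.
  if (normalized === "") {
    return 0;
  }

  const tokens = normalized.split(" ");

  // Optional filter mirroring the predicate [ matches( ., '\p{L}') ]:
  // keep only tokens that contain at least one Unicode letter.
  const kept = lettersOnly ? tokens.filter((t) => /\p{L}/u.test(t)) : tokens;
  return kept.length;
}

console.log(countWords("  one   two\n three "));
console.log(countWords("Sept. 11 !!", true));
```

<p>With the filter switched on, <tt>countWords("Sept. 11 !!", true)</tt>
gives 1, which reproduces the weakness noted above: “Sept. 11” is
counted as one word because “11” contains no letter.</p>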
<p>I am also wondering if there is a <em>right</em> way to ask
“am I being run as a SEF in a browser?”. I found that
<tt>function-available('ixsl:location')</tt> does the trick,
but it seems a bit clumsy.</p>
<p>I also wonder if there is any way to use &lt;result-document&gt;
to write to a disk file when running as a SEF in the browser. The
<a
href="http://www.saxonica.com/saxon-js/documentation/index.html#!development/result-documents">documentation</a>
implies the answer is “no”, but I am hopeful, as it could make
debugging a lot easier.</p>
</body>
</html>