Bug #4523 » bug_demo.html

Debbie Lockett, 2020-04-20 13:16

 
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>bug demo</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8;"/>
<script type="application/javascript" src="SaxonJS.min.js">
<!-- HTML parsers are so easily confused -->
</script>
<script type="application/javascript">
window.onload = function() {
SaxonJS.transform({
sourceLocation: "bug_demo.xml",
stylesheetLocation: "bug_demo.sef"
});
}
</script>
</head>
<body style="padding: 2em;">
<h1>demo of what might be a bug in Saxon-JS 1.2.0</h1>
<h2>2020-04-18 by Syd Bauman</h2>
<p>The associated XSLT program (<tt>./bug_demo.xslt</tt>) is
supposed to return a single number (cast to a string for output).
It counts how many “words” there are in the input document, where
a word is defined as a whitespace-separated token in the content
of a particular XPath, that is not a descendant of one of a
handful of ignorable elements.</p>
<p>If it is being run in the browser, it is supposed to replace
the contents of the paragraph below with the number.</p>
<p id="bug_demo" style="padding: 0em 0em 0em 2em; font-size: x-large;">
<i>[The number goes here]</i>
</p>
<p>If it is being run on the command line, it writes the output to a
file in the /tmp/ directory. (I used &lt;result-document> even
from the command line just to keep the two cases parallel. I would
probably get the same results if the output were just written to STDOUT
or wherever the -o: switch says.) It always seems to count correctly when
run on the command line, whether using Saxon 9.9 or 10.0.</p>
<p>In the browser, where I am using Saxon-EE 9.9.1.5 to generate a
SEF file from the program, it is a different story. When I started
writing this it did not seem to matter how big the input document
(<tt>../bug_demo.xml</tt>) was, I got the same <tt
style="color:red;">XError: looping???</tt> error from Saxon in the
browser when it was over 11,000 words long or 11 words long. But
later the looping error started to appear only with large input
documents. (At ≤ 1002 counted words it works fine, at ≥ 1003 it
fails.) The only difference I can think of between “sooner” and
“later” is that I re-launched Firefox, and thus updates I received
over the last N days via Ubuntu Software Updater may have been
applied, idunno. (I also probably made some code tweaks that I
thought would not affect anything, but I don’t even remember whether I
did, let alone what they might have been.)</p>
<p>I don’t speak Java, but I am very suspicious of this snippet of
code I found on <a
href="https://gist.github.com/mhogerheijde/7051105890dad5b334d95ad671d2c7b8">GitHub</a>:</p>
<pre>if (loops++ > 1000) {
    throw XError("looping???");
}</pre>
<p>It seems at first blush as if the error is flagged the first
time the <tt>$all_content</tt> variable is referenced, whether
or not it is processed step-by-step:</p>
<pre>&lt;xsl:variable name="ac_normalized" select="normalize-space($all_content)"/>
&lt;xsl:variable name="all_tokens" select="tokenize( $ac_normalized,'&amp;#x20;')"/>
&lt;xsl:variable name="num_tokens" select="count( $all_tokens )"/></pre>
<p>or all-at-once: <tt>&lt;xsl:variable name="num_words" select="normalize-space( $all_content ) => tokenize('&amp;#x20;') => count()"/></tt>.</p>
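<p>For cross-checking the expected count outside Saxon, the token counting above can be sketched in plain JavaScript. This is only an illustration under assumptions: the function name <tt>countWords</tt>, its <tt>lettersOnly</tt> flag, and the sample string are hypothetical and not part of <tt>bug_demo.xslt</tt>. The flag mimics a “token must contain a letter” filter such as <tt>[ matches( .,'\p{L}')]</tt>.</p>

```javascript
// Hypothetical helper, not part of bug_demo.xslt: counts space-separated
// tokens roughly the way the XPath expressions above do.
function countWords(allContent, lettersOnly = false) {
  // Approximate normalize-space(): strip leading/trailing XML whitespace,
  // then collapse every internal run of it to a single space.
  const normalized = allContent
    .replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, "")
    .replace(/[ \t\r\n]+/g, " ");

  // Like tokenize(): an empty input string yields no tokens at all.
  const tokens = normalized === "" ? [] : normalized.split(" ");

  // Optional filter: keep only tokens that contain at least one letter.
  const kept = lettersOnly ? tokens.filter((t) => /\p{L}/u.test(t)) : tokens;
  return kept.length;
}

const sample = "  Sept. 11 was  !!  a day -- indeed ";
console.log(countWords(sample));       // 8
console.log(countWords(sample, true)); // 5
```

<p>Running the same input through both this sketch and the stylesheet on the command line gives a number to compare against whatever the browser reports, assuming the input text has been extracted the same way.</p>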
<h2>Other issues</h2>
<p>It also does not seem to matter whether or not I use a
predicate. The predicate, <tt>[ matches( .,'\p{L}')]</tt>, was in
my original code just to get rid of tokens like “—” or “!!” that
should not be counted as words for my application. (It is a poor
approximation of what I really want, because “Sept. 11” should
count as two words, not one.) However, using the predicate seems
to make it more likely that I get <tt>Synchronous XMLHttpRequest
on the main thread is deprecated because of its detrimental
effects to the end user’s experience. For more help
http://xhr.spec.whatwg.org/</tt>. I have no idea what that warning
means or what, if anything, to do about it, and I understood even less
after reading the recommended web page.</p>
<p>I am also wondering if there is a <em>right</em> way to ask
“am I being run as a SEF in a browser?”. I found that
<tt>function-available('ixsl:location')</tt> does the trick,
but it seems a bit clumsy.</p>
<p>I also wonder if there is any way to use &lt;result-document>
to write to a disk file when running as a SEF in the browser. The
<a
href="http://www.saxonica.com/saxon-js/documentation/index.html#!development/result-documents">documentation</a>
implies the answer is “no”, but I am hopeful, as it could make
debugging a lot easier.</p>
</body>
</html>