Bug #4523 » bug_demo.html

Debbie Lockett, 2020-04-20 13:16

 
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>bug demo</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8;"/>
    <script type="application/javascript" src="SaxonJS.min.js">
      <!-- HTML parsers are so easily confused -->
    </script>
    <script type="application/javascript">
      // Run the compiled stylesheet against the source document once the page has loaded.
      window.onload = function() {
        SaxonJS.transform({
          sourceLocation:     "bug_demo.xml",
          stylesheetLocation: "bug_demo.sef"
        });
      };
    </script>
  </head>
  <body style="padding: 2em;">
    <h1>demo of what might be a bug in Saxon-JS 1.2.0</h1>
    <h2>2020-04-18 by Syd Bauman</h2>
    <p>The associated XSLT program (<tt>./bug_demo.xslt</tt>) is
    supposed to return a single number (cast to a string for output).
    It counts how many “words” there are in the input document, where
    a word is defined as a whitespace-separated token in the content
    selected by a particular XPath expression, excluding any content
    that is a descendant of one of a handful of ignorable elements.</p>
    <p>If it is being run in the browser, it is supposed to replace
    the contents of the paragraph below with the number.</p>
    <p id="bug_demo" style="padding: 0em 0em 0em 2em; font-size: x-large;">
      <i>[The number goes here]</i>
    </p>
    <p>If it is being run on the commandline, it writes the output to a
    file in the /tmp/ directory. (I used &lt;result-document> even
    from the commandline just to keep the two cases parallel. I would
    probably get the same results if the output were just written to
    STDOUT or wherever the -o: switch says.) It always seems to count
    correctly when run on the commandline, whether using Saxon 9.9 or
    10.0.</p>
    <p>In the browser, where I am using Saxon-EE 9.9.1.5 to generate a
    SEF file from the program, it is a different story. When I started
    writing this it did not seem to matter how big the input document
    (<tt>../bug_demo.xml</tt>) was: I got the same <tt
    style="color:red;">XError: looping???</tt> error from Saxon in the
    browser whether it was over 11,000 words long or 11 words long. But
    later the looping error started to appear only with large input
    documents. (At ≤ 1002 counted words it works fine; at ≥ 1003 it
    fails.) The only difference I can think of between “sooner” and
    “later” is that I re-launched Firefox, and thus updates I received
    over the last N days via Ubuntu Software Updater may have been
    applied; I don’t know. (I also probably made some code tweaks that I
    thought would not affect anything, but I don’t even remember whether
    I did or not, let alone what they might have been.)</p>
    <p>I don’t speak Java, but I am very suspicious of this snippet of
    code I found on <a
    href="https://gist.github.com/mhogerheijde/7051105890dad5b334d95ad671d2c7b8">GitHub</a>,
    because its limit of 1000 is so close to the word count at which
    the failure starts:</p>
    <pre>            if (loops++ > 1000) {
                throw XError("looping???");
            }</pre>
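    <p>To see how a guard of this shape relates to a count of about
    1000, here is a minimal JavaScript sketch. It is not the actual
    Saxon-JS code: the function name
    <tt>iterationsBeforeGuardTrips</tt> and the surrounding loop are
    assumptions made purely for illustration, and a plain
    <tt>Error</tt> stands in for <tt>XError</tt>.</p>

```javascript
// Hypothetical illustration only; the real loop around the guard is unknown.
// Counts how many passes complete before a "loops++ > limit" guard throws.
function iterationsBeforeGuardTrips(limit = 1000) {
  let loops = 0;      // same counter name as in the quoted snippet
  let completed = 0;  // passes that got past the guard
  try {
    while (true) {
      // Post-increment: the comparison sees the value from before the increment.
      if (loops++ > limit) {
        throw new Error("looping???");
      }
      completed++;
    }
  } catch (e) {
    return completed;
  }
}

console.log(iterationsBeforeGuardTrips()); // 1001
```

    <p>With a limit of 1000, 1001 passes complete and the guard fires
    on pass 1002, so a cutoff slightly above 1000 is what a guard like
    this would produce. Whether that is what actually happens inside
    Saxon-JS is not established here.</p>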
    <p>It seems at first blush as if the error is flagged the first
    time the <tt>$all_content</tt> variable is referenced, whether
    it is processed step by step:</p>
    <pre>      &lt;xsl:variable name="ac_normalized" select="normalize-space($all_content)"/>
      &lt;xsl:variable name="all_tokens" select="tokenize( $ac_normalized,'&amp;#x20;')"/>
      &lt;xsl:variable name="num_tokens" select="count( $all_tokens )"/></pre>
    <p>or all at once: <tt>&lt;xsl:variable name="num_words" select="normalize-space( $all_content ) ! tokenize('&amp;#x20;') => count()"/></tt>.</p>
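    <p>For comparison, the three steps of the step-by-step version
    (normalize, tokenize on a single space, count) can be sketched in
    plain JavaScript. The helper names <tt>normalizeSpace</tt>,
    <tt>tokenizeOnSpace</tt> and <tt>countTokens</tt> are hypothetical
    and are not part of Saxon-JS; this only approximates the XPath
    functions for ordinary strings.</p>

```javascript
// Hypothetical helpers that mimic the XPath steps; not part of Saxon-JS.

// Like XPath normalize-space(): strip leading and trailing XML whitespace
// (space, tab, CR, LF), then collapse each inner run to a single space.
function normalizeSpace(text) {
  return text
    .replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, "")
    .replace(/[ \t\r\n]+/g, " ");
}

// Like XPath tokenize($s, ' '): a zero-length input yields no tokens.
function tokenizeOnSpace(normalized) {
  return normalized === "" ? [] : normalized.split(" ");
}

// Like count(tokenize(normalize-space($all_content), ' ')).
function countTokens(allContent) {
  return tokenizeOnSpace(normalizeSpace(allContent)).length;
}

console.log(countTokens("  one   two\nthree ")); // 3
```

    <p>The empty-string check matters because JavaScript’s
    <tt>"".split(" ")</tt> returns one empty token, which would make an
    empty document count as one word.</p>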
    <h2>Other issues</h2>
    <p>It also does not seem to matter whether or not I use a
    predicate. The predicate, <tt>[ matches( .,'\p{L}')]</tt>, was in
    my original code just to get rid of tokens like “—” or “!!” that
    should not be counted as words for my application. (It is a poor
    approximation of what I really want, because “Sept. 11” should
    count as two words, not one.) However, using the predicate seems
    to make it more likely that I get the warning <tt>Synchronous XMLHttpRequest
    on the main thread is deprecated because of its detrimental
    effects to the end user’s experience. For more help
    http://xhr.spec.whatwg.org/</tt>. I have no idea what that warning
    means or what, if anything, to do about it, and I have even less
    of an idea after going to the recommended web page.</p>
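    <p>The effect of the predicate can be illustrated with a small
    JavaScript sketch. The helper name <tt>wordsWithLetters</tt> and
    the sample tokens are made up for illustration; the regular
    expression is a JavaScript analogue of the XPath
    <tt>matches( ., '\p{L}')</tt> test, not the code Saxon-JS runs.</p>

```javascript
// Hypothetical helper; mirrors the XPath predicate [matches(., '\p{L}')].
function wordsWithLetters(tokens) {
  // The "u" flag enables Unicode property escapes such as \p{L} (any letter).
  return tokens.filter((token) => /\p{L}/u.test(token));
}

// Sample tokens chosen to match the cases mentioned in the text.
const sampleTokens = ["Sept.", "11", "—", "!!", "hello"];
console.log(wordsWithLetters(sampleTokens)); // keeps "Sept." and "hello"
```

    <p>The tokens “—” and “!!” are dropped as intended, but so is
    “11”, which is why “Sept. 11” counts as one word under this
    predicate.</p>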
    <p>I am also wondering if there is a <em>right</em> way to ask
    “am I being run as a SEF in a browser?”. I found that
    <tt>function-available('ixsl:location')</tt> does the trick,
    but it seems a bit clumsy.</p>
    <p>I also wonder if there is any way to use &lt;result-document>
    to write to a disk file when running as a SEF in the browser. The
    <a
    href="http://www.saxonica.com/saxon-js/documentation/index.html#!development/result-documents">documentation</a>
    implies the answer is “no”, but I am hopeful, as it could make
    debugging a lot easier.</p>
  </body>
</html>