Project

Profile

Help

Bug #4430

closed

Unpredictable output when streaming accumulators call other accumulators

Added by Martin Honnen over 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Streaming
Sprint/Milestone:
-
Start date:
2020-01-16
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.9, trunk
Fix Committed on Branch:
9.9, trunk
Fixed in Maintenance Release:
Platforms:

Description

I have run into an odd problem, I have written an XSLT 3 stylesheet using accumulators and streaming that gives the wanted result when running with 9.9.1.6 EE Java from the command line as long as I don't use the -t option to display timing information (well, I rather use it to see whether the input is really processed with streaming).

The XSLT sample is

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all" version="3.0">

    <xsl:output method="text" item-separator="&#10;"/>

    <xsl:mode on-no-match="shallow-skip" use-accumulators="#all" streamable="yes"/>

    <xsl:accumulator name="A" as="xs:integer?" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="fields/data/id1" phase="end"
            select="accumulator-after('values') => subsequence(1, 3) => sum()"/>
    </xsl:accumulator>

    <xsl:accumulator name="B" as="xs:integer?" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="fields/data/id1" phase="end"
            select="accumulator-after('values') => subsequence(4, 3) => sum()"/>
    </xsl:accumulator>

    <xsl:accumulator name="C" as="xs:integer?" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="fields/data/id2" phase="end"
            select="accumulator-after('values') => subsequence(1, 3) => sum()"/>
    </xsl:accumulator>

    <xsl:accumulator name="D" as="xs:integer?" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="fields/data/id2" phase="end"
            select="accumulator-after('values') => subsequence(4, 3) => sum()"/>
    </xsl:accumulator>

    <xsl:accumulator name="values" as="xs:integer*" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="data" select="()"/>
        <xsl:accumulator-rule match="data/id1/text() | data/id2/text()"
            select="
                analyze-string(., '.')//*:match ! (
                if (. = '.')
                then
                    15
                else
                    if (. = ' ')
                    then
                        20
                    else
                        xs:integer(.))"
        />
    </xsl:accumulator>

    <xsl:accumulator name="remainder" as="xs:integer?" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="fields/data" select="()"/>
        <xsl:accumulator-rule match="fields/data" phase="end"
            select="
                abs((accumulator-after('B') + accumulator-after('D')) -
                (accumulator-after('A') + accumulator-after('C'))) mod 13"
        />
    </xsl:accumulator>

    <xsl:accumulator name="remainder-sum" as="xs:integer?" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="fields" select="0"/>
        <xsl:accumulator-rule match="fields/data" phase="end" select="$value + accumulator-after('remainder')"/>
    </xsl:accumulator>

    <xsl:output method="text" item-separator="&#10;"/>

    <xsl:template match="fields">
        <xsl:apply-templates/>
        <xsl:sequence select="accumulator-after('remainder-sum')"/>
    </xsl:template>

</xsl:stylesheet>

An input sample is

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <fields>
        <data>
            <id1>111   </id1>
            <id2>123.50</id2>
        </data>        
    </fields>
    <fields>
        <data>
            <id1>111   </id1>
            <id2>123.50</id2>
        </data>
        <data>
            <id1>321.50</id1>
            <id2>   222</id2>
        </data>
    </fields>
</root>

When I run it from the command line solely with options -xsl:sheet.xsl -s:input.xml then the result is fine:

6
7

or when I use the -o:result.txt I get that result in the named result file.

However, when using the -t option, I get the timing and streaming infos output to the console but the two numbers are not output to the console or, with the -o option, an empty file is created.

The code tries to solve https://stackoverflow.com/q/59756080/252228 with streaming.

Actions #1

Updated by Debbie Lockett over 4 years ago

Running in 9.9 branch project. Without -o option, get no console output with or without -t. But get console output for -T.

Using saxon:trace on accumulators. Without -T get output ending:

values BEFORE /root/fields/data: ()
remainder BEFORE /root/fields/data: ()
values BEFORE /root/fields/data/id1/text():  (3, 2, 1, ... [6])
B AFTER /root/fields/data/id1: 20
A AFTER /root/fields/data/id1: 6
values BEFORE /root/fields/data/id2/text():  (20, 20, 20, ... [6])
D AFTER /root/fields/data/id2: 6
C AFTER /root/fields/data/id2: 60
remainder-sum AFTER /root/fields/data: ()
remainder AFTER /root/fields/data: 1

With -T:

remainder BEFORE /root/fields/data: ()
values BEFORE /root/fields/data: ()
values BEFORE /root/fields/data/id1/text():  (3, 2, 1, ... [6])
A AFTER /root/fields/data/id1: 6
B AFTER /root/fields/data/id1: 20
values BEFORE /root/fields/data/id2/text():  (20, 20, 20, ... [6])
D AFTER /root/fields/data/id2: 6
C AFTER /root/fields/data/id2: 60
remainder AFTER /root/fields/data: 1
remainder-sum AFTER /root/fields/data: 7
Actions #2

Updated by Debbie Lockett over 4 years ago

remainder-sum depends on remainder (select="$value + accumulator-after('remainder')") so should be evaluated afterwards. But the trace suggests this isn't happening (without -T). Perhaps the order is actually random?

Actions #3

Updated by Debbie Lockett over 4 years ago

  • Status changed from New to In Progress

The last paragraph of 18.2.3 https://www.w3.org/TR/xslt-30/#accumulator-informal-rules suggests dependencies between accumulators should be handled, but there doesn't seem to be any logic in the product to do this (at least in the streaming case). AccumulatorRegistryEE.registerSelectedAccumulators simply stores the accumulators as a set, and therefore processes them in effectively random order.

Leaving this open until attempting the design changes.

Actions #4

Updated by Michael Kay over 4 years ago

Thinking about how we might do this. In the worst case, dependencies among accumulators cannot be determined statically, because the argument to accumulator-before() or -after() is not required to be a string literal (moreover, the call to accumulator-before() or -after() can be dynamically invoked via a chain of function or template invocations). So one approach would be to evaiuate all the accumulators associated with a particular streaming event (logically) in parallel, with each thread waiting if it needs the value of an accumulator that is not yet available; and ideally some kind of deadlock detection (though the spec does say that "catastrophic failure" is allowed in the event of a cyclic dependency, which would permit an undetected deadlock in principle).

Wouldn't it be nice to have co-routines and a yield construct!

Short of a parallel implementation, another approach would be to choose an arbitrary order, and have an accumulator evaluation throw an exception if its dependencies are not satisfied; it would then go to the back of a queue for trying again. We could possibly optimize for the common case where the dependencies ARE known statically.

Actions #5

Updated by Michael Kay over 4 years ago

Further thoughts: rather than having each AccumulatorWatch called directly from the WatchManager, we could pass all the relevant events via a new AllAccumulatorsWatch class, which would activate each AccumulatorWatch in turn, allowing any of them to invoke the others using a recursive call (with a check to prevent reentrancy). The AllAccumulatorsWatch would start each node/phase event by marking all accumulators as unavailable (for that phase), and a request for an accumulator that is not available would trigger a recursive call to process that accumulator.

Actions #6

Updated by Michael Kay over 4 years ago

Added streaming and non-streaming versions of this test to XSLT3 test suite as accumulator-079[s].

Actions #7

Updated by Michael Kay about 4 years ago

  • Subject changed from XSLT 3 sample using streaming and accumulators gives no stylesheet result output when -t option is used to Unpredictable output when streaming accumulators call other accumulators
Actions #8

Updated by Michael Kay about 4 years ago

I've implemented a MultiAccumulatorWatch to the main WatchManager now issues a single call for each event; the MultiAccumulatorWatch is responsible for distributing this to each individual AccumulatorWatch. This is working and seems a useful step forward.

In doing this I've noticed that AccumulatorWatch.getPreDescentValue() already has logic to allow for someone trying to get the BEFORE value before it has been computed; if the current node hasn't yet been notified then it gets processed now, before returning the value. There is no similar logic for getPostDescentValue(); perhaps that's all that's needed.

Actions #9

Updated by Michael Kay about 4 years ago

I'm now handling post-descent accumulator recursion the same way as pre-descent: if the accumulator value is requested, and we haven't yet seen the endElement event for the relevant node, then we fire the endElement action for the node, and note it as having been processed, so when the event is subsequently notified by the normal route, we ignore it.

The new design is working well, and seems sufficiently robust that I now intend to retrofit it as a patch to 9.9.

Actions #10

Updated by Michael Kay about 4 years ago

  • Status changed from In Progress to Resolved
  • Assignee set to Michael Kay
  • Applies to branch trunk added
  • Fix Committed on Branch 9.9, trunk added
Actions #11

Updated by Michael Kay about 4 years ago

  • Status changed from Resolved to In Progress

The fix caused regression: test evaluate-046 is faillng when evaluation of the AFTER value for an accumulator makes use of the BEFORE value.

I've fixed this on the development branch by making the test for cyclicity more selective, but I think it may still not be good enough; we should allow, for example, the computation of the BEFORE value for an accumulator to access the BEFORE value on an ancestor node.

Actions #12

Updated by Michael Kay about 4 years ago

Also, the new test for cyclic accumulators (accumulator-080) crashes with stack overflow when bytecode is enabled. This is not technically a non-conformance (the spec explicitly allows "catastrophic failure" for this condition, just as with infinite function recursion) but it would be nice to do better.

Actions #13

Updated by Michael Kay about 4 years ago

The patch caused the saxon:capture feature on accumulators to stop working: this has now been fixed.

The problem was that saxon:capture adds a Watch which gathers a snapshot of a streamed node for use when the phase=end rule is evaluated; but the normal snapshot mechanism copies accumulator values, which must therefore already be available before the snapshot is completed. I don't actually know why this didn't cause a problem before. But I've fixed it by adding an option to streaming snapshot feeds to avoid copying accumulators, and setting this option for these internal snapshots.

Actions #14

Updated by Michael Kay about 4 years ago

  • Status changed from In Progress to Resolved

I'm going to close this now for tidiness as we're about to produce a maintenance release in the current state. There are comments in this bug entry suggesting further tests that need to be written; if those reveal further problems we'll open a new issue.

Actions #15

Updated by O'Neil Delpratt about 4 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.9.1.7 added

Patch applied in the 9.9.1.7 maintenance release.

Please register to edit this issue

Also available in: Atom PDF