Bug #6207
closed
Filter expression not working properly
Applies to branch:
11, 12, trunk
Fix Committed on Branch:
11, 12, trunk
Fixed in Maintenance Release:
Description
I have tried to make as simple a test case as possible (see attachment), with both Saxon-EE 11.6 and 12.3 generating the following output:
Batch execution starting...
count($line) = 1
count($line[@id = 1]) = 1
count($line[@id = (1, 1)]) = 1
*** Processing 2023-01-01
*** count($line) = 1
*** count($line[@id = 1]) = 1
*** count($line[@id = (1, 1)]) = 2
*** count($line[@id = (1, 1)]/.) = 1
Batch execution succeeded
Execution time: 0.209s
Note that count($line[@id = (1, 1)]) suddenly becomes equal to 2, depending on the nested level.
Files
Both 11.6 and 12.3 EE with -opt:-x
give count($line[@id = (1, 1)]) = 1
. So the "index predicates" optimization is probably the culprit.
-explain
output parts containing indexed filter expressions (when running 12.3 with all optimizations):
OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($lines, atomizeSingleton(.?date), atomizeSingleton($Q{http://saxon.sf.net/generated-variable}current98224136))
OPT : At line 34 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($line, ((.) treat as node())/data(@id), 1)
OPT : At line 35 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($line, ((.) treat as node())/data(@id), (1, 1))
OPT : At line 36 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($line, ((.) treat as node())/data(@id), (1, 1))
OPT : At line 32 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
There's a pretty complex sequence of optimizations going on here!
Line 29, the one that's working, is evaluated as a regular filter expression (not an IndexedFilterExpression):
<message line="29">
<fn role="select" name="concat">
<sequence>
<str val="count($line[@id = (1, 1)]) = "/>
<fn name="count">
<filter flags="b">
<varRef name="Q{}line" slot="2"/>
<gcEE op="="
card="N:1"
comp="GAC|http://www.w3.org/2005/xpath-functions/collation/codepoint">
<literal count="2">
<int val="1"/>
<int val="1"/>
</literal>
<slash>
<treat as="N" diag="14|2|XPTY0020|">
<dot/>
</treat>
<attVal name="Q{}id"/>
</slash>
</gcEE>
</filter>
</fn>
</sequence>
</fn>
<str role="terminate" val="no"/>
<str role="error" val="Q{http://www.w3.org/2005/xqt-errors}XTMM9000"/>
</message>
It's not obvious why that should be the case, but it explains why that case is working and line 35 isn't.
It's also fairly clear why line 36 is working: the "/." has forced elimination of duplicates.
I'm wondering if there is a basic flaw in IndexedFilterExpression
, in that an item that matches the predicate in more than one way will be included more than once. Perhaps the code is relying on some outer expression eliminating any duplicates, and it just happens that in this example there is no outer expression that does that?
It seems that if we decided at compile time to use an index, then at run-time, if the size of the collection to be searched is less than 8, we don't actually build an index, we do a serial search. The class that implements this serial search is a SmallSearchableValue
, and it appears that if we are looking for more than one value in a search of this class, then duplicate hits are not eliminated. (Confirmed by a temporary mod to SearchableValue
line 56, to create an IndexedValue
unconditionally).
As well as including duplicates, it seems that the results could be in the wrong order.
IndexedValue.findItems()
handles this by building a Set<Integer>
containing the index positions of all matching items; if there are multiple search items, then each one results in additions to this set of integers; when all the search items have been processed, the method returns an iterator that filters the list with this set of index positions. This logic is missing from SmallSearchableValue.findItems()
.
For SmallSearchableValue.findItems()
the logic is simpler, because it tests each input item in turn against all the search conditions. The only change needed is to avoid adding it to the result set more than once if it matches more than one of the conditions. The nice way to do this would be to avoid building a result list in memory and instead return a MappingIterator...
Note: the logic of findItems()
is complicated by the presence of a parameter firstOnly
which provides a fast path if only the first item is required. However, this parameter is always set to false, so this logic can be eliminated.
- Status changed from New to Resolved
- Applies to branch 11, 12, trunk added
- Fix Committed on Branch 11, 12, trunk added
- Platforms .NET, Java added
The reason the second occurrence of $line[@id = (1, 1)]
is turned into an indexed filter expression and the first is not turns out to be because we don't create an indexed expression for a filter applied to a local variable reference unless the expression is likely to be executed more than once in the life of the local variable. This is true for the second expression (which is within a loop), but not for the first. The fact that the expression can be pulled out of the loop makes no difference.
Thanks Michael for the quick fix and clear explanation.
- Fixed in Maintenance Release 12.4 added
Bug fix applied in the Saxon 12.4 Maintenance release. Leaving it marked as 'Resolved' until fix applied on Saxon 11.
- Has duplicate Bug #6421: XQuery "group by" - result data duplication added
Please register to edit this issue
Also available in: Atom
PDF