Project

Profile

Help

Bug #6207

closed

Filter expression not working properly

Added by Johan Gheys 7 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
2023-09-22
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
11, 12, trunk
Fix Committed on Branch:
11, 12, trunk
Fixed in Maintenance Release:
Platforms:
.NET, Java

Description

I have tried to make as simple a test case as possible (see attachment), with both Saxon-EE 11.6 and 12.3 generating the following output:

Batch execution starting...
count($line) = 1
count($line[@id = 1]) = 1
count($line[@id = (1, 1)]) = 1
*** Processing 2023-01-01
*** count($line) = 1
*** count($line[@id = 1]) = 1
*** count($line[@id = (1, 1)]) = 2
*** count($line[@id = (1, 1)]/.) = 1
Batch execution succeeded
Execution time: 0.209s

Note that count($line[@id = (1, 1)]) suddenly becomes equal to 2, depending on the nested level.


Files

filter-expression.zip (2.01 KB) filter-expression.zip Johan Gheys, 2023-09-22 21:59
Actions #1

Updated by Martin Honnen 7 months ago

Both 11.6 and 12.3 EE with -opt:-x give count($line[@id = (1, 1)]) = 1. So the "index predicates" optimization is probably the culprit.

Actions #2

Updated by Martin Honnen 7 months ago

-explain output parts containing indexed filter expressions (when running 12.3 with all optimizations):

OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($lines, atomizeSingleton(.?date), atomizeSingleton($Q{http://saxon.sf.net/generated-variable}current98224136))
OPT : At line 34 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($line, ((.) treat as node())/data(@id), 1)
OPT : At line 35 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($line, ((.) treat as node())/data(@id), (1, 1))
OPT : At line 36 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
OPT : Created indexed filter expression:
OPT : Expression after rewrite: IndexedFilterExpression($line, ((.) treat as node())/data(@id), (1, 1))
OPT : At line 32 of file:///C:/Users/marti/Downloads/filter-expression/xslt/batch.xslt
Actions #3

Updated by Michael Kay 7 months ago

There's a pretty complex sequence of optimizations going on here!

Line 29, the one that's working, is evaluated as a regular filter expression (not an IndexedFilterExpression):

                         <message line="29">
                            <fn role="select" name="concat">
                              <sequence>
                                <str val="count($line[@id = (1, 1)]) = "/>
                                <fn name="count">
                                  <filter flags="b">
                                    <varRef name="Q{}line" slot="2"/>
                                    <gcEE op="="
                                          card="N:1"
                                          comp="GAC|http://www.w3.org/2005/xpath-functions/collation/codepoint">
                                      <literal count="2">
                                        <int val="1"/>
                                        <int val="1"/>
                                      </literal>
                                      <slash>
                                        <treat as="N" diag="14|2|XPTY0020|">
                                          <dot/>
                                        </treat>
                                        <attVal name="Q{}id"/>
                                      </slash>
                                    </gcEE>
                                  </filter>
                                </fn>
                              </sequence>
                            </fn>
                            <str role="terminate" val="no"/>
                            <str role="error" val="Q{http://www.w3.org/2005/xqt-errors}XTMM9000"/>
                          </message>

It's not obvious why that should be the case, but it explains why that case is working and line 35 isn't.

It's also fairly clear why line 36 is working: the "/." has forced elimination of duplicates.

I'm wondering if there is a basic flaw in IndexedFilterExpression, in that an item that matches the predicate in more than one way will be included more than once. Perhaps the code is relying on some outer expression eliminating any duplicates, and it just happens that in this example there is no outer expression that does that?

Actions #4

Updated by Michael Kay 7 months ago

It seems that if we decided at compile time to use an index, then at run-time, if the size of the collection to be searched is less than 8, we don't actually build an index, we do a serial search. The class that implements this serial search is a SmallSearchableValue, and it appears that if we are looking for more than one value in a search of this class, then duplicate hits are not eliminated. (Confirmed by a temporary mod to SearchableValue line 56, to create an IndexedValue unconditionally).

As well as including duplicates, it seems that the results could be in the wrong order.

Actions #5

Updated by Michael Kay 7 months ago

IndexedValue.findItems() handles this by building a Set<Integer> containing the index positions of all matching items; if there are multiple search items, then each one results in additions to this set of integers; when all the search items have been processed, the method returns an iterator that filters the list with this set of index positions. This logic is missing from SmallSearchableValue.findItems().

For SmallSearchableValue.findItems() the logic is simpler, because it tests each input item in turn against all the search conditions. The only change needed is to avoid adding it to the result set more than once if it matches more than one of the conditions. The nice way to do this would be to avoid building a result list in memory and instead return a MappingIterator...

Note: the logic of findItems() is complicated by the presence of a parameter firstOnly which provides a fast path if only the first item is required. However, this parameter is always set to false, so this logic can be eliminated.

Actions #6

Updated by Michael Kay 7 months ago

  • Status changed from New to Resolved
  • Applies to branch 11, 12, trunk added
  • Fix Committed on Branch 11, 12, trunk added
  • Platforms .NET, Java added

The reason the second occurrence of $line[@id = (1, 1)] is turned into an indexed filter expression and the first is not turns out to be because we don't create an indexed expression for a filter applied to a local variable reference unless the expression is likely to be executed more than once in the life of the local variable. This is true for the second expression (which is within a loop), but not for the first. The fact that the expression can be pulled out of the loop makes no difference.

Actions #7

Updated by Johan Gheys 7 months ago

Thanks Michael for the quick fix and clear explanation.

Actions #8

Updated by O'Neil Delpratt 5 months ago

  • Fixed in Maintenance Release 12.4 added

Bug fix applied in the Saxon 12.4 Maintenance release. Leaving it marked as 'Resolved' until fix applied on Saxon 11.

Please register to edit this issue

Also available in: Atom PDF