Project

Profile

Help

Bug #4444

Problem with count(//parent::node()) and schematron comparison operators

Added by Jordan Padams 2 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Internals
Sprint/Milestone:
-
Start date:
2020-01-28
Due date:
% Done:

100%

Legacy ID:
Applies to branch:
9.5, 9.9, trunk
Fix Committed on Branch:
9.9, trunk
Fixed in Maintenance Release:

Description

Beginning with Saxon HE 9.5.1-4, the library no longer properly handles the use of count(//parent::node()) properly when doing comparisons in schematron.

Oddly enough, if you output the value in the assert using value-of, it is successful. For example:

This asserts:

  <sch:pattern>
    <sch:rule context="sp:Bin_Center_Lookup/sp:bin_center_field_name">
      <sch:let name="cnt2" value="count(//parent::node())" />
      <sch:assert test="$cnt2 eq 379">
        Test1 No output count</sch:assert>
    </sch:rule>
  </sch:pattern>

While this works as expected:

  <sch:pattern>
    <sch:rule context="sp:Bin_Center_Lookup/sp:bin_center_field_name">
      <sch:let name="cnt2" value="count(//parent::node())" />
      <sch:assert test="$cnt2 eq 379">
        Test2 Output count: <sch:value-of select="$cnt2"/></sch:assert>
    </sch:rule>
  </sch:pattern>

Example files for validation attached.

We may just be somehow calling the schematron incorrectly from our libraries, but this appears to be the only place where we are running into issues.

PDS4_SP_TEST.sch (1.9 KB) Test schematron Jordan Padams, 2020-01-28 21:21 PDS4_SP_TEST.sch
SP-Test1-VALID.xml (33.1 KB) Test XML Jordan Padams, 2020-01-28 21:21 SP-Test1-VALID.xml

History

#1 Updated by Michael Kay 2 months ago

When I execute the XPath query count(//parent::node()) directly against your source document, I get the answer 379, which is what you appear to be expecting.

If schematron is getting a different answer, then that must be because it is doing something different. I'm afraid we can't debug schematron code for you. You haven't even told us which schematron processor you are using. Is it one that generates XSLT? If you can send us the XSLT and convince us that a known XSLT stylesheet is not producing the correct output, then that's something we can investigate.

Have you checked any variations on the expression such as number($cnt2) eq 379 or $cnt = 379 or $cnt = '379'? One possible explanation, if $cnt displays as "379" but doesn't compare "eq" to 379, is that it's a string rather than a number. Without knowing schematron, I've no idea if that's a real possibility.

#2 Updated by Michael Kay 2 months ago

  • Status changed from New to In Progress

#3 Updated by Michael Kay 2 months ago

I have generated the XSLT stylesheet using the "skeleton" schematron implementation and am running the XSLT directly against the source document, which demonstrates that this does appear to be a Saxon problem. The XPath expression count(//parent::node()) is evaluated successfully, but there seems to be a problem with the variable $cnt2. Adding xsl:message output to show the variable value causes everything to run fine; this is usually an indication of some incorrect optimisation rewrite, which doesn't happen when an extra reference to the variable is added.

#4 Updated by Michael Kay 2 months ago

In the absence of a second reference to the variable, the variable has been inlined, giving an expression like

if test="count(//parent::node()) = 379" then () else <svrl:failed-assert..../>

This then triggers another optimization, where count(X) = N is replaced by count(subsequence(X, 1, N+1)) = N. The idea of this optimization is to avoid evaluating more items in the expression than are needed to establish whether or not the sequence size is 379.

The debugger shows that the actual length of the subsequence() is 380.

It looks to me as if the expression isn't eliminating duplicates. Clearly the naive nested-loop evaluation strategy for a descendant/parent path expression is going to include duplicates that need to be eliminated, but there doesn't appear to be any code on the expression tree to do this.

#5 Updated by Michael Kay 2 months ago

ValueComparison.optimizeCount() is responsible for the rewrite of the count(X) = N expression, and it starts by doing

sequence = sequence.unordered(false, false);

which is done to avoid unnecessary sorting of the input sequence (the order of the sequence does not affect its size). The first argument to this call is a flag retainAllNodes which should be set to true, indicating that although the order of the sequence does not matter, the presence or absence of duplicates does.

Setting this argument to true appears to solve the problem.

#6 Updated by Michael Kay 2 months ago

Incidentally, a more efficient query to count the number of nodes that have children is probably

count(//node()[child::node()])

Writing it this way doesn't involve any sorting or elimination of duplicates.

#7 Updated by Michael Kay 2 months ago

  • Category set to Internals
  • Status changed from In Progress to Resolved
  • Assignee set to Michael Kay
  • Applies to branch 9.9, trunk added
  • Fix Committed on Branch 9.9, trunk added

Fixed on the 9.9 and development branches. I'm not planning to provide a fix for earlier releases.

#8 Updated by Jordan Padams 2 months ago

Thanks so much! Is there a plan for a new maintenance release in the near-term?

#9 Updated by Michael Kay 2 months ago

We base the decision on the number of outstanding patches and their severity. There are currently 12 outstanding patches for 9.9 and the rate has slowed to about 6 a month, so I think it's likely to be 4-6 weeks before we decide another maintenance release is due. Suggest you try the workaround in comment #6 - which is a better way to write the query anyway.

#10 Updated by O'Neil Delpratt about 1 month ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.9.1.7 added

Patch applied in the 9.9.1.7 maintenance release.

Please register to edit this issue

Also available in: Atom PDF