Project

Profile

Help

Bug #4738

closed

Optimization bug: xsl:choose with branches uniformly comparing untypedAtomic to numeric

Added by Marek Skorek over 3 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
Internals
Sprint/Milestone:
-
Start date:
2020-09-15
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
10, 9.9, trunk
Fix Committed on Branch:
10, 9.9, trunk
Fixed in Maintenance Release:
Platforms:

Description

Hi,

We tried to upgrade our Saxon 9.7 library to 9.9. After a few hours it turned out that some tranformations does not work as expected. A result of such a transformation was a totally different output file.

We tried to reconfigure Saxon by disabling optimizations (setting http://saxon.sf.net/feature/optimizationLevel feature to 0). With this configuration it worked as before (in 9.7).

I attached test transformation (test.xslt), an input file (input-optimization-level-test.xml) and two output files for the optimization level set to 0 and 9 respectively (output-optimization-level-0.xml, output-optimization-level-9.xml)

Here comes some questions which need to be answered in order to be sure that new library is safe to be deployed to a production environment:

  1. How is that possible that optimization changes transformation behaviour?
  2. What else changed that could break transformations which works in 9.7?
  3. Do you have any suggestions how to deploy the new version safely?

Regards, Marek.


Files

input-optimization-level-test.xml (127 Bytes) input-optimization-level-test.xml Marek Skorek, 2020-09-15 15:20
output-optimization-level-0.xml (304 Bytes) output-optimization-level-0.xml Marek Skorek, 2020-09-15 15:20
output-optimization-level-9.xml (266 Bytes) output-optimization-level-9.xml Marek Skorek, 2020-09-15 15:20
test.xslt (1.38 KB) test.xslt Marek Skorek, 2020-09-15 15:20
Actions #1

Updated by Michael Kay over 3 years ago

Thank you for reporting it. As you have found out, this is caused by a bug in the optimizer.

In fact it's caused by two separate optimizations that interfere with each other. The first of these is to use an optimized comparison of untypedAtomic values to numerics that avoids the full cost of number-to-string conversion; the second is to use a hash table lookup for the choose expression because the branches are homogenous and disjoint. Both of these optimizations are known to give very worthwhile performance improvements for some workloads.

Your easiest workaround is to replace the numeric literals in the @test expressions with string literals, that is, change @SaleLock = 1 to @SaleLock = '1'. That's assuming you don't need to match alternative representations like '01' or '+1', in which case you could do the conversion explicitly.

I'm afraid the answers to your questions are somewhat banal:

(1) Bugs happen. Optimization is particularly prone to bugs, because it makes the code less orthogonal and therefore increases the number of test cases needed combinatorially. This bug is very typical of optimization bugs: two conditions which might each be present in 1 out of 50 stylesheets are present in combination only in 1 out of 2500 stylesheets. Although we run hundreds of thousands of test cases, most of them are designed to test features in isolation rather than in combination.

(2) Other bugs. Although 9.9.x is now very stable indeed, and 10.x is also becoming quite stable, in our experience it's not that unusual for a bug to survive in the product for five years before anyone detects it.

(3) Testing. It's a really good idea to have a good test suite for your own stylesheets. A really comprehensive set of XSpec tests would be ideal; but as a minimum, a test suite that simply tests each stylesheet against a small number of input documents and compares the results with expected results to see they haven't changed.

Unless you indicate otherwise, we will take this test case (thanks for packaging it up so cleanly) and add it to the W3C XSLT3 test suite which we run for continuous testing to prevent regressions.

Actions #2

Updated by Marek Skorek over 3 years ago

Thanks you the answer Micheal,

Your easiest workaround is to replace the numeric literals in the @test expressions with string literals, that is, change @SaleLock = 1 to @SaleLock = '1'. That's assuming you don't need to match alternative representations like '01' or '+1', in which case you could do the conversion explicitly.

It is hard to check all the transformations because our integrating team created more than 3,5k different ones. Some of them have side effects or create dynamic data (i.e. timestamps) which makes them hard to compare.

Nevertheless, I am considering to disable optimization by default and enable it only for transfromations with a poor performance. Is it a good way?

Unless you indicate otherwise, we will take this test case (thanks for packaging it up so cleanly) and add it to the W3C XSLT3 test suite which we run for continuous testing to prevent regressions.

Feel free to include it as a test case ;-) I will wait for a bugfix.

Actions #3

Updated by Michael Kay over 3 years ago

  • Subject changed from Upgrading Saxon to 9.9 breaks working transformations to Optimization bug: xsl:choose with branches uniformly comparing untypedAtomic to numeric
  • Category set to Internals
  • Assignee set to Michael Kay
  • Applies to branch 10, trunk added
  • Fix Committed on Branch 10, 9.9, trunk added

Fix committed on 9.9, 10, and development branches.

(Although I've committed a fix for 9.9, we're no longer producing routine maintenance releases on that branch: 10.x is now recommended).

The fix ensures that the switch operand is converted from untyped atomic to numeric in this scenario.

W3C test case choose-0202 created.

Actions #4

Updated by Michael Kay over 3 years ago

  • Status changed from New to Resolved

Nevertheless, I am considering to disable optimization by default and enable it only for transfromations with a poor performance. Is it a good way?

I don't think that would significantly reduce your risk of hitting a previously unknown bug. Given that 9.9 has been out in the field for just short of two years and is used by tens of thousands of users, the chance of hitting any bug is very low. We run far more tests with optimization enabled than with it disabled, and your best chance of avoiding under-exercised paths has to be to default as many options as possible.

Upgrading to a new version of any software library when you have no regression tests for your application is something that would worry me a little, though.

Actions #5

Updated by O'Neil Delpratt over 3 years ago

  • Fixed in Maintenance Release 9.9.1.8 added

Bug fix applied on the Saxon 9.9.1.8 maintenance release. Leaving open until applied on the Saxon 10 maintenance release.

Actions #6

Updated by O'Neil Delpratt over 3 years ago

Bug fix applied in the Saxon 10.3 maintenance release

Actions #7

Updated by O'Neil Delpratt over 3 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.3 added

Please register to edit this issue

Also available in: Atom PDF