Feature #4823

closed

Decimal Precision

Added by Svante Schubert about 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Saxon extensions
Sprint/Milestone: -
Start date: 2020-11-12
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 10, 9.9
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

I am evaluating whether XSLT can be used to test accuracy on EU e-invoices (EN16931). My simple stylesheet no longer compiles when I update to the latest Saxon version. Both the input and the XSLT are attached, along with a zipped Maven environment for running them.

In addition (probably a separate feature request), I would like the best decimal-based floating-point precision Java can offer, in order to enhance the existing EU e-invoice Schematron reference implementation (https://github.com/ConnectingEurope/eInvoicing-EN16931). But this parameter, which I found in 10.3, concerns me a bit: https://github.com/svanteschubert/Saxon-HE/blob/main/src/main/java/net/sf/saxon/value/BigDecimalValue.java#L30 (don't worry, it is just a temporary fork to show and tell and to be able to debug).

The problem in detail:

<xsl:value-of select="($quantity * ($priceAmount div $baseQuantity))" /> results in 333333333.333333333
<xsl:value-of select="($quantity * $priceAmount div $baseQuantity)" /> results in 333333333.333333333333333333

I would like the precision of MathContext.DECIMAL128 (https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/math/MathContext.html#DECIMAL128). Would it be possible to provide higher precision via a parameter?
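As a rough illustration of what a precision parameter could mean, here is a plain-Java sketch (not Saxon code) that divides with a caller-chosen number of significant digits via java.math.MathContext. The helper name `divide` and the HALF_EVEN rounding mode are assumptions made for this example; HALF_EVEN happens to be the rounding mode of MathContext.DECIMAL128, which keeps 34 significant digits.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class ConfigurableDivide {

    // Hypothetical helper: divide a by b, keeping `precision` significant digits.
    // HALF_EVEN is an assumption; it matches the rounding mode of MathContext.DECIMAL128.
    static BigDecimal divide(BigDecimal a, BigDecimal b, int precision) {
        return a.divide(b, new MathContext(precision, RoundingMode.HALF_EVEN));
    }

    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1.0");
        BigDecimal three = new BigDecimal("3");
        // 18 significant digits
        System.out.println(divide(one, three, 18).toPlainString());
        // 34 significant digits, the precision of MathContext.DECIMAL128
        System.out.println(divide(one, three, MathContext.DECIMAL128.getPrecision()).toPlainString());
    }
}
```

A precision parameter of this kind would only affect operations that actually round (such as division); addition, subtraction and multiplication of BigDecimal values are exact unless a MathContext is passed.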

Best regards, Svante


Files

xslt-decimal.zip (21.2 KB), Svante Schubert, 2020-11-12 10:38
PEPPOL-EN16931-UBL-V2.xslt (11.7 KB), Svante Schubert, 2020-11-12 10:39
Sample121-new.xml (9.19 KB), Svante Schubert, 2020-11-12 10:39
2020-11-16_17-17-25.png (18.5 KB): IEEE 754 - Parameters defining basic format floating-point numbers, Svante Schubert, 2020-11-16 17:18

Actions #1

Updated by Michael Kay about 4 years ago

I haven't been able to reproduce the compile error (I'm running from the command line, I haven't tried running your Java code). What are the minimum steps to reproduce it? What is the exact error message, including line number?

As regards decimal precision, Saxon has an extension function to do decimal division with user-specified precision:

https://saxonica.com/documentation/index.html#!functions/saxon/decimal-divide

Actions #2

Updated by Michael Kay about 4 years ago

Note also that the default precision for decimal divide is not directly the value of the static variable BigDecimalValue.DIVIDE_PRECISION, rather it is

Math.max(BigDecimalValue.DIVIDE_PRECISION, A.scale() - B.scale() + BigDecimalValue.DIVIDE_PRECISION);

(see line 851).
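The scale formula can be tried out in isolation. The sketch below re-implements it in plain Java; it is not Saxon's code. The value 18 for DIVIDE_PRECISION and the HALF_EVEN rounding mode are assumptions (check the constant and the rounding used in your Saxon release). With hypothetical integer-valued inputs, dividing first rounds 1/3 to 18 fractional digits, and the later multiplication by 10^9 moves nine of those digits into the integer part, leaving only 9 fractional digits. Multiplying first keeps all 18. If the inputs in the description were integer-valued (we have not checked the attached files), this would be consistent with the two results quoted there.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DefaultDivideScale {

    // ASSUMPTION: BigDecimalValue.DIVIDE_PRECISION is taken to be 18 here.
    static final int DIVIDE_PRECISION = 18;

    // The formula quoted above: at least DIVIDE_PRECISION fractional digits,
    // more when the dividend has a larger scale than the divisor.
    static int defaultScale(BigDecimal a, BigDecimal b) {
        return Math.max(DIVIDE_PRECISION, a.scale() - b.scale() + DIVIDE_PRECISION);
    }

    // ASSUMPTION: HALF_EVEN is illustrative, not necessarily Saxon's rounding mode.
    static BigDecimal divide(BigDecimal a, BigDecimal b) {
        return a.divide(b, defaultScale(a, b), RoundingMode.HALF_EVEN);
    }

    // quantity * (price div base): the quotient is rounded before it is scaled up.
    static BigDecimal divideFirst(BigDecimal q, BigDecimal p, BigDecimal b) {
        return q.multiply(divide(p, b));
    }

    // (quantity * price) div base: the product is exact, only the final division rounds.
    static BigDecimal multiplyFirst(BigDecimal q, BigDecimal p, BigDecimal b) {
        return divide(q.multiply(p), b);
    }

    public static void main(String[] args) {
        // Hypothetical integer-valued inputs.
        BigDecimal q = new BigDecimal("1000000000");
        BigDecimal p = new BigDecimal("1");
        BigDecimal b = new BigDecimal("3");
        System.out.println(divideFirst(q, p, b).stripTrailingZeros().toPlainString());
        System.out.println(multiplyFirst(q, p, b).stripTrailingZeros().toPlainString());
    }
}
```

For 1/3 the discarded digit is always 3, so any of the usual "nearest" rounding modes gives the same digits here. The grouping, not the rounding mode, causes the difference.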

I'm afraid I don't recall how this formula was arrived at - it's been like that for a long time.

Actions #3

Updated by Michael Kay about 4 years ago

As regards the compile error, Saxon's error message for a missing attribute on an XSLT element does not take this form. The message it produces is more like "Element must have an @select attribute".

Also, I don't know if this is relevant, but there are several files in your collection named Sample121-new.xml. The one in src/test/resources/ubl21 starts

<Invoice xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
 xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
 xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2">
	<cbc:CustomizationID>urn:cen.eu:en16931:2017</cbc:CustomizationID>

while the one in generated-resources/xml/xslt starts

<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
                        xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"

which seems confusing.

Actions #4

Updated by Svante Schubert about 4 years ago

Thank you for your quick guidance.

Indeed, I was mistaken. The JAR works from the command line, and it turned out the problem is not in the JAR I receive from Maven. It only occurs when I compile the sources provided by Maven (https://repo1.maven.org/maven2/net/sf/saxon/Saxon-HE/10.3/Saxon-HE-10.3-sources.jar, mirrored as https://github.com/svanteschubert/Saxon-HE/tree/main/src/main/java). Are they still in sync? The error was:

'Syntax error in ' if (/ubl-invoice:Invoice) then (if (cbc:InvoicedQuantity) then xs:decimal(cbc:InvoicedQuantity) else 1) else (if (cbc:CreditedQuantity) then xs:decimal(cbc:CreditedQuantity) else 1)'.'
FATAL ERROR: 'file:/E:/GitHub/einvoice/xslt-decimal/src/main/resources/xsl/PEPPOL-EN16931-UBL-V2.xslt: line 200: Required attribute 'select' is missing.'

I am using the source JAR provided by Maven: https://repo1.maven.org/maven2/net/sf/saxon/Saxon-HE/10.3/

My main question should be solved: unless I need to debug, I do not need the sources. I am nevertheless curious where the error lurks...

I will investigate further whether this function does the trick.

Have a great day, Michael! Svante

Actions #5

Updated by Michael Kay about 4 years ago

This doesn't look like a Saxon error message. I'm wondering if it comes from Xalan? Perhaps when you rebuilt Saxon from source code, you didn't generate the MANIFEST file that causes JAXP to pick it up as the chosen XSLT transformer, and JAXP is running Xalan instead?

Actions #6

Updated by Svante Schubert about 4 years ago

Michael Kay wrote:

This doesn't look like a Saxon error message. I'm wondering if it comes from Xalan? Perhaps when you rebuilt Saxon from source code, you didn't generate the MANIFEST file that causes JAXP to pick it up as the chosen XSLT transformer, and JAXP is running Xalan instead?

That was exactly the problem! A thousand thanks! :-) I copied the META-INF/services/javax.xml.transform.TransformerFactory file from the binary JAR, as it was missing from the META-INF of the sources JAR. Perhaps you would like to add the JAXP services file to the sources JAR, so that you are not bothered with the same problem again ;-)
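For anyone hitting the same symptom, a small plain-JAXP sketch can show which TransformerFactory service discovery actually picks, and can request Saxon by class name instead of relying on META-INF/services. `net.sf.saxon.TransformerFactoryImpl` is Saxon's JAXP factory class. When no other implementation is found, the JDK falls back to its internal copy of Xalan's XSLTC. JAXP also honours the `javax.xml.transform.TransformerFactory` system property, which is another way to force the choice.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

public class JaxpFactoryCheck {

    // Class name of the implementation selected by JAXP service discovery.
    static String discoveredFactory() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Requests Saxon explicitly, bypassing the META-INF/services lookup.
    // Returns null if Saxon is not on the classpath.
    static TransformerFactory saxonOrNull() {
        try {
            return TransformerFactory.newInstance(
                    "net.sf.saxon.TransformerFactoryImpl",
                    JaxpFactoryCheck.class.getClassLoader());
        } catch (TransformerFactoryConfigurationError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("JAXP discovered: " + discoveredFactory());
        TransformerFactory saxon = saxonOrNull();
        System.out.println(saxon == null
                ? "Saxon is not on the classpath"
                : "Saxon available: " + saxon.getClass().getName());
    }
}
```

Printing the discovered class name at start-up makes a silent fallback to the JDK's built-in processor visible immediately.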

I tried your Saxon function https://saxonica.com/documentation/index.html#!functions/saxon/decimal-divide, but it does not work in HE, which I intended to use for our fully open-source EU e-invoice validation artefacts. Hopefully, decimal-based floating point will somehow become available in the open-source stack. ;-)

Actions #7

Updated by Svante Schubert about 4 years ago

Hello Michael,

There are some changes you might want to adopt.

I updated the pom.xml to the latest library versions and added a simplified version of the earlier XSL transformation as a JUnit test, so that Saxon can easily be debugged in an IDE. I believe you are still using Ant; that's what the manifest claims.

This was the annoying e-commerce test case:

quantity = 1000000000.0
priceAmount = 1.0
baseQuantity = 3

($quantity * ($priceAmount div $baseQuantity)) = (3 *(1.0 div 3 )) = 333333333.333333333333333333

($quantity * $priceAmount div $baseQuantity) = (3 * 1.0 div 3 ) = 333333333.3333333333333333333333333333333333 (9 digits before and 34 after the decimal point)

Now it always has 34 significant digits, following Java's DECIMAL128:

333333333.3333333333333333333333333 (9 digits before and 25 after the decimal point, 34 in total)

see https://en.wikipedia.org/wiki/Decimal128_floating-point_format
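The 34-digit result above can be reproduced in plain Java, outside Saxon, by doing every operation under MathContext.DECIMAL128 (34 significant digits, HALF_EVEN). Below is a sketch using the example's values. For these inputs both groupings give the same value. That is a property of this example, not a general guarantee, since each operation still rounds to 34 digits.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class Decimal128Grouping {

    // 34 significant digits, rounding HALF_EVEN
    static final MathContext MC = MathContext.DECIMAL128;

    static final BigDecimal QUANTITY = new BigDecimal("1000000000.0");
    static final BigDecimal PRICE = new BigDecimal("1.0");
    static final BigDecimal BASE = new BigDecimal("3");

    // $quantity * ($priceAmount div $baseQuantity)
    static BigDecimal divideFirst() {
        return QUANTITY.multiply(PRICE.divide(BASE, MC), MC);
    }

    // ($quantity * $priceAmount) div $baseQuantity
    static BigDecimal multiplyFirst() {
        return QUANTITY.multiply(PRICE, MC).divide(BASE, MC);
    }

    public static void main(String[] args) {
        System.out.println(divideFirst().toPlainString());
        System.out.println(multiplyFirst().toPlainString());
        System.out.println(divideFirst().precision()); // significant digits of the result
    }
}
```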

To enable this I did the following:

  1. Used the highest precision (and different ways of calling BigDecimal to multiply/divide). See https://github.com/svanteschubert/Saxon-HE/commit/68c538a364e8bfd8aa5598077521ad87fb297e88

  2. Disabled the use of Double, as it is inappropriate for e-commerce (background: https://github.com/svanteschubert/Saxon-HE#decimal-based-floating-point). See https://github.com/svanteschubert/Saxon-HE/commit/fe8ca45c54622b467eb58fbaeae0d3edbe4461c7

  3. With Double disabled, BigDecimal has to take over its role and needs to be enhanced (and can be simplified); see https://github.com/svanteschubert/Saxon-HE/commit/68c538a364e8bfd8aa5598077521ad87fb297e88

  4. Since I added to the EU e-invoice CEN norm (EN16931) not only the recommendation to use decimal-based arithmetic but also HALF-UP as the default rounding, and Saxon does not support user functions AFAIK, I had to override the XPath round() function. You should not adopt this hack ;-)
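For reference, the difference that this hack works around can be shown in plain Java. For a tie, XPath's fn:round returns the value closer to positive infinity, so round(-2.5) is -2. Commercial half-up rounds ties away from zero, giving -3. The sketch below works on BigDecimal values. The helper names are hypothetical, and the second argument mirrors the optional precision argument that fn:round has in XPath 3.1.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundingModes {

    // fn:round semantics: ties go towards positive infinity.
    // Implemented as floor(x + 0.5 * 10^-scale) at the requested scale.
    static BigDecimal xpathRound(BigDecimal x, int scale) {
        BigDecimal half = BigDecimal.valueOf(5, scale + 1); // 0.5 * 10^-scale
        return x.add(half).setScale(scale, RoundingMode.FLOOR);
    }

    // Commercial HALF-UP: ties go away from zero.
    static BigDecimal halfUp(BigDecimal x, int scale) {
        return x.setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("-2.5");
        System.out.println(xpathRound(x, 0)); // -2
        System.out.println(halfUp(x, 0));     // -3
    }
}
```

Both rules agree on positive ties (2.5 becomes 3), which is why the difference often goes unnoticed until credit notes or negative amounts appear.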

I have not tested the performance/memory penalty, but my primary goal is to add accuracy to the XSLT-based validation artefacts of the CEN e-invoice specification. In general, I would accept a 10% run-time penalty in exchange for accuracy; Mike Cowlishaw told me that is the usual cost.

Hope I could be of help and thank you again for your quick response, Mike! Svante

Actions #8

Updated by Michael Kay about 4 years ago

Thanks for doing this investigation.

I think that the changes to the precision of decimal arithmetic are probably conformant with the XPath specification, which leaves many details of decimal arithmetic implementation-defined. We would need to run all the tests to see whether they have any adverse impact on conformance. Testing for backwards compatibility effects is more difficult because we don't have many tests for aspects of the specification that are implementation-defined. In theory we can also make it configurable, though I'm reluctant because that adds a lot of complexity and a lot of test cases.

The change to using decimal rather than double for numeric literals is however non-conformant and also breaks backwards compatibility. That's not a change we can contemplate. If you want 1.5e0 treated as xs:decimal, you need to write xs:decimal('1.5e0').

If you want to customise functions like round() then I strongly recommend writing your own functions rather than modifying the standard functions. You can implement your own functions even in Saxon-HE by writing them as "integrated extension functions".

We have no plans to change the current policy of differentiating Saxon-PE from -HE, under which Saxon extensions and extensibility mechanisms are generally available only in the commercial product. This policy has proved highly successful in generating a revenue stream that enables us to continue development both for the 10% of users who pay for the product and for the 90% who use the free version. Everyone benefits.

Actions #9

Updated by Svante Schubert about 4 years ago

Thanks again for your feedback and guidance.

I was not aware of "integrated extension functions", nor of the rule for numeric literals. What you say is all reasonable, and I will investigate this area further.

Have a nice weekend, Svante

PS: Sorry for the typos; for instance, the example should of course be the following:

($quantity * ($priceAmount div $baseQuantity)) = (1000000000.0 *(1.0 div 3 )) = 333333333.3333333333333333333333333
($quantity *  $priceAmount div $baseQuantity)  = (1000000000.0 * 1.0 div 3 )  = 333333333.3333333333333333333333333 
Actions #10

Updated by Svante Schubert about 4 years ago

You might want to consider changing the default floating-point type depending on the XSLT version:

  • XSLT 1 & 2: binary floating point, as XSLT 2.0 was released in 2007, a year before IEEE 754:2008 embraced decimal-based floating point.
  • XSLT 3: decimal-based floating point by default, as XSLT 3.0 was released in 2017 and refers to IEEE 754:2008. Or change the default only later, in XSLT 4.

In any case, you should allow some configuration option for switching floating point to decimal-based (or back from the default, just in case). It is an easy switch in NumericValue.java (see https://github.com/svanteschubert/Saxon-HE/commit/fe8ca45c54622b467eb58fbaeae0d3edbe4461c7). The whole e-commerce business should be using decimal-based arithmetic.
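To make the proposed switch concrete, here is a hypothetical Java sketch of a numeric-literal parser with a decimal mode. The method name and the flag are invented for illustration. They are not Saxon API, and the real logic in NumericValue is not reproduced here. In standard XPath a numeric literal containing an exponent is an xs:double; the flag would instead map it to a decimal. For brevity the sketch does not distinguish xs:integer from xs:decimal.

```java
import java.math.BigDecimal;

public class LiteralParsing {

    // HYPOTHETICAL switch, not an existing Saxon configuration option.
    // decimalMode == false: standard XPath rule, exponent literals become doubles.
    // decimalMode == true: exponent literals are parsed as decimals too.
    static Number parseNumericLiteral(String literal, boolean decimalMode) {
        boolean hasExponent = literal.indexOf('e') >= 0 || literal.indexOf('E') >= 0;
        if (hasExponent && !decimalMode) {
            return Double.valueOf(literal);
        }
        // Simplification: integer and decimal literals both map to BigDecimal here.
        return new BigDecimal(literal);
    }

    public static void main(String[] args) {
        System.out.println(parseNumericLiteral("0.1e0", false).getClass().getSimpleName()); // Double
        System.out.println(parseNumericLiteral("0.1e0", true).getClass().getSimpleName());  // BigDecimal
    }
}
```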

For Saxon, changing the default should be considered for a new major release, as the new accuracy will change results and some automated regression tests might be caught by surprise. Such changes are expected in a major release.

Here is the earlier example again, now better formatted:

quantity = 1000000000.0 
priceAmount = 1.0 
baseQuantity = 3

Using binary floating-point:

($quantity * ($priceAmount div $baseQuantity)) = (1000000000.0 *(1.0 div 3 )) = 333333333.333333333333333333
($quantity *  $priceAmount div $baseQuantity)  = (1000000000.0 * 1.0 div 3 )  = 333333333.3333333333333333333333333333333333

The above values should be the same, but differ by 0.0000000000000003333333333333333. In the energy and pharma sectors, prices with 6 to 9 decimal places are common; combined with high quantities, the errors easily reach the cent level.
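Because a gap like this is easy to miscount by eye, it can be recomputed exactly from the two printed values with BigDecimal, whose subtraction is exact. Below is a minimal sketch; the two strings are copied from the binary floating-point results above.

```java
import java.math.BigDecimal;

public class DifferenceCheck {

    // The two results quoted above for the binary floating-point case.
    static final BigDecimal DIVIDE_FIRST = new BigDecimal("333333333.333333333333333333");
    static final BigDecimal MULTIPLY_FIRST = new BigDecimal("333333333.3333333333333333333333333333333333");

    // BigDecimal subtraction is exact, so no digits are lost in the comparison.
    static BigDecimal gap() {
        return MULTIPLY_FIRST.subtract(DIVIDE_FIRST);
    }

    public static void main(String[] args) {
        System.out.println(gap().toPlainString());
    }
}
```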

Using decimal-based floating-point (IEEE 754:2008 or later)

($quantity * ($priceAmount div $baseQuantity)) = (1000000000.0 *(1.0 div 3 )) = 333333333.3333333333333333333333333
($quantity *  $priceAmount div $baseQuantity)  = (1000000000.0 * 1.0 div 3 )  = 333333333.3333333333333333333333333 

Question/Suggestion: I added half-up to the "integrated extension functions", and it seems that round-half-to-even was not implemented consistently, so I fixed it according to the XPath Functions and Operators specification (2.0 and later): https://www.w3.org/TR/xquery-operators/#func-round-half-to-even or https://www.w3.org/TR/xpath-functions-31/#func-round-half-to-even

The JavaDoc of the parent class "NumericValue" states:

    /**
     * Implement the XPath 2.0 round-half-to-even() function
     *
     * @param scale the decimal position for rounding: e.g. 2 rounds to a
     *              multiple of 0.01, while -2 rounds to a multiple of 100
     * @return a value, of the same type as the original, rounded towards the
     *         nearest multiple of 10**(-scale), with rounding towards the nearest
     *         even number if two values are equally near
     */

But some overriding implementations change the JavaDoc and do not implement the XPath function, since they ignore a positive scale:

    /**
     * Implement the XPath round-to-half-even() function
     *
     * @param scale number of digits required after the decimal point; the
     *              value -2 (for example) means round to a multiple of 100
     * @return if the scale is &gt;=0, return this value unchanged. Otherwise
     *         round it to a multiple of 10**-scale
     */

"If the scale is >=0, return this value unchanged. Otherwise round it to a multiple of 10**-scale." Not only is a positive scale ignored, but the object instance is changed instead of a rounded copy being returned. Either way is fine, but it should be consistent.

What do you prefer?

Actions #11

Updated by Michael Kay about 4 years ago

The xs:double data type in XSD and XPath is based very firmly on 64-bit binary floating point.

Support for "xs:precisionDecimal" based on IEEE-754:2008 was proposed for XSD 1.1 but didn't make it into the final spec (it turned into something of a political battle between Oracle and IBM). When it was withdrawn, however, there was a concession that allowed implementors to add primitive data types beyond those in the standard. So at the XSD level Saxon could add precisionDecimal (decimal-based floating point) if we chose, as an extension.

However, one of the factors that led to its withdrawal from XSD was the amount of work that would be needed to support it in XPath, especially the complexity that two values can be numerically equal but still different when scale is taken into account. Having observed those discussions from the sidelines, I would not be at all enthusiastic about taking on the task of defining the semantics for precisionDecimal support in XPath (especially the semantics of mixed-type operations). I think this belongs in an add-on function library, not in the core, at least until it has become tried and tested.
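Java's BigDecimal shows the kind of issue described here. Two values can be numerically equal yet distinguishable by scale, and `equals()` treats them as different while `compareTo()` does not. Below is a small illustration of the Java behaviour only, not of any XPath semantics. The helper names are ours.

```java
import java.math.BigDecimal;

public class ScaleEquality {

    // Compares numeric values only, ignoring scale.
    static boolean numericallyEqual(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    // Compares value and scale (BigDecimal.equals semantics).
    static boolean sameRepresentation(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }

    // Normalises both representations before comparing them.
    static boolean equalAfterNormalising(BigDecimal a, BigDecimal b) {
        return a.stripTrailingZeros().equals(b.stripTrailingZeros());
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("2.0");  // unscaled 20, scale 1
        BigDecimal b = new BigDecimal("2.00"); // unscaled 200, scale 2
        System.out.println(numericallyEqual(a, b));      // true
        System.out.println(sameRepresentation(a, b));    // false
        System.out.println(equalAfterNormalising(a, b)); // true
    }
}
```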

As far as your comments on the Javadoc are concerned, the round-half-to-even() operation on an integer is a no-op if the scale is positive (round-half-to-even(23, 4) returns 23), and the Javadoc on the roundHalfToEven() methods on Int64Value and IntegerValue reflects this.

You seem to suggest that there's an implementation that's modifying existing values in situ rather than returning a copy; if that's the case then it would certainly be a bug, but I haven't found it from your description.
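The "return a rounded copy" contract mentioned here can be illustrated with a hypothetical immutable value class. This is not Saxon's class. Our general reasoning, not a statement about Saxon internals, is that value objects may be shared, for example referenced from several variables or threads, so changing one in place would silently change every holder. With final fields, returning the same instance when nothing needs rounding is safe.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical value class (not Saxon's) illustrating immutable rounding.
public final class ImmutableDecimal {

    private final BigDecimal value; // final: state never changes after construction

    public ImmutableDecimal(BigDecimal value) {
        this.value = value;
    }

    public BigDecimal value() {
        return value;
    }

    // Rounds half-to-even to `scale` fractional digits and returns a new instance;
    // `this` is never modified. If no digits need dropping, the same immutable
    // instance is returned, which is safe precisely because it cannot change.
    public ImmutableDecimal roundHalfToEven(int scale) {
        if (scale >= value.scale()) {
            return this;
        }
        return new ImmutableDecimal(value.setScale(scale, RoundingMode.HALF_EVEN));
    }

    public static void main(String[] args) {
        ImmutableDecimal d = new ImmutableDecimal(new BigDecimal("2.345"));
        ImmutableDecimal r = d.roundHalfToEven(2);
        System.out.println(d.value() + " -> " + r.value()); // 2.345 -> 2.34
    }
}
```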

Actions #12

Updated by Svante Schubert about 4 years ago

Can we fix the specs?

Unfortunately, XSD is completely unaware of decimal-based floating point. By prohibiting scientific notation for xs:decimal and allowing it only for binary floating point such as xs:double, it makes purely decimal-based arithmetic very difficult, if not impossible, for users. But science, and especially the e-commerce sector, desperately need the accuracy that decimal-based arithmetic offers. Users should be able to switch easily from the binary to the decimal-based implementation.
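Two facts behind this paragraph can be checked in plain Java. First, the binary double nearest to 0.1 is not exactly 0.1. Second, Java's decimal type, unlike the xs:decimal lexical space, accepts exponent notation. Below is a small sketch; the helper names are ours.

```java
import java.math.BigDecimal;

public class DecimalVsBinary {

    // Exact decimal expansion of a binary double (no rounding to a short string).
    static BigDecimal exactValueOfDouble(double d) {
        return new BigDecimal(d);
    }

    // BigDecimal parses exponent notation such as "1.5e0".
    static BigDecimal parseDecimal(String s) {
        return new BigDecimal(s);
    }

    public static void main(String[] args) {
        System.out.println(exactValueOfDouble(0.1)); // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(parseDecimal("1.5e0"));   // 1.5
    }
}
```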

Therefore, instead of ignoring decimal-based formats and offering only float and double, there should probably be differentiated types such as:

  • bFloat
  • bDouble
  • bQuadruple
  • dDouble
  • dQuadruple

These follow the IEEE 754 table of parameters defining basic-format floating-point numbers (attached).
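On the decimal side of this proposal, Java already exposes the precisions of the IEEE 754-2008 decimal formats as java.math.MathContext presets. Java has no corresponding presets for the binary formats. The type names in the list above are the proposal's own, not existing XSD types. Here is a short sketch that prints the presets:

```java
import java.math.MathContext;

public class DecimalFormats {

    // Summarises a MathContext preset: significant digits and rounding mode.
    static String describe(String name, MathContext mc) {
        return name + ": " + mc.getPrecision() + " digits, " + mc.getRoundingMode();
    }

    public static void main(String[] args) {
        // IEEE 754-2008 decimal formats as exposed by java.math.MathContext
        System.out.println(describe("decimal32", MathContext.DECIMAL32));   // 7 digits
        System.out.println(describe("decimal64", MathContext.DECIMAL64));   // 16 digits
        System.out.println(describe("decimal128", MathContext.DECIMAL128)); // 34 digits
    }
}
```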

Regarding XPath, I do not see the problem yet. Java's BigDecimal can compare numbers by value (compareTo() ignores the scale, whereas equals() does not), so the representation might differ while the semantics stay the same. Aren't the multiple representations of the same number an implementation detail that can be hidden by some normalization layer? The user's need for accuracy outweighs the problems we might have with implementations ;-)

Can users rely on decimal-based accuracy via Saxon?

So if we stick strictly to the old XSD spec, which ignores decimal-based floating point, we cannot solve the problem. How could a decimal-based extension work? Any suggestions? I bet NumericValue.parseNumber has to decide which floating-point base to use. Anyone who wants accuracy gains little by mixing binary and decimal, and should stick to decimal. That is the reason for my prototype/fork/test: to give the EU e-invoice validation artefacts a solid (decimal) XSLT/Saxon base!

Comments/Questions on Saxon Code

  1. I agree with your comment: XPath (https://www.w3.org/TR/xpath-functions-31/#func-round-half-to-even) specifies that the same (or a related) type has to be returned. FYI, in Java the following can be done: rounding 123456789 half-even with scale -2 gives 123456800, and with scale 2 gives 123456789.00. The latter via: new BigDecimal("123456789").divide(BigDecimal.ONE, 2, RoundingMode.HALF_EVEN).toPlainString()
  2. If rounding should always return a copy, is this correct? https://github.com/svanteschubert/Saxon-HE/blob/main/src/main/java/net/sf/saxon/value/Int64Value.java#L495 It is safer to return only copies; are there concurrent accesses, or what is the reason? Just curious.
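BigDecimal.setScale(int, RoundingMode) is a shorter route to the same results as the divide(BigDecimal.ONE, 2, RoundingMode.HALF_EVEN) call in point 1, and it handles both signs of scale. Below is a small sketch; the helper name is ours. For an xs:integer argument, XPath's round-half-to-even with a positive precision returns the integer unchanged (123456789), so the padded 123456789.00 is a Java representation detail, not the XPath result.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class HalfEvenScale {

    // Rounds half-to-even at the given scale; a negative scale rounds to tens, hundreds, ...
    static String roundHalfEven(String number, int scale) {
        return new BigDecimal(number).setScale(scale, RoundingMode.HALF_EVEN).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(roundHalfEven("123456789", -2)); // 123456800
        System.out.println(roundHalfEven("123456789", 2));  // 123456789.00
        // Exact ties go to the even neighbour:
        System.out.println(roundHalfEven("123456750", -2)); // 123456800
        System.out.println(roundHalfEven("123456850", -2)); // 123456800
    }
}
```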

What's next?

In the next few days, I will review my weekend work to provide you with a more solid "pull request/suggestion". BTW, I grant you all the rights/agreements you need to merge my work into your codebase. If you like, we could have a short joint tea break to discuss any further obstacles on our way to supporting decimal-based arithmetic :-) We met briefly in Prague this year at the XML conference, when I asked you about supporting bidirectional XSL transformation... ;-)

Actions #13

Updated by Michael Kay about 4 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from Switching from 9.9.1-8 to 9.9.1-9 (also in 10.3) causes error: "Required attribute 'select' is missing" to Decimal Precision
  • Status changed from New to In Progress

Changing the title to better reflect the topic, and recategorising from "Bug" to "Feature" since there appears to be no suggestion that the product isn't behaving to spec.

Actions #14

Updated by Michael Kay almost 4 years ago

  • Category set to Saxon extensions
  • Status changed from In Progress to Closed
  • Assignee set to Michael Kay

I'm going to close this with no action, I'm afraid. I can't envisage circumstances in which we would be able to construct a business case for investing in this area. If someone produces a third-party library that provides such functionality, then we would consider integrating it.
