Using sum() on 2 sequences and adding results returns incorrect value

Added by Dave Dezinski almost 4 years ago

We have a .NET application that uses Saxon-HE 9.9.1.7 and XPath 2.0. When we use sum() on two sequences and add the results, we see a different value than if we just sum the values directly.

Here's a sample .NET console app that will reproduce the issue:

using System;
using System.IO;
using Saxon.Api;

namespace SaxonSum
{
    static class Program
    {
        static void Main(string[] args)
        {
            string xml = @"<root>
	                         <table1>
		                       <value>1.02</value>
		                       <value>1.02</value>
	                         </table1>
	                         <table2>
		                       <value>1.05</value>
		                       <value>1.05</value>
	                         </table2>
                           </root>";

            using (StringReader sr = new StringReader(xml))
            {
                Processor processor = new Processor();
                var builder = processor.NewDocumentBuilder();
                builder.BaseUri = new Uri("file:///data.xml");
                var node = builder.Build((TextReader)sr);
                var compiler = processor.NewXPathCompiler();

                var result = compiler.Evaluate("sum(/root/table1/value) + sum(/root/table2/value)", node);
                Console.WriteLine(result); // returns 4.140000000000001

                var result2 = compiler.Evaluate("number(/root/table1/value[1]) + number(/root/table1/value[2]) + number(/root/table2/value[1]) + number(/root/table2/value[2])", node);
                Console.WriteLine(result2); // returns 4.14

                var result3 = compiler.Evaluate("sum(//value)", node);
                Console.WriteLine(result3); // returns 4.14
            }
        }
    }
}

The expected result is 4.14, but the result variable above returns 4.140000000000001. I'm assuming this is due to doubles being used internally, but that doesn't explain why we're not seeing the same result in the other two cases.

To me this appears to be a bug; these same 3 XPath queries in XMLSpy all return 4.14.

Thoughts?


Replies (10)

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Michael Kay almost 4 years ago

I can reproduce this directly in Java, with no use of Saxon code:

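        double a = Double.parseDouble("1.02");
        double b = Double.parseDouble("1.05");
        // (P): evaluated left to right, i.e. ((a + a) + b) + b
        System.err.println("(P) " + (a + a + b + b));
        double x = a + a;
        double y = b + b;
        // (Q): each pair is added first, then the two subtotals
        System.err.println("(Q) " + (x + y));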

outputs

(P) 4.14
(Q) 4.140000000000001

There's no Saxon code here, so if it's a bug, then it's not a Saxon bug! I haven't got a Windows machine with me so I can't try it in C#; I suggest you do so, and I would be interested in the results.

I'm not sure I understand IEEE floating point arithmetic well enough to explain this fully, but I can try a rough explanation.

Let's look at the internal form of these numbers:

==== Double 1.02 ====
Internal form: 3ff051eb851eb852
Sign: 1
Raw Exponent: 1023
Exponent: -52
Significand: 4593671619917906

==== Double 1.05 ====
Internal form: 3ff0cccccccccccd
Sign: 1
Raw Exponent: 1023
Exponent: -52
Significand: 4728779608739021

==== Double 2.04 ====
Internal form: 400051eb851eb852
Sign: 1
Raw Exponent: 1024
Exponent: -51
Significand: 4593671619917906

==== Double 2.1 ====
Internal form: 4000cccccccccccd
Sign: 1
Raw Exponent: 1024
Exponent: -51
Significand: 4728779608739021

So in each case, doubling the number has produced a number with the same significand and an exponent one greater, as we would expect. But when we add two numbers with different exponents, the significand of one of them has to be shifted, and this creates the possibility of a loss of precision. As a result, the order of evaluation is significant: (a+a)+(b+b) is not guaranteed to produce the same result as ((a+a)+b)+b.
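The effect of grouping can be checked with a small Java sketch. The class and method names below are hypothetical and chosen only for illustration; nothing here uses Saxon. It compares the left-to-right evaluation with the pairwise one and prints the raw bit patterns, which should differ only in the last place.

```java
// Hypothetical demo class; names are illustrative and not part of Saxon.
class GroupingDemo {
    // Left to right: ((a + a) + b) + b, which is how Java evaluates a + a + b + b.
    static double leftToRight(double a, double b) {
        return a + a + b + b;
    }

    // Pairwise: (a + a) + (b + b), like adding the results of two separate sums.
    static double pairwise(double a, double b) {
        double x = a + a;
        double y = b + b;
        return x + y;
    }

    public static void main(String[] args) {
        double a = 1.02;
        double b = 1.05;
        double p = leftToRight(a, b);
        double q = pairwise(a, b);
        System.out.println("(P) " + p);
        System.out.println("(Q) " + q);
        // The raw bit patterns show where the two results part ways.
        System.out.println("(P) bits " + Long.toHexString(Double.doubleToLongBits(p)));
        System.out.println("(Q) bits " + Long.toHexString(Double.doubleToLongBits(q)));
        // Math.ulp gives the spacing between adjacent doubles near p.
        System.out.println("one ulp apart: " + (q - p == Math.ulp(p)));
    }
}
```

Both results are correctly rounded for the operations that produced them; they differ because the intermediate sums are rounded at different points.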

I hope that gives a bit of an explanation; I know it's not exact, but with floating point arithmetic there is always scope for rounding errors like this.
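For anyone who wants to reproduce the internal-form dumps above, here is a minimal sketch in plain Java. It is not the tool used to produce them; the class name and field layout are assumptions made for illustration. It only handles normal (non-zero, non-subnormal, finite) doubles, where the value is sign × significand × 2^exponent with the implicit leading bit restored.

```java
// Hypothetical helper; only valid for normal doubles (not zero, subnormal, NaN or infinity).
class DoubleParts {
    // The 64 bits of the double as 16 hex digits.
    static String internalForm(double d) {
        return String.format("%016x", Double.doubleToLongBits(d));
    }

    // +1 or -1, taken from the top bit.
    static int sign(double d) {
        return (Double.doubleToLongBits(d) >>> 63) == 0 ? 1 : -1;
    }

    // The 11-bit biased exponent field.
    static int rawExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7ffL);
    }

    // Unbiased exponent, adjusted so that the significand is an integer.
    static int exponent(double d) {
        return rawExponent(d) - 1023 - 52;
    }

    // The 52 stored fraction bits with the implicit leading 1 restored.
    static long significand(double d) {
        long fraction = Double.doubleToLongBits(d) & ((1L << 52) - 1);
        return fraction | (1L << 52);
    }

    static void dump(double d) {
        System.out.println("==== Double " + d + " ====");
        System.out.println("Internal form: " + internalForm(d));
        System.out.println("Sign: " + sign(d));
        System.out.println("Raw Exponent: " + rawExponent(d));
        System.out.println("Exponent: " + exponent(d));
        System.out.println("Significand: " + significand(d));
    }

    public static void main(String[] args) {
        double a = 1.02;
        double b = 1.05;
        dump(a);
        dump(b);
        dump(a + a);
        dump(b + b);
    }
}
```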

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Martin Honnen almost 4 years ago

To avoid the pitfalls of double imprecision, in XPath 2 and later, as supported by Saxon, you can always use the xs:decimal data type explicitly, e.g.

sum(//value/xs:decimal(.))
sum(/root/table1/value/xs:decimal(.)) + sum(/root/table2/value/xs:decimal(.))
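As a rough analogy outside XPath, Java's BigDecimal behaves like a decimal type in that sums of decimal literals are exact, so the grouping no longer matters. The sketch below is only an illustration of that idea and makes no claim about how Saxon implements xs:decimal; the class name and the hard-coded values are assumptions taken from the sample document.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; BigDecimal stands in for a decimal type here.
class DecimalSumDemo {
    // Sum the values using exact decimal arithmetic.
    static BigDecimal sum(List<String> values) {
        BigDecimal total = BigDecimal.ZERO;
        for (String v : values) {
            total = total.add(new BigDecimal(v));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> table1 = Arrays.asList("1.02", "1.02");
        List<String> table2 = Arrays.asList("1.05", "1.05");
        List<String> all = Arrays.asList("1.02", "1.02", "1.05", "1.05");

        // Two subtotals added together, and one sum over all values.
        System.out.println(sum(table1).add(sum(table2)));
        System.out.println(sum(all));
    }
}
```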

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski almost 4 years ago

Thanks for both the detailed reply and the suggested fix. I've tried reproducing this in .NET with the following code and I end up getting the same result for both (P) and (Q):

double a = 1.02d;
double b = 1.05d;
Console.WriteLine("(P) " + (a + a + b + b));
double x = a + a;
double y = b + b;
Console.WriteLine("(Q) " + (x + y));

outputs

(P) 4.14
(Q) 4.14

Adding xs:decimal(.) does appear to fix this as long as the compiler's BackwardsCompatible property is not set to true. I suspect that xs:decimal is not being honored when backwards compatibility is enabled.

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Martin Honnen almost 4 years ago

.NET with its default number formatting is a bit smarter on the surface than Java but if you ask it to show more precision with e.g.

            Console.WriteLine("(P) {0:R}", (a + a + b + b));
            
            Console.WriteLine("(Q) {0:R}", (x + y));

you will see in .NET as well that the second value is not exactly 4.14. You can also see that with CompareTo:

Console.WriteLine((a + a + b + b).CompareTo(x + y));

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski almost 4 years ago

Yeah I see that now. I didn't realize that .NET was silently dropping the rest of the value.

I get the following with your changes:

(P) 4.14
(Q) 4.1400000000000006

Can you confirm that the use of xs:decimal is ignored when the BackwardsCompatible property is set to true?

When I put this sample together I didn't realize our application was enabling backward compatibility so I need to make a change in order for xs:decimal to work properly.

Thanks

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Martin Honnen almost 4 years ago

I don't think that xs:decimal is ignored in XPath 1.0 backwards compatibility mode; however, the + operator (see https://www.w3.org/TR/xpath-30/#doc-xpath30-AdditiveExpr) does convert any xs:decimal operand to an xs:double in that mode.

So you would need to use sum((/root/table1/value/xs:decimal(.),/root/table2/value/xs:decimal(.))) or sum(//value/xs:decimal(.)) to work consistently and reliably with xs:decimal in that mode. If you throw in a + operator, any numeric operand is converted to an xs:double, the only numeric data type XPath 1.0 knew.

That's my understanding; wait for what Michael Kay has to say.

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Michael Kay almost 4 years ago

A cast to xs:decimal isn't ignored in backwards compatibility mode, but there are certainly things in BC mode that affect arithmetic and I would need to see exactly what you are doing.

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski almost 4 years ago

Using the same code as I initially posted and adding the following lines at the end:

var result4 = compiler.Evaluate("sum(/root/table1/value/xs:decimal(.)) + sum(/root/table2/value/xs:decimal(.))", node); 
Console.WriteLine(result4);

results in:

4.140000000000001
4.14
4.14
4.14

If I add the following code after creating the compiler:

compiler.BackwardsCompatible = true;

I get the following results:

4.140000000000001
4.14
4.14
4.140000000000001

It turns out that the problem is actually in our application: our code was enabling backward compatibility when it should not have been. When I added the xs:decimal(.) I did not see a difference until I disabled backward compatibility.

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Michael Kay almost 4 years ago

The spec says:

If XPath 1.0 compatibility mode is true, each operand is evaluated by applying the following steps, in order:

.... If the atomized operand is now an instance of type xs:boolean, xs:string, xs:decimal (including xs:integer), xs:float, or xs:untypedAtomic, then it is converted to the type xs:double by applying the fn:number function. (Note that fn:number returns the value NaN if its operand cannot be converted to a number.)

So in BC mode addition is always double addition, even when adding two decimals. (Something I had certainly forgotten!).

But I don't think XPath 1.0 compatibility affects the result of the sum() function.
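The consequence of that rule can be mimicked in plain Java. This is a loose analogy only, with BigDecimal standing in for xs:decimal and a hypothetical class name; it does not call Saxon. The two decimal subtotals are exact, but converting each one to double before adding brings back the rounding seen in the original query, whereas adding them as decimals does not.

```java
import java.math.BigDecimal;

// Hypothetical demo class; an analogy for the operand conversion, not Saxon code.
class CompatModeDemo {
    // Addition carried out on the decimal values themselves.
    static BigDecimal decimalAdd(BigDecimal s1, BigDecimal s2) {
        return s1.add(s2);
    }

    // Each operand is converted to double first, then the doubles are added.
    static double doubleAdd(BigDecimal s1, BigDecimal s2) {
        return s1.doubleValue() + s2.doubleValue();
    }

    public static void main(String[] args) {
        // Exact decimal subtotals, as decimal sums over each table would give.
        BigDecimal s1 = new BigDecimal("1.02").add(new BigDecimal("1.02"));
        BigDecimal s2 = new BigDecimal("1.05").add(new BigDecimal("1.05"));

        System.out.println("decimal + decimal: " + decimalAdd(s1, s2));
        System.out.println("double + double:   " + doubleAdd(s1, s2));
    }
}
```

This matches the pattern reported earlier in the thread: a single sum over all the decimal values avoids the + operator and so avoids the conversion.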

RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski almost 4 years ago

That makes sense and explains the differences I'm seeing.

Thanks Michael and Martin!
