Using sum() on 2 sequences and adding results returns incorrect value
Added by Dave Dezinski over 4 years ago
We have a .NET application that uses Saxon-HE 9.9.1.7 and XPath 2.0. When we use sum() on two sequences and add the results, we see a different value returned than if we just sum the values directly.
Here's a sample .NET console app that will reproduce the issue:
using System;
using System.IO;
using Saxon.Api;

namespace SaxonSum
{
    static class Program
    {
        static void Main(string[] args)
        {
            string xml = @"<root>
  <table1>
    <value>1.02</value>
    <value>1.02</value>
  </table1>
  <table2>
    <value>1.05</value>
    <value>1.05</value>
  </table2>
</root>";
            using (StringReader sr = new StringReader(xml))
            {
                Processor processor = new Processor();
                var builder = processor.NewDocumentBuilder();
                builder.BaseUri = new Uri("file:///data.xml");
                var node = builder.Build((TextReader)sr);
                var compiler = processor.NewXPathCompiler();

                var result = compiler.Evaluate("sum(/root/table1/value) + sum(/root/table2/value)", node);
                Console.WriteLine(result); // returns 4.140000000000001

                var result2 = compiler.Evaluate("number(/root/table1/value[1]) + number(/root/table1/value[2]) + number(/root/table2/value[1]) + number(/root/table2/value[2])", node);
                Console.WriteLine(result2); // returns 4.14

                var result3 = compiler.Evaluate("sum(//value)", node);
                Console.WriteLine(result3); // returns 4.14
            }
        }
    }
}
The expected result is 4.14, but the result variable above returns 4.140000000000001. I'm assuming this is due to doubles being used internally, but that doesn't explain why we're not seeing the same result in the other two cases.
To me this appears to be a bug; these same 3 XPath queries in XMLSpy all return 4.14.
Thoughts?
Replies (10)
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Michael Kay over 4 years ago
I can reproduce this directly in Java, with no use of Saxon code:
double a = new Double("1.02");
double b = new Double("1.05");
System.err.println("(P) " + (a + a + b + b));
double x = a + a;
double y = b + b;
System.err.println("(Q) " + (x + y));
outputs
(P) 4.14
(Q) 4.140000000000001
There's no Saxon code here, so if it's a bug, then it's not a Saxon bug! I haven't got a Windows machine with me so I can't try it in C#; I suggest you do so, and I would be interested in the results.
I'm not sure I understand IEEE floating point arithmetic well enough to explain this fully, but I can try a rough explanation.
Let's look at the internal form of these numbers:
==== Double 1.02 ====
Internal form: 3ff051eb851eb852
Sign: 1
Raw Exponent: 1023
Exponent: -52
Significand: 4593671619917906
==== Double 1.05 ====
Internal form: 3ff0cccccccccccd
Sign: 1
Raw Exponent: 1023
Exponent: -52
Significand: 4728779608739021
==== Double 2.04 ====
Internal form: 400051eb851eb852
Sign: 1
Raw Exponent: 1024
Exponent: -51
Significand: 4593671619917906
==== Double 2.1 ====
Internal form: 4000cccccccccccd
Sign: 1
Raw Exponent: 1024
Exponent: -51
Significand: 4728779608739021
So in each case, doubling the number has produced a number with the same significand and an exponent one greater, as we would expect. But when we add two numbers with different exponents, the significand has to be shifted, and this creates the possibility of a loss of precision. As a result, the order of evaluation is significant: (a+a)+(b+b) is not guaranteed to produce the same result as ((a+a)+b)+b.
I hope that gives a bit of an explanation; I know it's not exact, but with floating point arithmetic there is always scope for rounding errors like this.
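As a rough illustration of the order-of-evaluation point, the sketch below runs the two groupings on plain Java doubles and checks how far apart the results are. The class and method names (GroupingDemo, leftToRight, pairwise) are made up for this example and have nothing to do with Saxon. With these particular inputs the two groupings round at different intermediate steps and end up exactly one unit in the last place (ulp) apart.

```java
// Illustration only: plain double arithmetic, no Saxon code involved.
class GroupingDemo {
    // Left-to-right evaluation: ((a + a) + b) + b
    static double leftToRight(double a, double b) {
        return a + a + b + b;
    }

    // Pairwise evaluation: (a + a) + (b + b)
    static double pairwise(double a, double b) {
        double x = a + a;
        double y = b + b;
        return x + y;
    }

    public static void main(String[] args) {
        double p = leftToRight(1.02, 1.05);
        double q = pairwise(1.02, 1.05);
        System.out.println("(P) " + p);
        System.out.println("(Q) " + q);
        // Math.ulp gives the spacing between adjacent doubles near p
        System.out.println("one ulp apart: " + (q - p == Math.ulp(p)));
    }
}
```

Both results are valid IEEE 754 answers for their respective sequence of additions; neither grouping is "the wrong one".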
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Martin Honnen over 4 years ago
To avoid the pitfalls of double imprecision, in XPath 2 and later (as supported by Saxon) you can always use the xs:decimal data type explicitly, e.g.
sum(//value/xs:decimal(.))
sum(/root/table1/value/xs:decimal(.)) + sum(/root/table2/value/xs:decimal(.))
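To see why decimal arithmetic sidesteps the problem, here is a small Java analogy using java.math.BigDecimal. This is an assumption-laden stand-in: BigDecimal is not xs:decimal, and the class and helper names (DecimalSumDemo, sum) are invented for the example. It only shows that when the values are held as exact decimals, the grouping of the additions no longer matters.

```java
import java.math.BigDecimal;

// Analogy only: BigDecimal stands in for xs:decimal here.
class DecimalSumDemo {
    // Adds the given decimal strings exactly, with no binary rounding
    static BigDecimal sum(String... values) {
        BigDecimal total = BigDecimal.ZERO;
        for (String v : values) {
            total = total.add(new BigDecimal(v));
        }
        return total;
    }

    public static void main(String[] args) {
        BigDecimal table1 = sum("1.02", "1.02");
        BigDecimal table2 = sum("1.05", "1.05");
        // Two partial sums added together
        System.out.println(table1.add(table2));
        // One sum over all four values
        System.out.println(sum("1.02", "1.02", "1.05", "1.05"));
    }
}
```

Both lines print the same value, because every intermediate result is represented exactly.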
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski over 4 years ago
Thanks for both the detailed reply and the suggested fix. I've tried reproducing this in .NET with the following code and I end up getting the same result for both (P) and (Q):
double a = 1.02d;
double b = 1.05d;
Console.WriteLine("(P) " + (a + a + b + b));
double x = a + a;
double y = b + b;
Console.WriteLine("(Q) " + (x + y));
outputs
(P) 4.14
(Q) 4.14
Adding xs:decimal(.) does appear to fix this, as long as the compiler's BackwardsCompatible property is not set to true. I suspect that xs:decimal is not being honored when backwards compatibility is enabled.
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Martin Honnen over 4 years ago
.NET with its default number formatting is a bit smarter on the surface than Java but if you ask it to show more precision with e.g.
Console.WriteLine("(P) {0:R}", (a + a + b + b));
Console.WriteLine("(Q) {0:R}", (x + y));
you will see in .NET as well that the second value is not exactly 4.14. You can also see that with CompareTo:
Console.WriteLine((a + a + b + b).CompareTo(x + y));
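The same effect can be sketched in Java: a display format that rounds to a limited number of digits makes (P) and (Q) look identical even though the underlying doubles differ. The 15-significant-digit format below is chosen purely for illustration and is not a claim about what any particular runtime does by default; FormattingDemo, rounded and exact are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Illustration only: shows how a rounding display format can hide a one-ulp difference.
class FormattingDemo {
    // Formats with 15 significant digits (an assumed, illustrative display precision)
    static String rounded(double d) {
        return String.format(Locale.ROOT, "%.15g", d);
    }

    // Exact decimal expansion of the binary value stored in the double
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        double a = 1.02, b = 1.05;
        double p = a + a + b + b;
        double x = a + a, y = b + b;
        double q = x + y;
        System.out.println(rounded(p) + " vs " + rounded(q)); // same text for both
        System.out.println(exact(p));
        System.out.println(exact(q));
        System.out.println(Double.compare(p, q));              // non-zero: the values differ
    }
}
```

Comparing the values, or printing their exact expansions, is a more reliable check than comparing formatted strings.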
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski over 4 years ago
Yeah I see that now. I didn't realize that .NET was silently dropping the rest of the value.
I get the following with your changes:
(P) 4.14
(Q) 4.1400000000000006
Can you confirm that the use of xs:decimal is ignored in the case of the BackwardsCompatibility flag being set to true?
When I put this sample together I didn't realize our application was enabling backward compatibility so I need to make a change in order for xs:decimal to work properly.
Thanks
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Martin Honnen over 4 years ago
I don't think that xs:decimal is ignored in XPath 1.0 backwards compatibility mode; however, the + operator (https://www.w3.org/TR/xpath-30/#doc-xpath30-AdditiveExpr) does indeed convert any xs:decimal operand to an xs:double in that mode.
So you would need to use
sum((/root/table1/value/xs:decimal(.),/root/table2/value/xs:decimal(.)))
or
sum(//value/xs:decimal(.))
to work consistently and reliably with xs:decimal in that mode. If you throw in a + operator, any numeric operands are converted to xs:double, the only numeric data type XPath 1.0 knew.
That's my understanding; wait for what Michael Kay has to say.
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Michael Kay over 4 years ago
A cast to xs:decimal isn't ignored in backwards compatibility mode, but there are certainly things in BC mode that affect arithmetic, and I would need to see exactly what you are doing.
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski over 4 years ago
Using the same code as I initially posted and adding the following lines at the end:
var result4 = compiler.Evaluate("sum(/root/table1/value/xs:decimal(.)) + sum(/root/table2/value/xs:decimal(.))", node);
Console.WriteLine(result4);
results in:
4.140000000000001
4.14
4.14
4.14
If I add the following code after creating the compiler:
compiler.BackwardsCompatible = true;
I get the following results:
4.140000000000001
4.14
4.14
4.140000000000001
It turns out that the problem is actually in our application: our code was enabling backward compatibility when it should not have been. When I added the xs:decimal(.) I did not see a difference until I disabled backward compatibility.
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Michael Kay over 4 years ago
The spec says:
If XPath 1.0 compatibility mode is true, each operand is evaluated by applying the following steps, in order:
.... If the atomized operand is now an instance of type xs:boolean, xs:string, xs:decimal (including xs:integer), xs:float, or xs:untypedAtomic, then it is converted to the type xs:double by applying the fn:number function. (Note that fn:number returns the value NaN if its operand cannot be converted to a number.)
So in BC mode addition is always double addition, even when adding two decimals. (Something I had certainly forgotten!).
But I don't think XPath 1.0 compatibility affects the result of the sum() function.
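The effect of the quoted rule on the fourth result can be modelled in a few lines of Java. This is only a sketch of the rule as quoted, not Saxon's implementation: BigDecimal stands in for xs:decimal, and the names CompatibilityModeDemo, addAsDecimal and addAsDouble are invented here. Each table is summed exactly as a decimal, and the only difference is whether the final + works on decimals or on doubles.

```java
import java.math.BigDecimal;

// Model of the quoted rule only; not how Saxon is implemented internally.
class CompatibilityModeDemo {
    // Without compatibility mode: decimal + decimal stays decimal
    static BigDecimal addAsDecimal(BigDecimal left, BigDecimal right) {
        return left.add(right);
    }

    // With compatibility mode (as modelled here): both operands become doubles first
    static double addAsDouble(BigDecimal left, BigDecimal right) {
        return left.doubleValue() + right.doubleValue();
    }

    public static void main(String[] args) {
        // Each sum() over decimals is exact: 2.04 and 2.10
        BigDecimal table1 = new BigDecimal("1.02").add(new BigDecimal("1.02"));
        BigDecimal table2 = new BigDecimal("1.05").add(new BigDecimal("1.05"));
        System.out.println(addAsDecimal(table1, table2)); // 4.14
        System.out.println(addAsDouble(table1, table2));  // 4.140000000000001
    }
}
```

So even with exact per-table sums, a final double addition of 2.04 and 2.1 lands on the same value as the original sum(...) + sum(...) expression, which matches the results reported above with BackwardsCompatible set to true.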
RE: Using sum() on 2 sequences and adding results returns incorrect value - Added by Dave Dezinski over 4 years ago
That makes sense and explains the differences I'm seeing.
Thanks Michael and Martin!