Bug #6494

closed

Whitespace text output disappears when indenting

Added by Nathan Claeys 6 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assignee: Michael Kay
Category: Serialization
Sprint/Milestone: -
Start date: 2024-07-31
% Done: 0%
Applies to branch: 10, 11, 12, trunk
Fix Committed on Branch: 11, 12, trunk
Platforms: .NET, Java

Description

The concat(string,string,string*) function lets an XSLT transformation concatenate as many strings as needed into one string.

However, starting with Saxon-HE 10.0, this function apparently produces wrong output for stream results. It no longer correctly concatenates strings that consist only of spaces: when several space-only arguments appear in a row, only the last one ends up in the output.

Example: concat('A', ' ', ' ', 'B') results in 'A B' and not the expected 'A B'

The attached test project contains several concat() calls, with unit tests for both a DOM result and a stream result.

The two-argument form concat(string,string) works, and concat(string,string,string*) also works as expected with a DOM result.


Files

whitespace-test.zip (7.88 KB): test project (Nathan Claeys, 2024-07-31 16:49)

#1

Updated by Nathan Claeys 6 months ago

Nathan Claeys wrote:

Example: concat('A', ' ', ' ', 'B') results in 'A B' and not the expected 'A B'

OK, I should have put this in a code block; the tracker applies its own formatting to runs of spaces.

Example:
concat('A', '         ', ' ', 'B') results in 'A B' and not the expected 'A          B'

#2

Updated by Michael Kay 6 months ago

Thanks for reporting it.

I seem to be getting the incorrect results when I run from the command line in a terminal, but correct results when I run from my development environment under exactly the same conditions. It does seem to be highly dependent on the exact execution scenario -- and difficult to debug if I can't reproduce it in my normal debugging environment.

The arguments to concat() are all string literals so the actual concatenation is done at compile time, which makes it particularly hard to see how it can depend on anything in the run-time environment.

A puzzle.

#3

Updated by Michael Kay 6 months ago

I got it to fail in the IDE by running without a Saxon-EE license.

Which (a) explains the difference, and (b) gives me an easy route forward for debugging.

#4

Updated by Michael Kay 6 months ago

OK, got it. The problem is in the XML indenter - which explains why there's no problem when writing to a DOM.

With optimization suppressed, we are not pre-evaluating the concat() calls, rather we are evaluating concat() in push mode, effectively writing each of the arguments to concat() as a separate text node, rather than building the concatenated string and sending it to the serializer as a unit.
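To make the difference concrete, here is a minimal Java sketch. It assumes a hypothetical RecordingReceiver that simply records every text event it is sent (it is not a Saxon class). concatPull models the pre-evaluated case, where the whole string is built first and sent once; concatPush models push-mode evaluation, where each argument is written out as a separate text event.

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatPushDemo {
    // Hypothetical stand-in for the serializer pipeline: records every text event.
    static final class RecordingReceiver {
        final List<String> textEvents = new ArrayList<>();

        void characters(String text) {
            textEvents.add(text);
        }
    }

    // Pre-evaluated case: build the concatenated string first, then send it as one event.
    static void concatPull(RecordingReceiver out, String... args) {
        out.characters(String.join("", args));
    }

    // Push-mode case: send each argument to the receiver as a separate text event.
    static void concatPush(RecordingReceiver out, String... args) {
        for (String arg : args) {
            out.characters(arg);
        }
    }

    public static void main(String[] argv) {
        String[] parts = {"A", "         ", " ", "B"}; // 9 spaces, then 1 space

        RecordingReceiver pulled = new RecordingReceiver();
        concatPull(pulled, parts);
        RecordingReceiver pushed = new RecordingReceiver();
        concatPush(pushed, parts);

        System.out.println("pre-evaluated: " + pulled.textEvents.size() + " text event(s)"); // 1
        System.out.println("push mode: " + pushed.textEvents.size() + " text event(s)");     // 4
    }
}
```

With the arguments from the report, the pre-evaluated path produces a single text event, while the push path produces four, two of which are adjacent whitespace-only events.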

The indenter is buffering whitespace text nodes, on the grounds that in some circumstances (but not here) it can drop whitespace text nodes in order to improve the indented output.

But it only buffers one whitespace text node, and if it receives a second, the first is simply discarded.

So the effect is that if you write two consecutive whitespace text nodes to an indenting serializer (using concat or otherwise), the first one is discarded.
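The failure mode can be reproduced with a simplified model. The Java sketch below assumes a hypothetical SimpleIndenter (not Saxon's actual indenter code) that holds back whitespace-only text. With accumulateWhitespace set to false it keeps only one buffered node, so a second whitespace node replaces the first; with true it appends consecutive whitespace nodes to the buffer instead. This illustrates the mechanism described above; the real fix in Saxon may be structured differently.

```java
public class WhitespaceBufferDemo {
    // Hypothetical simplified indenter (not Saxon's actual code). Whitespace-only
    // text is held back in `pending` until the next non-whitespace event.
    static final class SimpleIndenter {
        private final boolean accumulateWhitespace; // false reproduces the reported bug
        private final StringBuilder out = new StringBuilder();
        private final StringBuilder pending = new StringBuilder();

        SimpleIndenter(boolean accumulateWhitespace) {
            this.accumulateWhitespace = accumulateWhitespace;
        }

        void characters(String text) {
            if (isXmlWhitespace(text)) {
                if (!accumulateWhitespace) {
                    pending.setLength(0); // single slot: a new whitespace node discards the old one
                }
                pending.append(text);
            } else {
                flushPending();
                out.append(text);
            }
        }

        void endDocument() {
            flushPending();
        }

        // This sketch always emits buffered whitespace; a real indenter could
        // instead decide to drop it, e.g. next to element boundaries.
        private void flushPending() {
            out.append(pending);
            pending.setLength(0);
        }

        String result() {
            return out.toString();
        }
    }

    // XML whitespace: space, tab, carriage return, line feed.
    static boolean isXmlWhitespace(String s) {
        return s.chars().allMatch(c -> c == ' ' || c == '\t' || c == '\r' || c == '\n');
    }

    static String run(boolean accumulateWhitespace, String... texts) {
        SimpleIndenter indenter = new SimpleIndenter(accumulateWhitespace);
        for (String t : texts) {
            indenter.characters(t);
        }
        indenter.endDocument();
        return indenter.result();
    }

    public static void main(String[] args) {
        String[] parts = {"A", "         ", " ", "B"}; // text events from push-mode concat()
        System.out.println("[" + run(false, parts) + "]"); // [A B]
        System.out.println("[" + run(true, parts) + "]");  // A, then all 10 spaces, then B
    }
}
```

With the single-slot buffer the nine-space argument is lost and the output is "A B", matching the report. Accumulating consecutive whitespace nodes keeps all ten spaces while still leaving the indenter free to drop the whole buffered run when that is appropriate.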

Remarkable how that hasn't been noticed before.

#5

Updated by Michael Kay 6 months ago

  • Subject changed from Whitespaces dissapear in concat(string,string,string*) function calls to Whitespace text output disappears when indenting
  • Category changed from XSLT conformance to Serialization
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Applies to branch trunk added
  • Fix Committed on Branch 11, 12, trunk added
  • Platforms .NET added
