saxon-js xsl:output method "xml" with indent "yes" eats whitespace
Using saxon-js 2.2.0, and given XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" exclude-result-prefixes="xs"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <login>
      <xsl:value-of select="/data/text()"/>
    </login>
  </xsl:template>
</xsl:stylesheet>
and input XML:
<?xml version="1.0" encoding="UTF-8"?>
<data> </data>
it outputs:
<?xml version="1.0" encoding="UTF-8"?>
<login/>
If I change to indent="no", then it outputs:
<?xml version="1.0" encoding="UTF-8"?> <login> </login>
It is unexpected that changing only the indent setting would cause whitespace to be consumed and the element to collapse.
#1 Updated by Michael Kay 24 days ago
The spec is here:
the serializer MAY output whitespace characters in addition to the whitespace characters in the instance of the data model. It MAY also elide from the output whitespace characters that occurred in the instance of the data model or replace such whitespace characters with other whitespace characters.
I've re-read the rules and the behaviour here is entirely consistent with the rules in the spec. Whether it's ideal is another matter. But if the whitespace is significant and you don't want it messed with, try using
The use of the term "elide" in the spec is a little quirky; it is used without formal definition. Dictionary definitions include to omit, delete, abridge, or ignore: I read it simply as "delete".
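One mechanism the serialization spec defines for protecting whitespace from indentation is an xml:space attribute with the value preserve on the element in question. The sketch below applies it to the stylesheet from the report. It is an assumption about a possible workaround, not necessarily the option the comment above had in mind, and it has not been verified against saxon-js 2.2.0. It also changes the result, because the xml:space attribute appears on the login element in the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" exclude-result-prefixes="xs"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- Assumption: xml:space="preserve" asks the serializer to leave
         whitespace inside login alone, even with indent="yes". -->
    <!-- The element is kept on one line on purpose: inside the scope of
         xml:space="preserve", whitespace-only text nodes in the stylesheet
         are no longer stripped and would be copied to the output. -->
    <login xml:space="preserve"><xsl:value-of select="/data/text()"/></login>
  </xsl:template>
</xsl:stylesheet>
```

If this works as the spec describes, the whitespace from the data element should survive inside login with indentation left on. Whether that is acceptable depends on whether consumers of the output tolerate the extra attribute.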
#2 Updated by Jamie Peabody 24 days ago
lol, y, that part of the spec is riddled with elide. Anyway, I think the issue is that there is no DTD, so no way of knowing any types. I think this is relevant:
Whitespace characters SHOULD NOT be added, elided or replaced in places where the characters would constitute significant whitespace, for example, in the immediate content of an element that is annotated with a type other than xs:untyped or xs:anyType, and whose content model is known to be mixed.
I think the content is significant whitespace (the element is not annotated). Elsewhere, an Oracle article (https://www.oracle.com/technical-resources/articles/wang-whitespace.html#:~:text=What%20is%20XML%20Whitespace%3F,content%20and%20should%20be%20preserved.) says:
Usually without DTD or XML schema definition, all whitespaces are significant whitespaces and should be preserved.
Also, we use Saxon (Java) elsewhere in my company, and I believe its behavior is different in this instance. I have not confirmed this myself, but a customer reported that it is different. Thus this issue.
#3 Updated by Michael Kay 24 days ago
Clearly when the serialization spec talks of "significant whitespace" it does NOT regard whitespace text nodes in untyped documents as significant, otherwise indentation would not be allowed to do anything at all in the common case of untyped documents.
But although I think this output is conformant, we'll look at whether the algorithm can be tweaked.