Support #6133
closedChange in attributes serialization in Saxon 12.3
Added by Radu Coravu over 1 year ago. Updated over 1 year ago.
0%
Description
I just noticed running some tests we have that there was a change in how a Saxon transformer serializes attribute names which are split on multiple lines. There is one less space of indentation. Saxon used to do this:
<sometag aX="b1"
aY="b2"
and now it does this:
<sometag aX="b1"
aY="b2"
Is this an intended behavior?
Files
Screenshot 2023-07-19 at 16.05.44.png (39.3 KB) Screenshot 2023-07-19 at 16.05.44.png | Radu Coravu, 2023-07-19 15:06 | ||
Screenshot 2023-07-19 152003.png (48.1 KB) Screenshot 2023-07-19 152003.png | Martin Honnen, 2023-07-19 15:22 |
Updated by Martin Honnen over 1 year ago
12.3 has a fix for https://saxonica.plan.io/issues/6063
Updated by Michael Kay over 1 year ago
This was a bug fix https://saxonica.plan.io/issues/6063
Did we get it wrong? if so, please send us a repro.
Updated by Radu Coravu over 1 year ago
So with this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output indent="yes"/>
<xsl:template match="/">
<root a="b" c="d" h="e" f="g" a1="b" c1="d" h1="e" f1="g"/>
</xsl:template>
</xsl:stylesheet>
I obtain this indentation, 5 spaces used for indenting the attributes on each line instead of 6.
Updated by Martin Honnen over 1 year ago
Does that also happen for your for Saxon 12.3 from the command line?
I get e.g.
<?xml version="1.0" encoding="UTF-8"?>
<root a="b"
c="d"
h="e"
f="g"
a1="b"
c1="d"
h1="e"
f1="g"/>
Updated by Radu Coravu over 1 year ago
Thanks Martin. Then we probably have a bad patch somewhere, you can probably close this one and we'll try to investigate further on our side.
Updated by Norm Tovey-Walsh over 1 year ago
I obtain this indentation, 5 spaces used for indenting the attributes
on each line instead of 6.
Saxon 11.x produces:
I’ve long been annoyed by the extra spaces, but never enough to report
it or try to fix it. With Mike’s recent fix, I see:
Is that the same behavior you’re seeing?
I’m inclined to think of it as improvement. Though I suppose it’s a
backwards incompatible change if you’re feeling especially pedantic.
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Updated by Michael Kay over 1 year ago
It might be worth pointing out that the user originally reporting bug #6063 said the behaviour varied according to the operating system, and we never got to the bottom of that suggestion.
Updated by Radu Coravu over 1 year ago
I also cannot reproduce the problem using the original Saxon 12.3 HE JAR libraries and a bit of Java code so I'm inclining to blame one of our patches for this.
Updated by Radu Coravu over 1 year ago
There are things I don't understand about the change made to net.sf.saxon.serialize.XMLIndenter Something like this:
int indent = (level - 1) * getIndentation() + 3;
was changed to:
int indent = (level - 1) * getIndentation() + 2;
I do not understand why not use directly the current attribute's level indentation
``
int indent = level * getIndentation();
also I do not understand why if the default indentation is "3" for all elements I get an indentation of "4" on my side but maybe this is caused by one of our patches..
Updated by Radu Coravu over 1 year ago
I think I understand, in XMLEmitter the tagname length is added to the indentation
if (indentForNextAttribute >= 0) {
indentForNextAttribute += displayName.length();
}
so that "+2" means the start tag char and the space after the tag name so that the wrapped attribute is aligned with the attribute from the previous line.
int indent = (level - 1) * getIndentation() + 2;
Updated by Radu Coravu over 1 year ago
And I confirm the problem is in one of our patches :( Sorry for wasting your time.
Updated by Michael Kay over 1 year ago
It was changed from +3 to +2 because it was wrong: the value no longer includes the newline character, only the number of spaces (see XMLEmitter.writeAttributeIndentString()
).
The original design made XMLIndenter
responsible for all aspects of indentation, by inserting text nodes into the pipeline, so XMLEmitter
didn't have to know anything about indentation. This broke down when we started indenting attributes, so there's now a rather complicated interaction between the two classes.
If we assume line==0
(that is, the start tag isn't preceded by multi-line text), the number of spaces before a startElement start tag will be level*getIndentation()
. We increment level
before emitting the attributes, so for attributes we use (level-1)
-- it would make more sense to defer incrementing level
, but this is delicate code... The +2 is +1 for the "<"
before the element name, and +1 for the space after the element name; XMLEmitter
itself adds the length of the element name at line 471.
Sorry it's so convoluted, I'm sure it could be done much more elegantly.
Updated by Radu Coravu over 1 year ago
Hi Michael, thanks, this is all clear to me now. Indeed the classes are a bit tightly coupled. Maybe from the point of view of someone reading the code that +2 could have been added in XMLEmitter along with the tag name. Or the other way around, in XMLIndenter take the element name into account:
int indent = (level - 1) * getIndentation() + 2 + nameCode.getDisplayName().length();
emitter.setIndentForNextAttribute(indent);
Please register to edit this issue