Project

Profile

Help

Support #6133

closed

Change in attributes serialization in Saxon 12.3

Added by Radu Coravu 9 months ago. Updated 8 months ago.

Status:
Closed
Priority:
Low
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
2023-07-19
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

I just noticed running some tests we have that there was a change in how a Saxon transformer serializes attribute names which are split on multiple lines. There is one less space of indentation. Saxon used to do this:

<sometag aX="b1"
                  aY="b2"

and now it does this:

<sometag aX="b1"
                 aY="b2"

Is this an intended behavior?


Files

Actions #1

Updated by Martin Honnen 9 months ago

Actions #2

Updated by Michael Kay 9 months ago

This was a bug fix https://saxonica.plan.io/issues/6063

Did we get it wrong? if so, please send us a repro.

Actions #3

Updated by Radu Coravu 9 months ago

So with this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
    <xsl:output indent="yes"/>
    <xsl:template match="/">
        <root a="b" c="d" h="e" f="g" a1="b" c1="d" h1="e" f1="g"/>
    </xsl:template>
</xsl:stylesheet>

I obtain this indentation, 5 spaces used for indenting the attributes on each line instead of 6.

Actions #4

Updated by Martin Honnen 9 months ago

Does that also happen for your for Saxon 12.3 from the command line?

I get e.g.

<?xml version="1.0" encoding="UTF-8"?>
<root a="b"
      c="d"
      h="e"
      f="g"
      a1="b"
      c1="d"
      h1="e"
      f1="g"/>

Actions #5

Updated by Radu Coravu 9 months ago

Thanks Martin. Then we probably have a bad patch somewhere, you can probably close this one and we'll try to investigate further on our side.

Actions #6

Updated by Norm Tovey-Walsh 9 months ago

I obtain this indentation, 5 spaces used for indenting the attributes
on each line instead of 6.

Saxon 11.x produces:

I’ve long been annoyed by the extra spaces, but never enough to report
it or try to fix it. With Mike’s recent fix, I see:

Is that the same behavior you’re seeing?

I’m inclined to think of it as improvement. Though I suppose it’s a
backwards incompatible change if you’re feeling especially pedantic.

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

Actions #8

Updated by Michael Kay 9 months ago

It might be worth pointing out that the user originally reporting bug #6063 said the behaviour varied according to the operating system, and we never got to the bottom of that suggestion.

Actions #9

Updated by Radu Coravu 9 months ago

I also cannot reproduce the problem using the original Saxon 12.3 HE JAR libraries and a bit of Java code so I'm inclining to blame one of our patches for this.

Actions #10

Updated by Norm Tovey-Walsh 9 months ago

Actions #12

Updated by Radu Coravu 9 months ago

There are things I don't understand about the change made to net.sf.saxon.serialize.XMLIndenter Something like this:

 int indent = (level - 1) * getIndentation() + 3;

was changed to:

 int indent = (level - 1) * getIndentation() + 2;

I do not understand why not use directly the current attribute's level indentation

``
 int indent = level * getIndentation();

also I do not understand why if the default indentation is "3" for all elements I get an indentation of "4" on my side but maybe this is caused by one of our patches..

Actions #13

Updated by Radu Coravu 9 months ago

I think I understand, in XMLEmitter the tagname length is added to the indentation

            if (indentForNextAttribute >= 0) {
                indentForNextAttribute += displayName.length();
            }

so that "+2" means the start tag char and the space after the tag name so that the wrapped attribute is aligned with the attribute from the previous line.

int indent = (level - 1) * getIndentation() + 2;
Actions #14

Updated by Radu Coravu 9 months ago

And I confirm the problem is in one of our patches :( Sorry for wasting your time.

Actions #15

Updated by Michael Kay 9 months ago

It was changed from +3 to +2 because it was wrong: the value no longer includes the newline character, only the number of spaces (see XMLEmitter.writeAttributeIndentString()).

The original design made XMLIndenter responsible for all aspects of indentation, by inserting text nodes into the pipeline, so XMLEmitter didn't have to know anything about indentation. This broke down when we started indenting attributes, so there's now a rather complicated interaction between the two classes.

If we assume line==0 (that is, the start tag isn't preceded by multi-line text), the number of spaces before a startElement start tag will be level*getIndentation(). We increment level before emitting the attributes, so for attributes we use (level-1) -- it would make more sense to defer incrementing level, but this is delicate code... The +2 is +1 for the "<" before the element name, and +1 for the space after the element name; XMLEmitter itself adds the length of the element name at line 471.

Sorry it's so convoluted, I'm sure it could be done much more elegantly.

Actions #16

Updated by Radu Coravu 9 months ago

Hi Michael, thanks, this is all clear to me now. Indeed the classes are a bit tightly coupled. Maybe from the point of view of someone reading the code that +2 could have been added in XMLEmitter along with the tag name. Or the other way around, in XMLIndenter take the element name into account:

                int indent = (level - 1) * getIndentation() + 2 + nameCode.getDisplayName().length();
                emitter.setIndentForNextAttribute(indent);
Actions #17

Updated by Michael Kay 8 months ago

  • Status changed from New to Closed

Please register to edit this issue

Also available in: Atom PDF