Bug #5565

closed

Saxon-HE 10N: Extra namespaces on every node when serializing into XmlWriter

Added by Emanuel Wlaschitz almost 2 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Category: .NET API
Sprint/Milestone: -
Start date: 2022-06-14
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10
Fix Committed on Branch: 10
Fixed in Maintenance Release:
Platforms: .NET

Description

I just stumbled over issues #4353 and #5480, and we see a similar problem with 10.8, but neither really matches our exact case, so I decided to create a new ticket (just in case). This is part of our ongoing effort to move from Saxon-HE 9.9.x to Saxon-HE 10 on .NET Framework 4.8 (which we're stuck with for now; we can't update to SaxonCS 11 or higher yet, as we're still figuring out how to migrate certain parts to be .NET 6 compatible).

Using this source:

// Requires: using System; using System.Xml.Linq; using Saxon.Api;
private static void Main(string[] args)
{
    var xslt = XDocument.Parse(@"
<stylesheet xmlns='http://www.w3.org/1999/XSL/Transform' version='3.0'>
    <mode on-no-match='shallow-copy'/>
    <template match='foo'>
        <bar xmlns=''><apply-templates/></bar>
    </template>
</stylesheet>", LoadOptions.SetLineInfo);
    var input = xslt; // does not matter for this test.
    var output = new XDocument(); // our target is an in-memory document

    var processor = new Processor();
    using (var xsltReader = xslt.CreateReader())
    using (var inputReader = input.CreateReader())
    using (var outputWriter = output.CreateWriter())
    {
        var compiler = processor.NewXsltCompiler();
        compiler.BaseUri = new Uri(typeof(Program).Assembly.Location);
        var xsltExecutable = compiler.Compile(xsltReader);

        var transformer = xsltExecutable.Load();

        var builder = processor.NewDocumentBuilder();
        transformer.InitialContextNode = builder.Build(inputReader);

        var destination = new TextWriterDestination(outputWriter) { CloseAfterUse = true };
        transformer.Run(destination);

        // this inserts xmlns... on every serialized element
        Console.WriteLine(output.ToString());
        // this uses System.Xml.Linq to omit namespaces that aren't needed
        //Console.WriteLine(output.ToString(SaveOptions.OmitDuplicateNamespaces));
    }
}

...we get this output with the latest Saxon-HE 9.9.x:

<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <mode on-no-match="shallow-copy" />
  <template match="foo">
    <bar xmlns="">
      <apply-templates />
    </bar>
  </template>
</stylesheet>

With 10.8, the exact same code gives us this:

<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <mode xmlns="http://www.w3.org/1999/XSL/Transform" on-no-match="shallow-copy" />
  <template xmlns="http://www.w3.org/1999/XSL/Transform" match="foo">
    <bar xmlns="">
      <apply-templates />
    </bar>
  </template>
</stylesheet>

For obvious reasons, the larger the input and resulting documents are, the more problematic this becomes: we see a massive increase in document size if we take the result as-is and, let's say, write it to disk or into a database field. That in turn causes performance and memory issues, as we sometimes see a 20x to 50x increase in size just because of the unnecessary namespace declarations.

We can get the previous result back by passing SaveOptions.OmitDuplicateNamespaces when the XDocument is saved or converted to a string, but that is a lot of work, since the transformation code often has no influence over where its result ends up being serialized. In fact, our common code that does all the Saxon-related work only accepts an XmlDestination, so the only sensible way to address this there would be to wrap any TextWriterDestination.
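The effect of SaveOptions.OmitDuplicateNamespaces can be reproduced without Saxon. The following is a minimal sketch, assuming only System.Xml.Linq; the class NamespaceDemo and its methods are hypothetical names made up for illustration, and the sample tree hand-builds the kind of redundant declaration shown in the 10.8 output above.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of Saxon.
public static class NamespaceDemo
{
    // Builds a small tree in which the child repeats the default namespace
    // declaration of its parent, mimicking the 10.8 output.
    public static XDocument BuildSample()
    {
        XNamespace xsl = "http://www.w3.org/1999/XSL/Transform";
        return new XDocument(
            new XElement(xsl + "stylesheet",
                new XAttribute("xmlns", xsl.NamespaceName),
                new XElement(xsl + "mode",
                    new XAttribute("xmlns", xsl.NamespaceName), // redundant declaration
                    new XAttribute("on-no-match", "shallow-copy"))));
    }

    // Serializes the document and lets LINQ to XML drop declarations
    // that are already in scope from an ancestor.
    public static string SerializeCompact(XDocument document)
    {
        return document.ToString(SaveOptions.OmitDuplicateNamespaces);
    }

    public static void Main()
    {
        var sample = BuildSample();
        Console.WriteLine(sample.ToString());        // keeps both declarations
        Console.WriteLine(SerializeCompact(sample)); // redundant one omitted
    }
}
```

This only helps at the point where the XDocument is serialized; the redundant declarations are still present as attributes in the in-memory tree.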

However, in some places we write directly to a file:

var destination = processor.NewSerializer();
destination.SetOutputStream(fileStream);
transformer.Run(destination);

...where the namespaces are fine. So it is likely something inside TextWriterDestination, or maybe one of the wrappers that translate between .NET and Java classes.

As far as I can tell, this is one of the last major issues we face in moving from 9 to 10, since this involves a lot of otherwise unrelated code changes (which before "just worked"), and we're not even sure if this fully solves the issue (as we sometimes simply take the produced XML fragment and carry on using the XDocument, XElement or whatever result we need; with the actual serialization inside .NET nowhere in sight).
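For the in-memory case, where the XDocument is used directly and never serialized, one conceivable stopgap is to strip the redundant declarations from the tree after the transformation. The following is a sketch under that assumption, using only System.Xml.Linq; NamespaceCleanup and its methods are hypothetical names, not part of Saxon, and the approach has not been measured against large documents.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of Saxon.
public static class NamespaceCleanup
{
    // Removes namespace declarations that merely repeat a binding
    // already in scope on the parent element.
    public static void RemoveRedundantDeclarations(XElement root)
    {
        foreach (var element in root.Descendants())
        {
            var parent = element.Parent; // never null for descendants of root
            var redundant = element.Attributes()
                .Where(a => a.IsNamespaceDeclaration && IsAlreadyInScope(parent, a))
                .ToList(); // materialize before removing anything

            foreach (var declaration in redundant)
            {
                declaration.Remove();
            }
        }
    }

    private static bool IsAlreadyInScope(XElement parent, XAttribute declaration)
    {
        // xmlns="..." has a name in no namespace; xmlns:p="..." has a name
        // in XNamespace.Xmlns whose local name is the prefix.
        var inScope = declaration.Name.Namespace == XNamespace.None
            ? parent.GetDefaultNamespace()
            : parent.GetNamespaceOfPrefix(declaration.Name.LocalName);
        return inScope != null && inScope.NamespaceName == declaration.Value;
    }
}
```

In the repro above this could be called as NamespaceCleanup.RemoveRedundantDeclarations(output.Root) once the transformation has finished. A declaration such as xmlns="" on bar is kept, because it differs from the default namespace in scope on its parent.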

I noticed you don't seem to plan further 10.x releases (at least according to other tickets, which I can understand, especially with SaxonCS 11 moving forward), so we'd also be OK with wrapping some XmlDestination (if we can get away with doing it in a single place). Any way forward is appreciated, since we have slowly but surely reached a point where we need certain features that arrived in HE with 10 and are not available in 9.


Related issues

Has duplicate Saxon - Bug #5480: Saxon (10 on .NET) Adding Extraneous namespaces to each ancestor node, i.e. xmlns="" (Duplicate, 2022-05-10)

Actions #1

Updated by Emanuel Wlaschitz almost 2 years ago

I suspect the cause is what's documented in net.sf.saxon.event.Outputter, around line 196 and following: https://saxonica.plan.io/projects/saxonmirrorhe/repository/he/revisions/saxon10/entry/src/main/java/net/sf/saxon/event/Outputter.java#L196

Particularly

This reflects the fact that when copying a tree, namespaces for child elements are emitted before the namespaces of their parent element.

With

@since changed in 10.0 to report all the in-scope namespaces for an element, and to do so in a single call.

Actions #2

Updated by Michael Kay over 1 year ago

  • Has duplicate Bug #5480: Saxon (10 on .NET) Adding Extraneous namespaces to each ancestor node, i.e. xmlns="" added
Actions #3

Updated by Michael Kay over 1 year ago

  • Category set to .NET API
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal

TextWriterDestination was renamed XmlWriterDestination in 11.x, but it's essentially the same code. I'm pretty sure we fixed this problem in 11.x by adding a NamespaceDifferencer into the pipeline at XmlWriterDestination.GetReceiver. I shall try to retro-fit that change.

Actions #4

Updated by O'Neil Delpratt about 1 year ago

  • Assignee changed from Michael Kay to O'Neil Delpratt
Actions #5

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from In Progress to Resolved
  • Fix Committed on Branch 10 added

The fix has been applied to the Saxon 10 branch and will be available in the next maintenance release.

Actions #6

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.9 added

Bug fix applied in the Saxon 10.9 maintenance release.
