Bug #5565
Saxon-HE 10N: Extra namespaces on every node when serializing into XmlWriter
Status: Closed
Description
I just came across issues #4353 and #5480, and we see a similar issue with 10.8, but neither seemed to match our exact problem, so I decided to create a new ticket (just in case). This is part of our ongoing effort to move from Saxon-HE 9.9.x to Saxon-HE 10 on .NET Framework 4.8. We're still stuck with .NET Framework for now, so we can't update to SaxonCS 11 or higher yet; we're still figuring out how to migrate certain parts to be .NET 6 compatible.
Using this source:
using System;
using System.Xml.Linq;
using Saxon.Api;

class Program
{
    private static void Main(string[] args)
    {
        var xslt = XDocument.Parse(@"
<stylesheet xmlns='http://www.w3.org/1999/XSL/Transform' version='3.0'>
<mode on-no-match='shallow-copy'/>
<template match='foo'>
<bar xmlns=''><apply-templates/></bar>
</template>
</stylesheet>", LoadOptions.SetLineInfo);
        var input = xslt; // does not matter for this test.
        var output = new XDocument(); // our target is an in-memory document
        var processor = new Processor();
        using (var xsltReader = xslt.CreateReader())
        using (var inputReader = input.CreateReader())
        using (var outputWriter = output.CreateWriter())
        {
            var compiler = processor.NewXsltCompiler();
            compiler.BaseUri = new Uri(typeof(Program).Assembly.Location);
            var xsltExecutable = compiler.Compile(xsltReader);
            var transformer = xsltExecutable.Load();
            var builder = processor.NewDocumentBuilder();
            transformer.InitialContextNode = builder.Build(inputReader);
            var destination = new TextWriterDestination(outputWriter) { CloseAfterUse = true };
            transformer.Run(destination);
            // this inserts xmlns... on every serialized element
            Console.WriteLine(output.ToString());
            // this uses System.Xml.Linq to omit namespaces that aren't needed
            //Console.WriteLine(output.ToString(SaveOptions.OmitDuplicateNamespaces));
        }
    }
}
...we get this output with the latest Saxon-HE 9.9.x:
<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <mode on-no-match="shallow-copy" />
  <template match="foo">
    <bar xmlns="">
      <apply-templates />
    </bar>
  </template>
</stylesheet>
With 10.8, the exact same code gives us this:
<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <mode xmlns="http://www.w3.org/1999/XSL/Transform" on-no-match="shallow-copy" />
  <template xmlns="http://www.w3.org/1999/XSL/Transform" match="foo">
    <bar xmlns="">
      <apply-templates />
    </bar>
  </template>
</stylesheet>
For obvious reasons, the larger the input and resulting documents are, the more problematic this becomes. If we take the result as-is and, let's say, write it to disk or into a database field, the document size increases massively: at times we see a 20x to 50x increase just because of the unnecessary namespace declarations, with the corresponding performance and memory issues.
We can get the previous result back by passing SaveOptions.OmitDuplicateNamespaces to XDocument.Save, but that is a lot of work, since the code doing the transformation usually has no control over where its result gets serialized. (In fact, our common code that does all the Saxon-related work only accepts an XmlDestination, so the only sensible place to address this would be a wrapper around any TextWriterDestination.)
However, in some places we write directly to a file:
var destination = processor.NewSerializer();
destination.SetOutputStream(fileStream);
transformer.Run(destination);
...where the namespaces are fine. So the cause is likely something inside TextWriterDestination, or in one of the wrappers that translate between .NET and Java classes.
As far as I can tell, this is one of the last major issues we face in moving from 9 to 10, since working around it involves a lot of otherwise unrelated code changes (in code that previously "just worked"). We're not even sure that would fully solve the issue, as we sometimes simply take the produced XML fragment and carry on using the XDocument, XElement or whatever result we need, with the actual serialization inside .NET nowhere in sight.
I noticed you don't seem to be planning further 10.x releases (at least according to other tickets, which I can understand at some point, especially with SaxonCS 11 moving forward), so we'd also be OK with wrapping some XmlDestination (if we can get away with doing it in a single place). Any sort of way forward is appreciated, since we are slowly but surely reaching a point where we're blocked from using certain features that arrived in HE with 10 but aren't available in 9.
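As a stopgap that can live in a single place, the redundant declarations could also be stripped from the in-memory XDocument after the transformation has run. The following is a minimal sketch under that assumption; the class and method names (RedundantNamespaces.Remove, IsInScopeOnParent) are hypothetical and are not part of Saxon or System.Xml.Linq. Unlike SaveOptions.OmitDuplicateNamespaces, which only affects serialization, it removes the duplicate xmlns attributes from the tree itself, so code that keeps working with the XDocument or XElement also sees the cleaned-up result.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not a Saxon or .NET API): strips namespace declarations
// that merely repeat a binding already in scope on the parent element.
static class RedundantNamespaces
{
    public static void Remove(XDocument doc)
    {
        // Collect first, remove afterwards: modifying the tree while the
        // Descendants()/Attributes() enumeration is running is not safe.
        // Removing a redundant declaration never changes the in-scope bindings,
        // so deciding everything up front gives the same answer as deciding lazily.
        List<XAttribute> redundant = doc.Descendants()
            .Where(e => e.Parent != null)
            .SelectMany(e => e.Attributes()
                .Where(a => a.IsNamespaceDeclaration && IsInScopeOnParent(e.Parent, a)))
            .ToList();

        foreach (XAttribute declaration in redundant)
        {
            declaration.Remove();
        }
    }

    private static bool IsInScopeOnParent(XElement parent, XAttribute declaration)
    {
        // xmlns="..." is an attribute named {}xmlns; xmlns:p="..." is named
        // {http://www.w3.org/2000/xmlns/}p, so its local name is the prefix.
        XNamespace inherited = declaration.Name.Namespace == XNamespace.None
            ? parent.GetDefaultNamespace()
            : parent.GetNamespaceOfPrefix(declaration.Name.LocalName);
        return inherited != null && inherited.NamespaceName == declaration.Value;
    }
}
```

In the repro, calling RedundantNamespaces.Remove(output) right after transformer.Run(destination) should make output.ToString() print the same markup as output.ToString(SaveOptions.OmitDuplicateNamespaces). It still costs a full pass over the tree, so it treats the symptom rather than the cause in the pipeline.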
Updated by Emanuel Wlaschitz over 2 years ago
I suspect the cause is what's documented in net.sf.saxon.event.Outputter around line 196ff:
https://saxonica.plan.io/projects/saxonmirrorhe/repository/he/revisions/saxon10/entry/src/main/java/net/sf/saxon/event/Outputter.java#L196
In particular:
"This reflects the fact that when copying a tree, namespaces for child elements are emitted before the namespaces of their parent element."
together with:
"@since changed in 10.0 to report all the in-scope namespaces for an element, and to do so in a single call."
Updated by Michael Kay over 2 years ago
- Has duplicate Bug #5480: Saxon (10 on .NET) Adding Extraneous namespaces to each ancestor node, i.e. xmlns="" added
Updated by Michael Kay about 2 years ago
- Category set to .NET API
- Status changed from New to In Progress
- Assignee set to Michael Kay
- Priority changed from Low to Normal
TextWriterDestination was renamed XmlWriterDestination in 11.x, but it's essentially the same code. I'm pretty sure we fixed this problem in 11.x by adding a NamespaceDifferencer into the pipeline at XmlWriterDestination.GetReceiver. I shall try to retro-fit that change.
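Conceptually, a namespace differencer compares the complete set of in-scope namespaces reported for an element (which, since 10.0, is what the Outputter reports) with the set in scope on its parent, and passes on only the bindings that differ. The C# sketch below illustrates that idea on plain prefix-to-URI dictionaries, with "" standing for the default namespace. It is not Saxon's NamespaceDifferencer, which is Java code inside the Saxon pipeline; the names NamespaceDiff, Difference and Lookup are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of namespace differencing (not Saxon's code).
static class NamespaceDiff
{
    // Returns the URI bound to a prefix, treating "no default namespace" as the empty namespace.
    private static string Lookup(IReadOnlyDictionary<string, string> scope, string prefix)
    {
        string uri;
        if (scope.TryGetValue(prefix, out uri)) return uri;
        return prefix.Length == 0 ? "" : null;
    }

    // Given the in-scope bindings of a parent and of its child, returns only the
    // declarations the child must emit itself.
    public static Dictionary<string, string> Difference(
        IReadOnlyDictionary<string, string> parentScope,
        IReadOnlyDictionary<string, string> childScope)
    {
        var result = new Dictionary<string, string>();
        foreach (var binding in childScope)
        {
            // Bindings identical to the parent's are already in scope: skip them.
            if (Lookup(parentScope, binding.Key) != binding.Value)
            {
                result[binding.Key] = binding.Value;
            }
        }
        // The child has no default namespace but the parent does: undeclare it with xmlns="".
        if (!childScope.ContainsKey("") && Lookup(parentScope, "") != "")
        {
            result[""] = "";
        }
        return result;
    }
}
```

Applied to the repro, <mode> and <template> have the same in-scope set as <stylesheet>, so nothing is declared on them, while <bar> drops the default namespace and gets exactly one xmlns="" declaration, which matches the 9.9.x output.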
Updated by O'Neil Delpratt almost 2 years ago
- Assignee changed from Michael Kay to O'Neil Delpratt
Updated by O'Neil Delpratt almost 2 years ago
- Status changed from In Progress to Resolved
- Fix Committed on Branch 10 added
The fix has been applied to the Saxon 10 branch available for the next maintenance release.
Updated by O'Neil Delpratt almost 2 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 10.9 added
Bug fix applied in the Saxon 10.9 maintenance release.