Saxon very slow when transforming large xml file
Added by Michael Staal-Olsen over 5 years ago
I am currently trying to transform the attached file (queryinput.xml) with respect to transform.xsl. transform.xsl depends on PlandataFinalResponse.xslt which in turn depends on ResponseTemplates.xslt.
It seems to me that the transformation takes between 25 and 35 minutes to complete on my machine with Saxon HE, whereas the .NET XslCompiledTransform completes it in less than 30 seconds. First and foremost: can it really be true that Saxon is this slow for such a document? From debugging, it seems to me that Saxon spends a lot of time in the method TinyTree.bulkCopy.
Do you have any suggestions on how to increase performance?
PlandataFinalResponse.xslt (55.6 KB)
ResponseTemplates.xslt (120 KB)
transform.xsl (1.52 KB)
queryinput.xml (2.88 MB)
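For anyone wanting to reproduce timings like the ones quoted above, a minimal sketch of a timing harness follows. It uses only the standard JAXP API, so it runs with whichever XSLT processor JAXP finds. The class name `TransformTimer` and its method are hypothetical names chosen for illustration. It is an assumption that Saxon's jar on the classpath gets picked up by `TransformerFactory.newInstance()` in your setup, and the JDK's built-in processor only supports XSLT 1.0, so the stylesheets attached here may well need Saxon to run at all.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Saxon: times one run of a transformation via JAXP.
final class TransformTimer {

    /**
     * Compiles the stylesheet, then times a single run of the transformation.
     * Returns elapsed wall-clock milliseconds for the run only (compilation excluded).
     */
    static long timeTransformMillis(Source stylesheet, Source input, StreamResult output) {
        try {
            // JAXP lookup: uses whichever XSLT processor is configured or found on the classpath.
            TransformerFactory factory = TransformerFactory.newInstance();
            Templates compiled = factory.newTemplates(stylesheet);
            Transformer transformer = compiled.newTransformer();

            long start = System.nanoTime();
            transformer.transform(input, output);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (TransformerException e) {
            // Wrap the checked exception so callers stay simple.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java TransformTimer <stylesheet> <input.xml>");
            return;
        }
        StringWriter sink = new StringWriter(); // result is kept in memory and only measured
        long ms = timeTransformMillis(
                new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])),
                new StreamResult(sink));
        // Print the factory class so it is clear which processor was actually measured.
        System.out.println("processor: " + TransformerFactory.newInstance().getClass().getName());
        System.out.println("transform took " + ms + " ms, output "
                + sink.getBuffer().length() + " chars");
    }
}
```

A single run includes JVM warm-up, so this only gives a rough figure; for a gap as large as minutes versus seconds that is usually enough to see the difference.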
Replies (11)
RE: Saxon very slow when transforming large xml file - Added by Michael Staal-Olsen over 5 years ago
By the way, I am referring to Saxon HE on the Java platform, latest version of 9.9.
RE: Saxon very slow when transforming large xml file - Added by Michael Kay over 5 years ago
Thanks for reporting it.
I can confirm it is running very slowly under Saxon-HE 9.9.1.4 (I haven't run it to completion), whereas with Saxon-EE 9.9.1.4 it runs in about 3 seconds. I don't yet know the reason. There are of course optimizations in Saxon-EE that have a huge effect on some workloads, but it's unusual to see a difference as large as this. I will attempt to analyse why we're seeing the figures we are, and when we understand the cause, we'll take a position on whether the product is running as designed or not.
The bulkCopy() path is new in Saxon 9.9 and we've had to do a bit of tweaking to get it to work well, so this is an area we will want to look at. It was introduced because we found a number of workloads where tree copying was contributing a very large part of the stylesheet cost, and from a very quick look at your code I see that it has some rather interesting xsl:copy-of instructions which will certainly be an area to examine.
RE: Saxon very slow when transforming large xml file - Added by Michael Kay over 5 years ago
Confirmed that on Saxon-HE 9.8.0.15 the execution time is 3.69 seconds. So we have a serious regression between HE 9.8 and HE 9.9, which means we have to treat it as a bug. I will transfer this to the bug tracker since that's more suitable than the forum.
I also confirmed that under Saxon-EE, the bulk copy code is not being invoked.
Incidentally, regarding the title of this post, we don't really regard a 3 MB source file as "large" these days. "Large" means gigabyte-sized. Though a 3 MB file can still give problems if the performance is quadratic in file size, of course.
RE: Saxon very slow when transforming large xml file - Added by Michael Staal-Olsen over 5 years ago
Thank you for taking a look at the matter so quickly. And sorry for the misuse of the word "large"...
So you think this is a bug introduced in 9.9? And specifically in HE? Do you know how it can be different in EE, and why bulkCopy is not invoked in EE?
So you also think this will be solved? Do you have an idea of when, and do you think you will (once some investigation has been done) explain how the bug was introduced? What was the purpose of introducing the bulkCopy method?
Due to my interest in the problem, I hope to receive updates on the matter. Once again, thank you for your quick reply!
RE: Saxon very slow when transforming large xml file - Added by Michael Staal-Olsen over 5 years ago
One more thing: I may be wrong, but after a brief investigation it seems to me that HE version 9.9.1-2 works fine (from what I can see), whereas 9.9.1-3 and 9.9.1-4 are affected by the bug. So my claim is (at least for now) that something went wrong as part of release 9.9.1-3. But I may of course be wrong. Is it your experience that EE and HE produce the same XML output?
Another minor question: From what I can see the bulkCopy method was also part of 9.8, is it not? And would you recommend 9.9.1-2 over 9.8.0.15?
RE: Saxon very slow when transforming large xml file - Added by Michael Kay over 5 years ago
I need to investigate the problem before I can answer your questions.
I have raised a bug entry at https://saxonica.plan.io/issues/4273 -- please track that, because I will record the progress of my investigation there.
RE: Saxon very slow when transforming large xml file - Added by Michael Kay over 5 years ago
As you'll see from the bug tracker I have isolated the factor that is causing the performance regression: it's poor handling of a source document that contains many namespace declarations on non-top-level elements. In this case there are 307 such namespace declarations.
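To check whether one of your own source documents has the same characteristic, a rough sketch along the following lines counts namespace declarations that sit on elements below the document element. It uses the JDK's StAX parser. The class name `NestedNamespaceCounter` is hypothetical, and the thread does not say how the figure of 307 was obtained, so treat the counting rule here (every `xmlns` or `xmlns:prefix` attribute on any non-root element) as an assumption rather than Saxon's own measure.

```java
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic, not part of Saxon.
final class NestedNamespaceCounter {

    /** Counts namespace declarations found on elements other than the document element. */
    static int countNestedDeclarations(Reader xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            int depth = 0;
            int nested = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth > 1) {
                        // Number of namespaces declared on this element itself,
                        // not the number in scope.
                        nested += reader.getNamespaceCount();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
            return nested;
        } catch (XMLStreamException e) {
            // Wrap the checked exception so callers stay simple.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java NestedNamespaceCounter <input.xml>");
            return;
        }
        // Assumption: the file is UTF-8 encoded (the default for newBufferedReader).
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.println("nested namespace declarations: " + countNestedDeclarations(in));
        }
    }
}
```

A high count does not prove a document will hit this regression; it only indicates that the document resembles the test case in the respect identified above.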
RE: Saxon very slow when transforming large xml file - Added by Michael Staal-Olsen over 5 years ago
It is a very interesting read! And we have been discussing the use of namespaces internally. But do you intend to make a fix?
I am not quite sure why the bulkCopy feature can be "turned off", which you say is the case in EE. Does that not affect the end result? And why does EE not do the same?
Is it better for me (at the moment) to stick with 9.9.1.2?
RE: Saxon very slow when transforming large xml file - Added by Michael Kay over 5 years ago
"Bulk copy" is an optimization (which is why the code still works if it is disabled completely or under certain circumstances). When an xsl:copy-of instruction selects a subtree of one TinyTree and attaches it to an element in another, then rather than using the generic copy mechanism applicable to all trees, we have a fast path that physically copies the data structure, which ought to be a lot more efficient. But the big problem with copying trees in the XDM model is sensitivity to namespace context, and that's the issue we're dealing with here.
I'm sure we will find a fix, but the fallback position is that we pull the feature entirely from the 9.9 branch. We've changed the way namespaces are represented in the TinyTree for the next major release and that makes the design much cleaner.
Until we deliver a fix, sure, carry on with 9.9.1.2.
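The namespace sensitivity described above can be seen in a small experiment. The sketch below is not Saxon's bulkCopy code; it simply runs an xsl:copy-of through the standard JAXP API on a subtree whose prefix is declared on an ancestor outside the copied subtree. The class name `CopyOfNamespaceDemo`, the sample document, and the namespace URI `urn:example:p` are all made up for illustration. The copied element should carry its own declaration for the prefix in the output, which shows why a copy cannot just move the subtree's data without looking at the namespace context around it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo, not Saxon internals.
final class CopyOfNamespaceDemo {

    // The prefix p is declared on <doc>, two levels above the element that gets copied.
    static final String INPUT =
            "<doc xmlns:p='urn:example:p'><section><p:item>x</p:item></section></doc>";

    // Copies every element named "item" into a new <out> element that has no namespaces in scope.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<out><xsl:copy-of select=\"//*[local-name()='item']\"/></out>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Runs the copy and returns the serialized result. */
    static String run() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers stay simple.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The copied p:item has to bring a declaration of p with it,
        // because <out> does not declare that prefix.
        System.out.println(run());
    }
}
```

With hundreds of declarations scattered through a document, working out which namespaces are in scope for each copied subtree is presumably where the cost builds up, though the exact mechanism in 9.9 is described in the bug entry rather than here.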
RE: Saxon very slow when transforming large xml file - Added by Michael Staal-Olsen over 5 years ago
Is it correct that the issue has now been fixed in general for both v. 9.9 and v. 10.0?
In that case: Thanks for a job well done!
RE: Saxon very slow when transforming large xml file - Added by Michael Kay over 5 years ago
For the particular test case you supplied, the problem is fixed in 9.9.1.5 and on the development branch (in different ways).
Generally, though, documents with many namespace declarations provide ample scope for performance problems and I'm sure this will not be the last example we see. One thing we've learnt from this particular use case is that the redesign for 10.0 solves some problems and creates others.