Project

Profile

Help

Slow XSLT, text() and following-sibling::text()

Added by Gioele Barabucci over 8 years ago

Hello,

I'm using Saxon-PE 9.6.0.7 to transform a multi-MB document via XSLT. The problem I am facing is that the following XSLT transformation gets very slow when the input document document grows even slightly. When the number of input nodes doubles, the time it takes to do the transformation increases fourfold. (An analysis on various documents shows that the this XSLT has at least O(n^2) complexity.)

Could you please help me identify a possible solution to this problem?

This is the problematic XSLT:



    

    
        
        

        

        
    

    

and this is the external @identity.xsl@ file:



	
		
			
		
	

The XML file looks like the following:




000ucCuna1uc-Cuna m. N. of Vais3a1khavESAKa, L. 1321,230442.1

501ucCuz1uc-Cuz

ud-Suz

P. -Cuzyati , to_dry_up ChUp._iv_,_3_,_2 : Caus. -Cozayati , to_cause_to_dry_up to_parch MBh. R. S3a1rn3g. 021413 174,1 30443

110ucCuzka2uc-Cuzka mfn. dry_,_dried_up_,_withered Mr2icch. Katha1s. Ra1jat. 021414 174,1 30444

My main suspect is the @following-sibling::text()@ construct of in second template. However, removing it alleviates the problem, but does not solve it.

Does anybody see a possible cause?


Replies (6)

Please register to reply

RE: Slow XSLT, text() and following-sibling::text() - Added by Michael Kay over 8 years ago

You're almost certainly right that it's the following-sibling construct that's the culprit here. But having said that, I'm a bit puzzled by it. The way you describe the XML, the number of children of the @@ element grows, but the number of children of a @@ element stays much the same. If there's a non-linearity here, I would expect it to depend more on the number of text node children of a body element, rather than on the number of body elements.

I think this is related to https://saxonica.plan.io/issues/2401

Given a pattern A/B[P], Saxon first tests whether a node matches B, then it tests P, and finally A. This is sometimes the right strategy, but in this case it is wrong: it means that the following-sibling predicate is being evaluated for every text node whether or not its parent is a body element. We've changed this for the upcoming 9.7 release to try and make an intelligent decision whether to evaluate A first, or P first. In the meantime, you might find that it helps to rewrite

@body/text()[not(following-sibling::text())][. = ' ']@

as

@text()[parent::body][. = ' '][not(following-sibling::text())]@

so you evaluate the most expensive predicate only if the others are true.

RE: Slow XSLT, text() and following-sibling::text() - Added by Michael Kay over 8 years ago

I've checked that the new optimization for 9.7 gets this one right: it tests that the parent is a body element before evaluating either of the predicates.

RE: Slow XSLT, text() and following-sibling::text() - Added by Gioele Barabucci over 8 years ago

Thank you for the suggestions, but, sadly, the suggested expression is only ~2% faster than the original one, so it does not helps in practical terms.

I think the @following-sibling::text()@ is a red-herring. The following reduced XSLT still shows non-linear performances.


	
		
			
		
	

	

Saxon transforms a 30k elements document in ~6 seconds, for 60k elements it needs ~38 seconds.

At this point I think that either @priority@ is somehow influencing the processing speed or the @[1]@ predicate is causing the slowdown. Any other suggestion?

RE: Slow XSLT, text() and following-sibling::text() - Added by Michael Kay over 8 years ago

Could you please send us a source document that we can experiment with? Feel free to send it off-list if you don't want it published to the world.

RE: Slow XSLT, text() and following-sibling::text() - Added by Gioele Barabucci over 8 years ago

I have provided the complete XML file via email.

Anyway, it is possible to replicate this problem by just copying the provided three elements thousands of times.

RE: Slow XSLT, text() and following-sibling::text() - Added by Debbie Lockett over 8 years ago

We've replicated the non-linear performance in 9.6 for the reduced XSLT. The issue is avoided with the new optimization for 9.7, but we are looking at how performance could also be improved for 9.6. I've raised a bug to track progress: https://saxonica.plan.io/issues/2489

    (1-6/6)

    Please register to reply