Extreme speed differences when running the same transformation - how to optimize?

Added by Kai Weber over 11 years ago

I keep noticing extreme speed differences when I run XSLT scripts with Saxon-HE 9 from the Windows shell versus from oXygen XML Editor. The difference can be tremendous: the same XSLT script operating on the same file runs in 1.6 seconds within oXygen and in 16 minutes and 6 seconds (!) from the command line, on the same machine. I assume those differences have something to do with my usage of the document() function. Scripts that don't use this function run fast in both environments. But especially when several documents are processed in a loop, the extreme slowdown can be observed, e.g.:


        
            
                
            
        

Can anybody give me a hint where I could start to learn about speed optimization, so that I could make my batch script run as fast as the transformation from within the oXygen editor?

Best regards, Kai


Replies (8)

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Kai Weber over 11 years ago

I should add the template for opf:item to the code example in my question above, because this template itself loads other documents:


    
        
        
            
                
            
        
    

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Michael Kay over 11 years ago

Certainly I would look first to see whether there is some difference in the way documents are fetched. Using the -t option would be a good start to see what is being read at the Saxon level, though it won't give you information about (for example) HTTP requests for reading DTDs originating from the XML parser.

A common cause of severe delays is fetching commonly-used DTDs from the W3C web server. W3C deliberately throttle the rate of such requests to discourage their use (you should be using a local copy). Saxon contains copies of the most commonly used files, but the list isn't complete (I've tried to get a complete list from W3C but apparently it doesn't exist). You may want to do some lower-level monitoring to see what HTTP requests your application is issuing.
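One cheap check before any network monitoring is to list which input documents declare a DOCTYPE that points at www.w3.org at all. The following is a minimal sketch in Python, not part of Saxon: the function names and the list of file extensions are assumptions made for illustration. It only sees the top-level DOCTYPE declaration, so modules that the DTD itself pulls in will not show up here.

```python
import re
from pathlib import Path

# Captures the system identifier of a DOCTYPE declaration, for both the
# PUBLIC "public-id" "system-id" form and the SYSTEM "system-id" form.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+[^\s>\[]+\s+'
    r'(?:PUBLIC\s+(?:"[^"]*"|\'[^\']*\')\s+|SYSTEM\s+)'
    r'(?:"([^"]*)"|\'([^\']*)\')'
)


def doctype_system_id(xml_text):
    """Return the system identifier of the first DOCTYPE declaration, or None."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    return match.group(1) if match.group(1) is not None else match.group(2)


def find_w3c_doctypes(root, patterns=("*.xml", "*.xhtml", "*.html", "*.opf")):
    """Map each matching file under root to its system id if it points at www.w3.org.

    The extension list is a guess at typical EPUB content; adjust as needed.
    """
    hits = {}
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            # Only the first 4096 characters are inspected; a DOCTYPE sits near the top.
            head = path.read_text(encoding="utf-8", errors="replace")[:4096]
            system_id = doctype_system_id(head)
            if system_id and "www.w3.org" in system_id:
                hits[str(path)] = system_id
    return hits
```

Calling `find_w3c_doctypes` on the folder that holds the source documents returns a dictionary of file paths and the W3C system identifiers they reference. An empty result suggests the delay comes from somewhere other than DTD resolution.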

Check your Saxon version number. "Saxon HE 9" covers a multitude of major and minor version numbers, and this area has evolved over the last year or two as W3C has become more aggressive in applying the brakes.

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Kai Weber over 11 years ago

Thanks for the reply. With -t enabled, it's getting really interesting: I now find, more specifically, that the speed difference actually occurs between calling Saxon HE 9.5.1.1 from a single command line directly in the cmd shell (= fast execution) and calling it from a batch script (= extremely slow execution).

In the batch version I get a lot of messages like these:

Fetching Saxon copy of w3c/xhtml11/xhtml11.dtd
Fetching Saxon copy of w3c/xhtml11/xhtml-framework-1.mod
Fetching Saxon copy of w3c/xhtml11/xhtml-datatypes-1.mod
Fetching Saxon copy of w3c/xhtml11/xhtml-qname-1.mod
Fetching Saxon copy of w3c/xhtml11/xhtml-events-1.mod
Fetching Saxon copy of w3c/xhtml11/xhtml-attribs-1.mod
...
Fetching Saxon copy of w3c/xhtml11/xhtml-struct-1.mod
Tree built in 60526 milliseconds
Tree size: 15 nodes, 12 characters, 20 attributes

That sounds as if it is retrieving those files from the Saxon .jar rather than from the W3C website, but maybe my interpretation is wrong? When I run the command as a single line in the shell, there are no "Fetching..." messages at all:

Tree built in 1 milliseconds
Tree size: 27 nodes, 12 characters, 13 attributes

I attach the complete -t output of the two different runs here. The command for starting Saxon was the same in the batch script and on the shell:

java -classpath .\bin\saxon9he.jar net.sf.saxon.Transform -t -s:.\temp\META-INF\container.xml -xsl:.\lib\insertPageAnchors.xsl -o:.\temp\temp.xml outpath=transformed/ > batch.text 2>&1

cmdline.txt (13.3 KB): Fast command line execution
batch.txt (40.3 KB): Slow batch execution
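To compare two such -t logs without reading them line by line, the "Fetching Saxon copy of" and "Tree built in N milliseconds" messages quoted above can be tallied. This is a minimal sketch in Python that relies only on those two message formats as they appear in this thread; the function name and the keys of the returned dictionary are made up for illustration.

```python
import re

FETCH_PREFIX = "Fetching Saxon copy of "
TREE_BUILT_RE = re.compile(r"Tree built in (\d+) milliseconds")


def summarize_trace(text):
    """Count DTD fetch messages and add up tree-building times in -t output."""
    fetched = []
    build_times = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(FETCH_PREFIX):
            # Keep the resource name so repeated fetches of one file can be spotted.
            fetched.append(line[len(FETCH_PREFIX):])
            continue
        match = TREE_BUILT_RE.search(line)
        if match:
            build_times.append(int(match.group(1)))
    return {
        "fetch_count": len(fetched),
        "distinct_fetched": len(set(fetched)),
        "trees_built": len(build_times),
        "total_build_ms": sum(build_times),
        "slowest_build_ms": max(build_times, default=0),
    }
```

Running `summarize_trace` on the text of each log and comparing the two results shows at a glance whether the slow run spends its time in tree building, and whether the same DTD modules are fetched again for every document.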

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Michael Kay over 11 years ago

How very strange. We're certainly on the trail of the problem here. At first sight though I can't think of any possible reason why the single-shot command line should behave differently from the batch script, and I can't think of any obvious diagnostic to apply next. I'll cogitate on it.

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Michael Kay over 11 years ago

Still very puzzled.

Firstly, I can't see any possible reason why the same command should behave differently depending on whether it's part of a script or not. There must be some extraneous condition that happens to be different in the two cases: perhaps an environment variable, perhaps the classpath.

Secondly, the one that runs slowly is the one that reports it is using the cached Saxon copies of the files, whereas one would expect it to be the other way around.
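One way to test the first point is to dump the environment in both contexts (for example with `set > env_cmd.txt` typed in the shell and `set > env_batch.txt` placed inside the batch script) and compare the two dumps. The sketch below, in Python, assumes plain NAME=VALUE lines as printed by `set`; the file names and function names are hypothetical. It compares names case-sensitively, although Windows treats variable names as case-insensitive.

```python
def parse_env_dump(text):
    """Parse NAME=VALUE lines, as printed by `set` in cmd or `env` in a POSIX shell."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:  # skip lines without "=" or with an empty name
            env[name] = value
    return env


def diff_env(left, right):
    """Return {name: (left_value, right_value)} for variables that differ.

    None marks a variable that is missing on that side.
    """
    changed = {}
    for name in sorted(set(left) | set(right)):
        a, b = left.get(name), right.get(name)
        if a != b:
            changed[name] = (a, b)
    return changed
```

Differences in variables such as CLASSPATH, PATH or JAVA_HOME would be the first candidates to look at, since they can change which Java runtime and which jars are actually used.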

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Michael Kay over 11 years ago

Could you post a copy of the input files/stylesheets for this transformation?

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Kai Weber over 11 years ago

I've attached the files to this post. After unzipping, you need to put a copy of saxon9he.jar into the /bin folder, then you should be ready to go. As the speed problem seems to depend somehow on the environment: I was using the cmd shell on 64-bit Windows 7 Professional. As a side note, last weekend I also tested a shell script version on an Ubuntu Linux system, where the execution always fetched the Saxon copies of the W3C files and took around 15 minutes, no matter whether it was invoked from a shell script or directly as a shell command.

Your first assumption is probably still true: somewhere down the line, it might be trying to get some files from the W3C servers instead of the Saxon archive. I haven't monitored network traffic while the script runs, and I'm not (yet) familiar with any tools that would do so with reasonably readable output.

Best regards, Kai

RE: Extreme speed differences when running the same transformation - how to optimize? - Added by Michael Kay over 11 years ago

OK thanks.

I don't know why under some circumstances this is running fast. But I do know why other cases are running slow; it's essentially this bug here:

https://saxonica.plan.io/issues/1813

The files it can't find locally include

http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-inlstyle-1.mod
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-edit-1.mod
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-bdo-1.mod

and when I fix these I get another one:

"http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-style-1.mod"

Will discuss further under bug 1813.
