Apply multiple XSL transforms to the same XML

Added by Anonymous over 14 years ago

Legacy ID: #8516414 Legacy Poster: Ken Tam (kentam)

Hello,

I am working on a project where multiple XSL transform scripts will be applied to the same XML source. These XSL scripts will be written by different teams and deployed independently, so they can't be combined into a single XSL script, at least not easily. The average XML source is about 50KB, but there can be up to 20,000 XML sources an hour.

I am currently using the JAXP interface to pre-compile the XSL scripts into Templates and process the XML sources in a multi-threaded environment. I haven't tested the actual throughput of this setup yet, but I am wondering if there is a better approach, because it seems the internal XML data structure needs to be rebuilt for each XSL script. Here is the code fragment:

    DOMSource domRoot = new DOMSource(root);
    for (int i = 0; i < xsltCount; i++) {
        // a fresh Transformer from each pre-compiled Templates object
        Transformer transformer = templates[i].newTransformer();
        DOMResult dRes = new DOMResult();
        transformer.transform(domRoot, dRes);
    }

That is, the internal structure of domRoot needs to be rebuilt for each transform() call. Is this correct? Is there any way to preserve the internal structure across different transformations? For example, some of the XSL scripts use the xsl:for-each-group construct with the same condition, which also seems to imply the internal data structure is rebuilt for each transformation.

Would s9api help in this case, by building the internal structure once as an XdmNode and passing it to each of the pre-compiled XsltTransformers? I am currently using saxon-9.1.0.2. Would it help to move to Saxon 9 HE or EE?

Thanks.
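A minimal sketch of the setup described above, using only the JDK's built-in JAXP processor rather than Saxon so that it compiles without extra jars. Each stylesheet is compiled once into a Templates object, the Templates array is shared across a thread pool, and each task parses its own document and applies every stylesheet to it in turn. The class name, the two stylesheets, and the pool size are hypothetical, and a StreamResult replaces the DOMResult only so the output is easy to inspect.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MultiTransform {
    // Two hypothetical stylesheets standing in for scripts written by different teams.
    static final String XSL_A =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>A:<xsl:value-of select='count(//item)'/></xsl:template>"
        + "</xsl:stylesheet>";
    static final String XSL_B =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>B:<xsl:value-of select='name(/*)'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Compile every stylesheet once. Templates objects are thread-safe and can be shared.
    static Templates[] compile(TransformerFactory factory, String... xslTexts) throws Exception {
        Templates[] templates = new Templates[xslTexts.length];
        for (int i = 0; i < xslTexts.length; i++) {
            templates[i] = factory.newTemplates(new StreamSource(new StringReader(xslTexts[i])));
        }
        return templates;
    }

    // Parse one XML source and apply every stylesheet to it, one after another.
    // The Document is built and read by this thread only, since a DOM is not thread-safe.
    static List<String> applyAll(Templates[] templates, String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        DOMSource source = new DOMSource(doc);
        List<String> results = new ArrayList<>();
        for (Templates t : templates) {
            // Transformer is not thread-safe, so create a new one for each transformation.
            Transformer transformer = t.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(source, new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Templates[] templates = compile(TransformerFactory.newInstance(), XSL_A, XSL_B);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String xml : new String[] {"<r><item/><item/></r>", "<r><item/></r>"}) {
                futures.add(pool.submit(() -> applyAll(templates, xml)));
            }
            for (Future<List<String>> f : futures) {
                System.out.println(f.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The split is deliberate: the expensive, shareable object (Templates) is created once, while the cheap, non-thread-safe objects (Document, Transformer) stay confined to a single task.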


RE: Apply multiple XSL transforms to the same XML - Added by Anonymous over 14 years ago

Legacy ID: #8516995 Legacy Poster: Michael Kay (mhkay)

It's not a good idea to use a DOM for this purpose, for two reasons: firstly, Saxon is 5-10 times slower on a DOM than on its native tree format, and secondly, the DOM is not thread-safe: you can't use the same DOM document in parallel threads, even in read mode.

It's much better to use the native tree model. There are two ways you can do this:

(a) Build the document using Configuration.buildDocument(). This returns a DocumentInfo, which implements Source and can therefore be passed to the JAXP transform() method. You will need to use the Configuration contained within your TransformerFactory, which you can get either by instantiating net.sf.saxon.TransformerFactoryImpl with your own Configuration, or by casting the TransformerFactory to its Saxon implementation class and using the getConfiguration() accessor method.

(b) Switch to using s9api, where you can create an XdmNode using the s9api DocumentBuilder and set it as the context item on the XsltTransformer object.

One thing to bear in mind when using the same document as input to several stylesheets is that it's best if they don't do any whitespace-stripping. It's most efficient to strip whitespace while building the tree, rather than while navigating it.
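The whitespace advice amounts to doing the stripping once, up front, instead of once per stylesheet. The sketch below is a plain-DOM approximation of that idea, not Saxon's mechanism: it assumes the JDK parser, removes whitespace-only text nodes in a single pass right after parsing, and deliberately ignores xml:space="preserve" to stay short. The names StripOnce, stripWhitespace, and parseAndStrip are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripOnce {
    // Recursively remove whitespace-only text nodes beneath the given node.
    // Simplification: xml:space="preserve" is not honoured here.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Parse once and strip once; the resulting tree can then be fed to every stylesheet.
    static Document parseAndStrip(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        stripWhitespace(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndStrip("<r>\n  <a>x</a>\n  <b> </b>\n</r>");
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

With the tree cleaned once, none of the individual stylesheets needs an xsl:strip-space declaration of its own.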

RE: Apply multiple XSL transforms to the same XML - Added by Anonymous over 14 years ago

Legacy ID: #8517425 Legacy Poster: Ken Tam (kentam)

Yes, the XML source DOM will be accessed by a single thread: the XSL scripts will be applied serially. This helps to manage transactions, because the XSL scripts are treated as one unit whose results are committed or rolled back together. I am already instantiating net.sf.saxon.TransformerFactoryImpl, so option (a) will be easier to implement. However, I would like to confirm that using (b) doesn't provide any performance gain over (a). Is this correct? In addition, there isn't any other configuration setting to further improve performance (e.g. preserving the internal data structure across transformations). Is this correct? Thanks for your help.
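A minimal sketch of the all-or-nothing pattern described above, assuming the JDK's JAXP processor and a Source that can be read repeatedly, such as a DOMSource (a StreamSource can only be consumed once). Every stylesheet is applied in order, the outputs are held in a local list, and nothing is handed back for committing unless all transformations succeed. TransformUnit, applyAsUnit, the stylesheet, and the commented-out commit call are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class TransformUnit {
    // A hypothetical stylesheet standing in for one team's script.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='name(/*)'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Parse once into a DOM so the same Source can be read by several transformations.
    static Source parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return new DOMSource(dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml))));
    }

    // Apply every stylesheet in order. Outputs stay in a local list and are returned only if
    // all transformations succeed; any failure propagates and the partial outputs are dropped.
    static List<String> applyAsUnit(Templates[] templates, Source source) throws TransformerException {
        List<String> pending = new ArrayList<>();
        for (Templates t : templates) {
            StringWriter out = new StringWriter();
            t.newTransformer().transform(source, new StreamResult(out));
            pending.add(out.toString());
        }
        return Collections.unmodifiableList(pending);
    }

    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates t = factory.newTemplates(new StreamSource(new StringReader(XSL)));
        Source source = parse("<order/>");
        try {
            List<String> results = applyAsUnit(new Templates[] {t, t}, source);
            // commit(results);  // hypothetical: persist all outputs as one unit
            System.out.println("commit " + results);
        } catch (TransformerException e) {
            // rollback: nothing derived from this source has been persisted yet
            System.out.println("rollback: " + e.getMessage());
        }
    }
}
```

Because the scripts run serially on one thread, the tree never needs to be shared across threads, which sidesteps the DOM thread-safety problem entirely.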

RE: Apply multiple XSL transforms to the same XML - Added by Anonymous over 14 years ago

Legacy ID: #8517715 Legacy Poster: Michael Kay (mhkay)

Using the s9api interfaces gives you cleaner code (in my opinion), but it won't run any faster. If you stick with a DOMSource rather than using a Saxon native tree then, apart from the thread-safety issues, you should be aware that there are two ways Saxon can handle it: it can wrap the DOM in a Saxon wrapper that implements the Saxon NodeInfo interface, or it can copy it to a native Saxon tree. Which is more efficient depends on (a) the time vs memory trade-off, and (b) how much activity the transformation does. As a rule of thumb, if the transformation accesses each node of the source more than once, then copying is probably faster than wrapping. By default Saxon uses wrapping rather than copying, but there are many switches and options that can change this; for example, it will always copy if validation is requested. But both approaches are expensive compared with using a native Saxon tree in the first place.

RE: Apply multiple XSL transforms to the same XML - Added by Anonymous over 14 years ago

Legacy ID: #8519376 Legacy Poster: Ken Tam (kentam)

I don't need to stick with DOMSource. In fact, I'd prefer to use the Saxon native tree. Here is the updated code:

    AugmentedSource domRoot = AugmentedSource.makeAugmentedSource(new DOMSource(root));
    domRoot.setWrapDocument(true);
    // build the tree once per XML source
    DocumentInfo source = saxonConfig.buildDocument(domRoot);
    // reuse it for every pre-compiled stylesheet
    for (int i = 0; i < xsltCount; i++) {
        Transformer transformer = templates[i].newTransformer();
        DOMResult dRes = new DOMResult();
        transformer.transform(source, dRes);
    }

I am currently using saxon-9.1.0.2, and buildDocument() only takes one argument. AugmentedSource is used to ensure wrapping is turned off. saxonConfig is obtained by calling getConfiguration() on net.sf.saxon.TransformerFactoryImpl. Let me know if buildDocument(new DOMSource(root)) alone is sufficient to create a copy in native tree format.

The above code fragment creates a native tree once per XML source and applies the same native tree to each XSL script. I am still wondering if I should combine all the XSL scripts into one. Is it worth the effort? I guess the question comes down to how much overhead there is in applying the same XML, in Saxon native tree format, to multiple pre-compiled XSL templates as opposed to one combined pre-compiled XSL template. Thanks again for your help.

RE: Apply multiple XSL transforms to the same XML - Added by Anonymous over 14 years ago

Legacy ID: #8519686 Legacy Poster: Michael Kay (mhkay)

I would have expected to see domRoot.setWrapDocument(false) rather than domRoot.setWrapDocument(true) if you want to copy the tree rather than wrapping it. You can check that you have actually created a native Saxon tree by checking the implementation class of "source" - it should be TinyDocumentImpl. Otherwise the structure looks fine. I don't see any benefit in combining the separate transformations into a single stylesheet.

RE: Apply multiple XSL transforms to the same XML - Added by Anonymous over 14 years ago

Legacy ID: #8521160 Legacy Poster: Ken Tam (kentam)

Yes, my bad. It should be domRoot.setWrapDocument(false). The implementation class of "source" is now TinyDocumentImpl. Thanks again for your help.