Support #3237
Calling transform calls closeResult for a StreamResult (Closed)
Description
I am using saxon9ee-9.7.0.18 and an XSLT stylesheet in streaming mode to transform a large source file. I need a TransformerHandler to which I can send SAX events, getting the resulting output back as a stream of events. However, neither Xslt30Transformer nor StreamingTransformerFactory provides a TransformerHandler.
As a workaround, I call transform() passing an XMLReader as the source. As expected, the transformer sets the XMLReader's ContentHandler to a ReceivingContentHandler, which I can then use to push my SAX events. Up to this point it works as expected, but there are two problems:
- The Controller calls closeResult instead of waiting for the SAX events to be processed and for my code to call close on the result stream.
- The behaviour is inconsistent:
a) If the source SAX event stream is small, there is no output at all, giving the impression that the result stream is closed almost immediately (in the attached Program.java, set NUM_ROWS = 10).
b) If the source SAX event stream is somewhat bigger, the result stream has the desired transformed output, but it terminates prematurely and the last few SAX events are missed.
Attempted workaround:
I have tried setting the result to a NonCloseFileOutputStream whose close() method body is commented out (it writes a stack trace instead, to identify what is calling close()).
The workaround does not work: it appears that something upstream has already closed, and no more data is sent to the stream.
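For reference, a diagnostic wrapper of the kind described above might look like the minimal sketch below. NonCloseOutputStream is a hypothetical reconstruction (the attached NonCloseFileOutputStream is not reproduced here): close() flushes and prints a stack trace showing its caller instead of closing the target. As the report observes, this only keeps the underlying stream physically open; it cannot make the transformer keep sending events once it has decided the transformation is finished.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical diagnostic wrapper: close() flushes and reports its caller
 * instead of closing the underlying stream.
 */
public class NonCloseOutputStream extends FilterOutputStream {
    private int closeCalls = 0;

    public NonCloseOutputStream(OutputStream target) {
        super(target);
    }

    // Write whole arrays at once (FilterOutputStream's default writes byte by byte).
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        closeCalls++;
        flush(); // push buffered bytes through, but keep the target open
        new Throwable("close() called from:").printStackTrace(); // who closed us?
    }

    public int getCloseCalls() {
        return closeCalls;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        NonCloseOutputStream s = new NonCloseOutputStream(sink);
        s.write("<a/>".getBytes(StandardCharsets.UTF_8));
        s.close();
        s.write("<b/>".getBytes(StandardCharsets.UTF_8)); // still accepted after close()
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```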
Instructions to test:
- Attached: Program (with the main method) and 4 other dependent classes.
- In main, straightTransform() is commented out; it was only used to check that the basic transform works as expected, and it can be left commented out.
- Set RESULT_FILE to an appropriate output file path.
- Run with NUM_ROWS = 10: no output is received.
- Run with NUM_ROWS = 100: the desired transformed output is received, but prematurely truncated.
Messages are written to stdout to show:
- the ReceivingContentHandler being assigned as the ContentHandler of the XMLReader
- the call to close the result stream
- the SAX events being fired after the call to close()
Files
Updated by O'Neil Delpratt over 7 years ago
- Project changed from SaxonC to Saxon
- Applies to branch 9.7 added
Updated by Michael Kay over 7 years ago
You're making two calls on transform() here. The first is from the ContentHandlerForwarder constructor, called from Program line 62.
In this call you supply a SAXSource containing an XMLReader. Saxon calls the parse() method on the XMLReader to initiate the transformation. But the parse() method on your PseudoXMLReader does nothing, it returns immediately. Saxon assumes that when the parse() method returns, all the input has been processed, and therefore it can close the output.
You then initiate a second transform from Program line 65. But this is using the same StreamResult, which has already been closed.
I think your basic technique should work: you supply a SAXSource containing an XMLReader, Saxon calls the XMLReader.parse() method, the transformation happens, the parse() method returns, Saxon closes the result, the transform() method returns. But I don't understand why you have two implementations of XMLReader, why one of them has a parse() method that does nothing, or why you are calling transform() twice.
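The technique described above, where the XMLReader's parse() method itself pushes every event before returning, can be sketched with the JDK's built-in JAXP processor rather than Saxon. RowReader and its row generation are hypothetical stand-ins for the attached RowXMLReader; XMLFilterImpl is used only because it already implements the XMLReader handler bookkeeping. The feature overrides are an assumption, added because some processors query or set the standard SAX2 namespace features on a user-supplied reader.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical XMLReader generating rows/row elements. parse() pushes the
 * COMPLETE event stream to the ContentHandler the Transformer installed,
 * and only then returns.
 */
public class RowReader extends XMLFilterImpl {
    private static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    private final int numRows;

    public RowReader(int numRows) {
        this.numRows = numRows;
    }

    // No parent reader exists, so answer feature queries here:
    // namespace processing on, everything else off.
    @Override
    public boolean getFeature(String name) {
        return NAMESPACES.equals(name);
    }

    @Override
    public void setFeature(String name, boolean value) throws SAXNotSupportedException {
        if (value != getFeature(name)) {
            throw new SAXNotSupportedException(name);
        }
    }

    @Override
    public void parse(InputSource ignored) throws SAXException {
        ContentHandler h = getContentHandler(); // installed by transform() before parse()
        AttributesImpl noAtts = new AttributesImpl();
        h.startDocument();
        h.startElement("", "rows", "rows", noAtts);
        for (int i = 0; i < numRows; i++) {
            char[] text = Integer.toString(i).toCharArray();
            h.startElement("", "row", "row", noAtts);
            h.characters(text, 0, text.length);
            h.endElement("", "row", "row");
        }
        h.endElement("", "rows", "rows");
        h.endDocument(); // the transformer may close the result once parse() returns
    }

    @Override
    public void parse(String systemId) throws SAXException {
        parse(new InputSource(systemId));
    }

    /** Runs an identity transform over the generated rows and returns the XML. */
    public static String transform(int numRows) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(new RowReader(numRows), new InputSource()),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(3));
    }
}
```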
Updated by Aniruddha Joag over 7 years ago
Thank you for the prompt response. I should probably have explained my objective more clearly. I have managed to do a straightforward chained transform, but my requirement is a chained transform with a twist: the output of the first transform needs to be sent as input to two different transformers (with different XSLTs).
The way I am trying to achieve this is as follows. It is rather convoluted and my approach could be wrong; you may well suggest another way to do this:
For the first transform I set the result to SAXResult(duplicatingContentHandler).
duplicatingContentHandler is an instance of a class that implements ContentHandler and overrides all SAX event methods to invoke the same method on each of its constituent ContentHandlers, for example:
@Override
public void startDocument() throws SAXException
{
contentHandler1.startDocument();
contentHandler2.startDocument();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
{
contentHandler1.startElement(uri, localName, qName, atts);
contentHandler2.startElement(uri, localName, qName, atts);
}
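A complete duplicating handler has to forward every ContentHandler callback, not only startDocument and startElement; otherwise prefix mappings, text and end tags are lost on the forked branches. The sketch below is a hypothetical reconstruction, not the attached class, and the recorder helper exists only to demonstrate it. The same Attributes object and char[] buffer are handed to each target in turn, which is fine because SAX only guarantees them for the duration of a call, so well-behaved handlers copy anything they keep (and must not modify them).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Forwards every ContentHandler callback, in order, to each of its targets. */
public class DuplicatingContentHandler implements ContentHandler {
    private final List<ContentHandler> targets;

    public DuplicatingContentHandler(ContentHandler... targets) {
        this.targets = Arrays.asList(targets);
    }

    @Override public void setDocumentLocator(Locator locator) {
        for (ContentHandler t : targets) t.setDocumentLocator(locator);
    }
    @Override public void startDocument() throws SAXException {
        for (ContentHandler t : targets) t.startDocument();
    }
    @Override public void endDocument() throws SAXException {
        for (ContentHandler t : targets) t.endDocument();
    }
    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException {
        for (ContentHandler t : targets) t.startPrefixMapping(prefix, uri);
    }
    @Override public void endPrefixMapping(String prefix) throws SAXException {
        for (ContentHandler t : targets) t.endPrefixMapping(prefix);
    }
    @Override public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        for (ContentHandler t : targets) t.startElement(uri, localName, qName, atts);
    }
    @Override public void endElement(String uri, String localName, String qName) throws SAXException {
        for (ContentHandler t : targets) t.endElement(uri, localName, qName);
    }
    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        for (ContentHandler t : targets) t.characters(ch, start, length);
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        for (ContentHandler t : targets) t.ignorableWhitespace(ch, start, length);
    }
    @Override public void processingInstruction(String target, String data) throws SAXException {
        for (ContentHandler t : targets) t.processingInstruction(target, data);
    }
    @Override public void skippedEntity(String name) throws SAXException {
        for (ContentHandler t : targets) t.skippedEntity(name);
    }

    /** Demo helper: a handler that records the qName of each start tag. */
    static ContentHandler recorder(List<String> sink) {
        return new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                sink.add(q);
            }
        };
    }

    public static void main(String[] args) throws SAXException {
        List<String> seenA = new ArrayList<>();
        List<String> seenB = new ArrayList<>();
        ContentHandler tee = new DuplicatingContentHandler(recorder(seenA), recorder(seenB));
        tee.startDocument();
        tee.startElement("", "row", "row", new AttributesImpl());
        tee.endElement("", "row", "row");
        tee.endDocument();
        System.out.println(seenA + " " + seenB);
    }
}
```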
To keep it simple, in the example I have posted, rather than duplicating as above, I just forward the resulting SAX events to a single ContentHandler; that is what my class ContentHandlerForwarder does.
The ContentHandlerForwarder instance needs to be created before I can call my first transform, and my constructor for it looks like this :
private ContentHandler contentHandlerToForwardTo;
public ContentHandlerForwarder(Transformer transformer, StreamResult result) throws TransformerException
{
PseudoXMLReader pseudoXMLReader = new PseudoXMLReader();
SAXSource saxSource = new SAXSource(pseudoXMLReader, null);
System.out.println("ContentHandlerForwarder -> transform");
transformer.transform(saxSource, result);
contentHandlerToForwardTo = pseudoXMLReader.getContentHandler();
}
When this constructor is called, contentHandlerToForwardTo is null. So in the constructor I call transform, passing in a PseudoXMLReader as the source (in which, as you pointed out, the parse method does nothing). As expected, on calling transform, the transformer calls setContentHandler on the PseudoXMLReader and passes in a net.sf.saxon.event.ReceivingContentHandler, which the PseudoXMLReader stores. The transformer then calls parse on the PseudoXMLReader, which does nothing and returns immediately, and I am back in my ContentHandlerForwarder constructor, where I get the reference to the ReceivingContentHandler (the output of the first transform) from the pseudoXMLReader and assign it to contentHandlerToForwardTo.
So, to summarise: the resulting SAX events of the first transform can now be sent to one or more ContentHandlers, which act as input for further transforms (via PseudoXMLReaders).
parse() does not need to do anything in the PseudoXMLReader, because my original source SAX events flow through to the transformed SAX output of the first transform, which then flows through to one or more ContentHandlers that act as input to the other transformers.
As I mentioned, in my example, if you set NUM_ROWS = 100, all this actually works more or less as expected (albeit truncated at the end), and the resulting output is the chained transform. Running straightTransform(), you get the output elements as
Running forwardingHandlerTransform() uses the modified XSLT:
String newXslt = strXslt.replace("x-", "y-").replace("\"name\"", "\"x-name\"").replace("\"id\"", "\"x-id\"");
and the resulting output has the following elements, as expected:
From what you have mentioned, I am guessing that the fact that the parse() method of the PseudoXMLReader returns immediately causes the first transformer to close the result stream. I am intrigued as to why we see some results when the source is big enough and nothing when the source is small. Is the bigger source causing a delay that gives enough time for results to be sent to the output stream? My first thought was to delay the parse even more, as follows:
private void parseImpl()
{
System.out.println("PseudoXMLReader -> parseImpl(). The (next chained) transformer has called parse. Will wait for 5 secs");
try
{
Thread.sleep(5000);
} catch (InterruptedException e)
{
e.printStackTrace();
}
}
But that does not work.
Updated by Aniruddha Joag over 7 years ago
Apologies, this paragraph above:
When this constructor is called, contentHandlerToForwardTo is null. So in the constructor I call transform, passing in a PseudoXMLReader as the source (in which, as you pointed out, the parse method does nothing). As expected, on calling transform, the transformer calls setContentHandler on the PseudoXMLReader and passes in a net.sf.saxon.event.ReceivingContentHandler, which the PseudoXMLReader stores. The transformer then calls parse on the PseudoXMLReader, which does nothing and returns immediately, and I am back in my ContentHandlerForwarder constructor, where I get the reference to the ReceivingContentHandler (the output of the first transform) from the pseudoXMLReader and assign it to contentHandlerToForwardTo.
wrongly says "(the output of the first transform)". It should actually refer to the second transform, as follows:
When this constructor is called, contentHandlerToForwardTo is null. So in the constructor I call transform, passing in a PseudoXMLReader as the source (in which, as you pointed out, the parse method does nothing). As expected, on calling transform, the transformer calls setContentHandler on the PseudoXMLReader and passes in a net.sf.saxon.event.ReceivingContentHandler, which the PseudoXMLReader stores. The transformer then calls parse on the PseudoXMLReader, which does nothing and returns immediately, and I am back in my ContentHandlerForwarder constructor, where I get the reference to the ReceivingContentHandler (which will receive the SAX events for the second transform) from the pseudoXMLReader and assign it to contentHandlerToForwardTo.
Updated by Michael Kay over 7 years ago
I'm not too concerned with investigating the exact reasons for the symptoms you observe: we've established that the code can't possibly work the way it's written, so the exact detail of how it fails is not of great interest. I should think it's something to do with buffering of input or output.
Because with this architecture the Transformer has to call back to the application's parse() method, and the transformation executes before parse() returns, it's not going to be possible to fork the event pipeline in the way you describe. It should be possible with a TransformerHandler, because that doesn't involve the callback to the parse() method. In 9.8 the StreamingTransformerFactory will be able to generate a TransformerHandler.
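The push model referred to here can be illustrated with plain JAXP, using the JDK's built-in (non-streaming, XSLT 1.0) processor rather than Saxon's StreamingTransformerFactory: the downstream TransformerHandler is created first, and the upstream transform simply writes into it through a SAXResult, so no parse() callback is involved. PushChain and the stylesheets T and U are hypothetical; this is a sketch of the general pattern, not of the 9.8 API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PushChain {
    private static String xsl(String templates) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + templates + "</xsl:stylesheet>";
    }

    // Hypothetical stylesheets: T renames row to x-row, U renames x-row to y-row.
    static final String T = xsl(
            "<xsl:template match='/rows'><rows><xsl:apply-templates/></rows></xsl:template>"
            + "<xsl:template match='row'><x-row><xsl:value-of select='.'/></x-row></xsl:template>");
    static final String U = xsl(
            "<xsl:template match='/rows'><rows><xsl:apply-templates/></rows></xsl:template>"
            + "<xsl:template match='x-row'><y-row><xsl:value-of select='.'/></y-row></xsl:template>");

    public static String run(String inputXml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("factory cannot create TransformerHandlers");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // 1. Create the downstream stage first. It is a ContentHandler: events are
        //    pushed into it, and it never calls back into a parse() method.
        TransformerHandler u = stf.newTransformerHandler(new StreamSource(new StringReader(U)));
        StringWriter out = new StringWriter();
        u.setResult(new StreamResult(out));

        // 2. Run the upstream stage; its result events flow straight into U.
        Transformer t = stf.newTransformer(new StreamSource(new StringReader(T)));
        t.transform(new StreamSource(new StringReader(inputXml)), new SAXResult(u));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<rows><row>1</row><row>2</row></rows>"));
    }
}
```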
I would suggest attempting this as follows: Given the requirement "The output of the first Transform T needs to be sent as input to two different Transformers U and V (with different XSLTs)", use the s9api calls:
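Processor p = new Processor(true);
XsltCompiler c = p.newXsltCompiler();
// TTT, UUU and VVV stand for the Source objects of the three stylesheets
XsltTransformer t = c.compile(TTT).load();
XsltTransformer u = c.compile(UUU).load();
XsltTransformer v = c.compile(VVV).load();
// u and v also need their own final destinations, via u.setDestination(...) and v.setDestination(...)
TeeDestination w = new TeeDestination(u, v);
// t also needs its input, via t.setSource(...) or t.setInitialContextNode(...)
t.setDestination(w);
t.transform();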
Updated by Aniruddha Joag over 7 years ago
Thank you for this. In my current situation, the first transform T is actually the result of a series of chained transforms, but I guess as long as I set its destination to a TeeDestination, it should work. I will test and report back.
Updated by Aniruddha Joag over 7 years ago
Xslt30Transformer (unlike XsltTransformer) cannot act as a Destination, so I cannot do:
XsltTransformer u = c.compile(UUU).load();
XsltTransformer v = c.compile(VVV).load();
TeeDestination w = new TeeDestination(u, v);
I need to do this in streaming mode, so I guess I will have to use Xslt30Transformer.
Updated by Michael Kay over 7 years ago
Sorry, I dropped the ball on this one. It's back on my list.
Updated by Aniruddha Joag over 7 years ago
- File RowXMLReader.java added
- File ContentHandlerForwarder.java added
- File NonCloseFileOutputStream.java added
- File Program.java added
- File PseudoXMLReader.java added
Yes, Sir, a TransformerHandler would have made this easier, but while I wait for 9.8 I need to get this working somehow. As you pointed out, the secondary transform calls the parse() method, which returns immediately, and the result stream is then closed. Although I have access to the ReceivingContentHandler (effectively a TransformerHandler) of the secondary transformer and can pipe the resulting SAX events from the primary transformer into it, that does not help once the stream is closed.
So I was thinking of changing the order: first get the reference to the ReceivingContentHandler of the secondary transform and store it in my forking mechanism (ContentHandlerForwarder), and only then call the primary transform.
I have managed to get that to work, and have tested it with a single row as well as with 1000 rows.
I must admit my implementation is a bit crude and clumsy; maybe you can point me in the direction of doing it correctly. I am attaching the changed source files.
Updated by Michael Kay over 7 years ago
I'm sorry: if you can get your attempt to work that's fine, but I'm not going to encourage you down this route. I think it's unlikely to work; it's certainly not the way the code is designed to work. I will continue to look for a way of meeting your requirement at the s9api level.
Updated by Aniruddha Joag over 7 years ago
Thank you, Sir, much appreciated; I will wait for your solution. In the meantime I will use my workaround just as a placeholder so that I can move forward.
Updated by Aniruddha Joag over 7 years ago
- File ContentHandlerForwarder.java added
- File TransformationSequencingXMLReader.java added
- File Program.java added
- File Transformation.java added
- File RowXMLReader.java added
I have not had the chance to work on this recently, and as I mentioned earlier, my code was rather convoluted. I have now refactored it into more reusable units, and perhaps it will now appear to use the transformer in the way it is traditionally meant to be used. As before, I have a SAX source (RowXMLReader). The main transform (Program.strXslt) creates a SAX result. I now have a method ContentHandlerForwarder.forkResult, which accepts an array of forked transformations. I have tried to fork the SAX result into 3 transformed results, each with its own transformation. Hopefully this will appeal to you as a conventional solution to this problem. If not, I will await your solution.
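The ordering discussed in this thread (obtain every downstream handler first, then run the main transform exactly once) can be sketched with JAXP's TransformerHandler and a fan-out ContentHandler. This is a hypothetical forkResult, not the attached ContentHandlerForwarder.forkResult; it uses the JDK's built-in processor rather than Saxon streaming, and the tee is built with java.lang.reflect.Proxy purely to keep the forwarding code short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;

public class Fork {
    /** A ContentHandler that replays every callback on each target, in order. */
    static ContentHandler tee(ContentHandler... targets) {
        return (ContentHandler) Proxy.newProxyInstance(
                ContentHandler.class.getClassLoader(),
                new Class<?>[] {ContentHandler.class},
                (proxy, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        // equals/hashCode/toString: give the proxy identity semantics
                        switch (method.getName()) {
                            case "equals": return proxy == args[0];
                            case "hashCode": return System.identityHashCode(proxy);
                            default: return "tee";
                        }
                    }
                    for (ContentHandler target : targets) {
                        try {
                            method.invoke(target, args);
                        } catch (InvocationTargetException e) {
                            throw e.getCause(); // surface the handler's own SAXException
                        }
                    }
                    return null; // every ContentHandler method returns void
                });
    }

    /** Hypothetical forkResult: all fork handlers exist before the main transform runs. */
    static void forkResult(SAXTransformerFactory stf, Templates main, Source input,
                           Templates[] forks, Result[] results) throws Exception {
        ContentHandler[] handlers = new ContentHandler[forks.length];
        for (int i = 0; i < forks.length; i++) {
            TransformerHandler h = stf.newTransformerHandler(forks[i]);
            h.setResult(results[i]);
            handlers[i] = h;
        }
        main.newTransformer().transform(input, new SAXResult(tee(handlers)));
    }

    static Templates compile(SAXTransformerFactory stf, String templates) throws Exception {
        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + templates + "</xsl:stylesheet>";
        return stf.newTemplates(new StreamSource(new StringReader(xsl)));
    }

    public static void main(String[] args) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        Templates main = compile(stf, "<xsl:template match='/'><xsl:copy-of select='.'/></xsl:template>");
        Templates count = compile(stf,
                "<xsl:template match='/'><n><xsl:value-of select='count(//row)'/></n></xsl:template>");
        Templates first = compile(stf,
                "<xsl:template match='/'><first><xsl:value-of select='//row[1]'/></first></xsl:template>");
        StringWriter out1 = new StringWriter();
        StringWriter out2 = new StringWriter();
        forkResult(stf, main, new StreamSource(new StringReader("<rows><row>a</row><row>b</row></rows>")),
                new Templates[] {count, first},
                new Result[] {new StreamResult(out1), new StreamResult(out2)});
        System.out.println(out1 + "\n" + out2);
    }
}
```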
Updated by Michael Kay over 7 years ago
To get this to compile, I changed StrmUtils.getIdentityTransformer() to:
private static Transformer getIdentityTransformer() throws TransformerConfigurationException {
final StreamingTransformerFactory transformerFactory = new StreamingTransformerFactory(new EnterpriseConfiguration());
return transformerFactory.newTransformer();
}
Hope that's a correct guess...
I now run Program.main, and it executes with no errors and no output.
Looking more carefully, I see it has silently created three files fork1.xml, fork2.xml, and fork3.xml deep in my filestore. (I hate it when I run customer-supplied code and it does that...) Changed the file names and ran it again.
The files contain sensible data. Everything seems to be working. I'm now not sure what I'm looking for. Looking at your message, perhaps you are just asking me to review your code? Well, it's fairly complex, and I don't think I fully understand it, but I'm not sure what feedback you are looking for.
By the way, I'm running this on 9.8.
Updated by Aniruddha Joag over 7 years ago
Yes, Sir, that (getIdentityTransformer()) was the correct code; apologies, I missed it when I uploaded the files. My initial code to get this working was very convoluted and, as you rightly mentioned, it did not look like the way the API was designed to be used. This version is a bit more refined and works on the principle that XMLReader.parse() should not return immediately (so the result stream does not close). I now chain the next forked transform in the parse method, and it returns only after calling all the forked transforms and getting a handle to each ReceivingContentHandler. So yes, I just wanted your feedback on whether you think this code is workable until I can use your solution. Many thanks for the time you have spent on this, very much appreciated.
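The principle stated here, that an XMLReader's parse() must not return until the upstream transformation has pushed all of its output, is how JAXP's standard XMLFilter chaining works: SAXTransformerFactory.newXMLFilter() returns a reader whose parse() calls its parent's parse() and runs its stylesheet over the events it receives. A minimal sketch with the JDK's built-in processor, assuming hypothetical stages T and U; note that it chains linearly and does not fork.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

public class FilterChain {
    private static String xsl(String templates) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + templates + "</xsl:stylesheet>";
    }

    // Hypothetical stages: T turns rows/row into list/item, U turns list/item into ul/li.
    static final String T = xsl(
            "<xsl:template match='/rows'><list><xsl:apply-templates/></list></xsl:template>"
            + "<xsl:template match='row'><item><xsl:value-of select='.'/></item></xsl:template>");
    static final String U = xsl(
            "<xsl:template match='/list'><ul><xsl:apply-templates/></ul></xsl:template>"
            + "<xsl:template match='item'><li><xsl:value-of select='.'/></li></xsl:template>");

    public static String run(String inputXml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            throw new IllegalStateException("factory cannot create XMLFilters");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT processors expect namespace-aware SAX events
        XMLReader parser = spf.newSAXParser().getXMLReader();

        // Each filter's parse() calls its parent's parse(), runs its stylesheet
        // over the events it receives, and returns only when that has finished.
        XMLFilter t = stf.newXMLFilter(new StreamSource(new StringReader(T)));
        t.setParent(parser);
        XMLFilter u = stf.newXMLFilter(new StreamSource(new StringReader(U)));
        u.setParent(t);

        // The final identity transformer calls u.parse(), which drives the whole chain.
        StringWriter out = new StringWriter();
        stf.newTransformer().transform(
                new SAXSource(u, new InputSource(new StringReader(inputXml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<rows><row>a</row><row>b</row></rows>"));
    }
}
```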
Updated by Michael Kay over 7 years ago
- Status changed from New to Closed
OK, thanks, I think we can close this now.