Support #3519
Status: Closed
Difference in streaming through xsl:source-document vs xsl:mode
Description
Is there any difference between the ways of enabling streaming? In other words, how is xsl:source-document different from xsl:mode?
For xsl:source-document, the href attribute has to be set to a reference to the input stream. An input is also provided to the Transformer as a StreamSource. How are these two streams different? Or do they need to be the same?
Updated by Michael Kay about 7 years ago
This is analogous to unstreamed processing, where there are two ways to get an input document: you can supply it externally from the calling application (so the match="/" template is invoked to process an externally-supplied document), or you can get it from within your stylesheet logic by calling the document() function.
Similarly with streaming: you can either supply the input externally from the calling application and initiate processing at the match="/" template, or you can get the streamed document from within your stylesheet logic using the xsl:source-document instruction. Use whichever approach is more convenient to you.
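The unstreamed analogy described above can be tried with nothing but the JDK's built-in XSLT 1.0 processor. The following is only a sketch: the class name `TwoInputRoutes`, its helper methods, and the sample data are hypothetical, and no streaming takes place here. With Saxon-EE, the streamed counterparts would be a streamable mode for the first route and xsl:source-document for the second.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TwoInputRoutes {
    private static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/><xsl:param name='input-uri'/><xsl:template match='/'>";
    private static final String ROW =
        "<xsl:value-of select='@lat'/>,<xsl:value-of select='@lon'/>;";
    private static final String TAIL = "</xsl:for-each></xsl:template></xsl:stylesheet>";

    // Route 1: match="/" processes the document supplied by the calling application.
    static final String EXTERNAL = HEAD + "<xsl:for-each select='osm/node'>" + ROW + TAIL;

    // Route 2: the stylesheet fetches the document itself with document().
    static final String INTERNAL =
        HEAD + "<xsl:for-each select='document($input-uri)/osm/node'>" + ROW + TAIL;

    static String run(String stylesheet, String primaryXml, String inputUri)
            throws TransformerException {
        // The system ID only gives the stylesheet a base URI; no such file has to exist.
        String base = Paths.get("inline.xsl").toUri().toString();
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet), base));
        if (inputUri != null) {
            t.setParameter("input-uri", inputUri);
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(primaryXml)), new StreamResult(out));
        return out.toString();
    }

    // Runs both routes over the same sample data and returns the two results.
    static String[] demo() {
        String xml = "<osm><node lat='42.1' lon='-71.2'/><node lat='42.3' lon='-71.4'/></osm>";
        try {
            Path file = Files.createTempFile("osm", ".xml");
            try {
                Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
                String external = run(EXTERNAL, xml, null);
                // A dummy primary input is enough here: the real data arrives via the parameter.
                String internal = run(INTERNAL, "<dummy/>", file.toUri().toString());
                return new String[] { external, internal };
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String result : demo()) {
            System.out.println(result);
        }
    }
}
```

Both routes should produce the same text. Which one to use is a matter of where it is more convenient to name the input: in the calling application or in the stylesheet.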
Updated by Mohd Shadab about 7 years ago
I tried a simple transform with a 6GB XML file, but it runs out of memory when the input is passed externally from the calling application. The same input works fine when it is streamed from within the stylesheet using xsl:source-document. In that case, the external input was a dummy XML file of about 1KB.
<xsl:template match="/">
  <html>
    <body>
      <table border="1">
        <xsl:source-document streamable="yes" href="{$input-uri}">
          <xsl:for-each select="osm/node">
            <tr>
              <td><xsl:value-of select="@lat" /></td>
              <td><xsl:value-of select="@lon" /></td>
            </tr>
          </xsl:for-each>
        </xsl:source-document>
      </table>
    </body>
  </html>
</xsl:template>
Updated by Martin Honnen about 7 years ago
"Transformer" sounds as if you are using Java and JAXP. In that case, to use streaming with the primary input document, make sure you use https://www.saxonica.com/html/documentation/javadoc/com/saxonica/config/StreamingTransformerFactory.html to create the Transformer.
Updated by Mohd Shadab about 7 years ago
Maybe I am missing something. Here is the code:
InputStream includeXSL = new FileInputStream(new File(xslDoc));
StreamSource src = new StreamSource(includeXSL);
StreamingTransformerFactory strf = new StreamingTransformerFactory();
Transformer _transformer = strf.newTransformer(src);
_transformer.setParameter("input-uri", "a.xml");
InputStream is = new FileInputStream(new File(sourceDoc));
Reader reader = new InputStreamReader(is, "ISO-8859-1");
OutputStream out = new FileOutputStream(new File(resultDoc));
_transformer.transform(new StreamSource(reader), new StreamResult(out));
Here, the 6GB XML file is set as a parameter and the same file is also passed as a StreamSource. It gives an OutOfMemoryError when the same file is passed both as the parameter and as the StreamSource.
However, if the StreamSource is changed to a smaller file while the 6GB file remains the parameter, it works fine.
Updated by Michael Kay about 7 years ago
Is the unnamed mode in the stylesheet declared as streamable? I.e. does it have
<xsl:mode streamable="yes"/>
Updated by Mohd Shadab about 7 years ago
Yes, I tried both with <xsl:mode streamable="yes"/> and without it. The XSL in the thread above can be tried with this XML; the file can be downloaded from http://download.geofabrik.de/north-america/us/massachusetts-latest.osm.bz2
Updated by Michael Kay about 7 years ago
I've done a test which approximates as closely as I can what you say you are doing, and it all works for me (including streaming the same document both supplied externally and read using xsl:source-document). I think you need to provide a complete repro: free-standing Java code, XSLT stylesheet, and source document.
Note that you don't need a large source document to test this: if you run with the -t option (or FeatureKeys.TIMING set) the output will tell you whether input was streamed or whether a tree is being built in memory.
Updated by Mohd Shadab about 7 years ago
Running with tracing on gives:
Streaming null
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
URIResolver.resolve href="E:/sw/xsl/a2.xml" base=""
Streaming input document E:/sw/xsl/a2.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
The OutOfMemoryError may be coming from JAXP.
InputStream includeXSL = new FileInputStream(new File(xslDoc));
StreamSource src = new StreamSource(includeXSL);
StreamingTransformerFactory strf = new StreamingTransformerFactory();
Transformer _transformer = strf.newTransformer(src);
strf.setAttribute(FeatureKeys.LICENSE_FILE_LOCATION, loc);
strf.setAttribute(FeatureKeys.TRACE_LISTENER_OUTPUT_FILE, "a.txt");
strf.setAttribute(FeatureKeys.TIMING, "true");
strf.setAttribute(FeatureKeys.TRACE_OPTIMIZER_DECISIONS, "true");
_transformer.setParameter("input-uri", "E:/sw/xsl/a2.xml");
InputStream is = new FileInputStream(new File(sourceDoc));
Reader reader = new InputStreamReader(is, "ISO-8859-1");
OutputStream out = new FileOutputStream(new File(resultDoc));
_transformer.transform(new StreamSource(reader), new StreamResult(out));
<xsl:param name="input-uri" as="xs:string"/>
<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <html>
    <body>
      <table border="1">
        <xsl:source-document streamable="yes" href="{$input-uri}">
          <xsl:for-each select="osm/node">
            <tr>
              <td><xsl:value-of select="@lat" /></td>
              <td><xsl:value-of select="@lon" /></td>
            </tr>
          </xsl:for-each>
        </xsl:source-document>
      </table>
    </body>
  </html>
</xsl:template>
Updated by Michael Kay about 7 years ago
OK, so Saxon is streaming, so we need to find out why it's running out of memory. Get a stack trace of what's happening when the out-of-memory exception occurs, and take a heap dump and analyze it to see what objects are present in the heap at the point of failure.
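One way to collect both diagnostics is sketched below, assuming a HotSpot-based JVM. The class `OomDiagnostics` and its method names are hypothetical; `HotSpotDiagnosticMXBean.dumpHeap` is a JDK API, but writing a dump after memory is already exhausted is not guaranteed to succeed. Starting the JVM with `-XX:+HeapDumpOnOutOfMemoryError` is usually the simpler route.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class OomDiagnostics {
    // Writes a heap dump of the current JVM; the file name should end in ".hprof".
    static void dumpHeap(Path target) {
        try {
            Files.deleteIfExists(target); // dumpHeap fails if the file already exists
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(target.toAbsolutePath().toString(), true); // true: live objects only
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the work; on OutOfMemoryError prints the stack trace, dumps the heap, rethrows.
    static void runWithDiagnostics(Runnable work, Path dumpFile) {
        try {
            work.run();
        } catch (OutOfMemoryError e) {
            e.printStackTrace();
            dumpHeap(dumpFile);
            throw e;
        }
    }

    public static void main(String[] args) {
        Path dump = Paths.get(System.getProperty("java.io.tmpdir"), "transform-oom.hprof");
        // Placeholder for the real work, for example the call to transformer.transform(...).
        runWithDiagnostics(() -> System.out.println("transformation goes here"), dump);
    }
}
```

The resulting .hprof file can then be opened in a heap analyzer to see which objects dominate the heap.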
Updated by Mohd Shadab about 7 years ago
- File java_pid4900_Leak_Suspects.zip java_pid4900_Leak_Suspects.zip added
- File report58359086349417.pdf report58359086349417.pdf added
The PDF has the detailed report. The leak analysis says:
The memory is accumulated in one instance of "net.sf.saxon.tree.tiny.TinyTree"
and the stack trace shows:
java.lang.OutOfMemoryError.<init>()V (Unknown Source)
at java.util.Arrays.copyOf([II)[I (Unknown Source)
at net.sf.saxon.tree.tiny.TinyTree.ensureNodeCapacity(S)V (TinyTree.java:228)
at net.sf.saxon.tree.tiny.TinyTree.addNode(SIIII)I (TinyTree.java:338)
at net.sf.saxon.tree.tiny.TinyBuilder.makeTextNode(Ljava/lang/CharSequence;I)I (TinyBuilder.java:405)
at net.sf.saxon.tree.tiny.TinyBuilder.characters(Ljava/lang/CharSequence;Lnet/sf/saxon/expr/parser/Location;I)V (TinyBuilder.java:381)
at net.sf.saxon.event.ProxyReceiver.characters(Ljava/lang/CharSequence;Lnet/sf/saxon/expr/parser/Location;I)V (ProxyReceiver.java:190)
Updated by Michael Kay about 7 years ago
Thanks. These diagnostics make it clear that you're running out of memory while constructing a result tree (possibly the final result tree, possibly a temporary tree) from your stylesheet code. I don't think you've provided enough of your XSLT code to show where this is happening. One possibility is that you are building the stylesheet result as an in-memory tree; another is that you are using xsl:copy or xsl:copy-of to copy large chunks of your (streamed) input to an (unstreamed) output tree.
Updated by Mohd Shadab about 7 years ago
The XSL below works if the StreamSource is a smaller XML file and input-uri is the 6GB file, but it fails when both the StreamSource and input-uri are the same 6GB file. The code being used is in the thread above. No xsl:copy-of is used, and it works in one case but not in the other.
Would it be possible to share the example you tried, so that we can try it as well?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:param name="input-uri" as="xs:string"/>
  <xsl:mode streamable="yes"/>
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <xsl:source-document streamable="yes" href="{$input-uri}">
            <xsl:for-each select="osm/node">
              <tr>
                <td><xsl:value-of select="@lat" /></td>
                <td><xsl:value-of select="@lon" /></td>
              </tr>
            </xsl:for-each>
          </xsl:source-document>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Updated by Michael Kay about 7 years ago
I haven't confirmed this, but I think the problem probably arises from using xsl:source-document within a streamable template rule. I think the output of the xsl:source-document instruction is being accumulated in memory before being written to the final output.
Is there any good reason why you are doing it this way?
Updated by Mohd Shadab about 7 years ago
We are used to passing the input as a StreamSource by default for all transformations, but with this OOM error it's clear that, when using xsl:source-document, the StreamSource should not be the same file as the one passed as the parameter. We are looking to automate XSL generation and run all such transforms by passing a StreamSource, except for this scenario, where we need to pass the input stream as a parameter.
Updated by Michael Kay almost 7 years ago
- Status changed from New to Closed
- Assignee set to Michael Kay
I think the original question has been answered. If you have further questions about streaming please raise a new issue.