Support #3519

closed

Difference in streaming through xsl:source-document vs xsl:mode

Added by Mohd Shadab over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Category: -
Sprint/Milestone: -
Start date: 2017-11-09
% Done: 0%

Description

Is there any difference between the two ways of enabling streaming? That is, how does xsl:source-document differ from xsl:mode?

For xsl:source-document, the href attribute has to be set to a reference to the input stream. An input is also provided to the Transformer as a StreamSource. How are these two streams different? Or do they need to be the same?


Files

java_pid4900_Leak_Suspects.zip (55.2 KB), Mohd Shadab, 2017-11-14 13:36
report58359086349417.pdf (91.8 KB), Mohd Shadab, 2017-11-14 13:36

Actions #1

Updated by Michael Kay over 6 years ago

This is analogous to unstreamed processing, where there are two ways to get an input document: you can supply it externally from the calling application (so the match="/" template is invoked to process an externally-supplied document), or you can get it from within your stylesheet logic by calling the document() function.

Similarly with streaming: you can either supply the input externally from the calling application and initiate processing at the match="/" template, or you can get the streamed document from within your stylesheet logic using the xsl:source-document instruction. Use whichever approach is more convenient to you.
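
To make the analogy concrete, here is a minimal sketch of the unstreamed pair, written against the JDK's built-in JAXP processor rather than Saxon. That processor only supports XSLT 1.0, so document() stands in for xsl:source-document and nothing here is streamed. The class name, the sample data and the helper names are hypothetical; only the input-uri parameter name is taken from this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: two ways of handing a document to a stylesheet. */
final class TwoInputRoutes {

    private static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>";

    // Route 1: the document is the primary input and is processed by match="/".
    static final String EXTERNAL_XSL = HEAD
        + "<xsl:template match='/'><xsl:value-of select='count(osm/node)'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Route 2: the stylesheet fetches the document itself from a URI parameter;
    // the primary input only serves to trigger the match="/" template.
    static final String INTERNAL_XSL = HEAD
        + "<xsl:param name='input-uri'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(document($input-uri)/osm/node)'/>"
        + "</xsl:template></xsl:stylesheet>";

    /** Runs the stylesheet over primaryXml; inputUri may be null for route 1. */
    static String run(String xsl, String primaryXml, String inputUri) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            if (inputUri != null) {
                t.setParameter("input-uri", inputUri);
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(primaryXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes a tiny made-up OSM-like file and returns its absolute file: URI. */
    static String writeSample() {
        try {
            Path p = Files.createTempFile("osm-sample", ".xml");
            p.toFile().deleteOnExit();
            String xml = "<osm><node lat='1' lon='2'/><node lat='3' lon='4'/></osm>";
            Files.write(p, xml.getBytes(StandardCharsets.UTF_8));
            return p.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "<osm><node lat='1' lon='2'/><node lat='3' lon='4'/></osm>";
        System.out.println("external: " + run(EXTERNAL_XSL, sample, null));
        System.out.println("internal: " + run(INTERNAL_XSL, "<dummy/>", writeSample()));
    }
}
```

In the second route the primary input is a throwaway document, which is the role the 1KB dummy file plays later in this thread.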

Actions #2

Updated by Mohd Shadab over 6 years ago

I tried a simple transform with a 6GB XML file, but it gives an out-of-memory error when the input is passed externally from the calling application. The same input works fine when it is read as a streamed document from within the stylesheet logic using xsl:source-document. Also, when using xsl:source-document, the external input was a dummy XML file of 1KB.

<xsl:template match="/">
  <html>
  <body>
    <table border="1">
      <xsl:source-document streamable="yes" href="{$input-uri}">
      <xsl:for-each select="osm/node">
      <tr>
        <td><xsl:value-of select="@lat" /></td>
        <td><xsl:value-of select="@lon" /></td>
      </tr>
      </xsl:for-each>
      </xsl:source-document>
    </table>
  </body>
  </html>
</xsl:template>

Actions #3

Updated by Martin Honnen over 6 years ago

Transformer sounds as if you are using Java and JAXP. In that case, to use streaming with the primary input document, make sure you use https://www.saxonica.com/html/documentation/javadoc/com/saxonica/config/StreamingTransformerFactory.html to create the Transformer.

Actions #4

Updated by Mohd Shadab over 6 years ago

Maybe I am missing something. Here is the code:

    // Compile the stylesheet with Saxon's streaming factory
    InputStream includeXSL = new FileInputStream(new File(xslDoc));
    StreamSource src = new StreamSource(includeXSL);
    StreamingTransformerFactory strf = new StreamingTransformerFactory();
    Transformer _transformer = strf.newTransformer(src);

    // URI read by xsl:source-document inside the stylesheet
    _transformer.setParameter("input-uri", "a.xml");

    // Primary input, supplied externally as a StreamSource
    InputStream is = new FileInputStream(new File(sourceDoc));
    Reader reader = new InputStreamReader(is, "ISO-8859-1");
    OutputStream out = new FileOutputStream(new File(resultDoc));
    _transformer.transform(new StreamSource(reader), new StreamResult(out));

Here, the 6GB XML file is set as a parameter and the same file is passed as a StreamSource. It gives an OutOfMemoryError when the same file is passed both as a parameter and as a StreamSource.

However, if the StreamSource is changed to a smaller file and the 6GB file is the parameter, then it works fine.

Actions #5

Updated by Michael Kay over 6 years ago

Is the unnamed mode in the stylesheet declared as streamable? I.e. does it have

<xsl:mode streamable="yes"/>

Actions #6

Updated by Mohd Shadab over 6 years ago

Yes, I tried both with <xsl:mode streamable="yes"/> and without it. The XSL in the thread above can be tried with this XML; the XML file can be downloaded from http://download.geofabrik.de/north-america/us/massachusetts-latest.osm.bz2

Actions #7

Updated by Michael Kay over 6 years ago

I've done a test which approximates as closely as I can what you say you are doing, and it all works for me (including streaming the same document both supplied externally and read using xsl:source-document). I think you need to provide a complete repro: free-standing Java code, XSLT stylesheet, and source document.

Note that you don't need a large source document to test this: if you run with the -t option (or FeatureKeys.TIMING set) the output will tell you whether input was streamed or whether a tree is being built in memory.

Actions #8

Updated by Mohd Shadab over 6 years ago

When running with trace on, it gives

Streaming null

Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser

URIResolver.resolve href="E:/sw/xsl/a2.xml" base=""

Streaming input document E:/sw/xsl/a2.xml

Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser

The OutOfMemoryError may be coming from JAXP. This is the code and stylesheet I am running:

    InputStream includeXSL = new FileInputStream(new File(xslDoc));
    StreamSource src = new StreamSource(includeXSL);
    StreamingTransformerFactory strf = new StreamingTransformerFactory();
    Transformer _transformer = strf.newTransformer(src);

    // Licence location and tracing/timing options
    strf.setAttribute(FeatureKeys.LICENSE_FILE_LOCATION, loc);
    strf.setAttribute(FeatureKeys.TRACE_LISTENER_OUTPUT_FILE, "a.txt");
    strf.setAttribute(FeatureKeys.TIMING, "true");
    strf.setAttribute(FeatureKeys.TRACE_OPTIMIZER_DECISIONS, "true");

    _transformer.setParameter("input-uri", "E:/sw/xsl/a2.xml");

    InputStream is = new FileInputStream(new File(sourceDoc));
    Reader reader = new InputStreamReader(is, "ISO-8859-1");
    OutputStream out = new FileOutputStream(new File(resultDoc));
    _transformer.transform(new StreamSource(reader), new StreamResult(out));

<xsl:param name="input-uri" as="xs:string"/>
<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <html>
  <body>
    <table border="1">
      <xsl:source-document streamable="yes" href="{$input-uri}">
      <xsl:for-each select="osm/node">
      <tr>
        <td><xsl:value-of select="@lat" /></td>
        <td><xsl:value-of select="@lon" /></td>
      </tr>
      </xsl:for-each>
      </xsl:source-document>
    </table>
  </body>
  </html>
</xsl:template>

Actions #9

Updated by Michael Kay over 6 years ago

OK, so Saxon is streaming, so we need to find out why it's running out of memory. Get a stack trace of what's happening when the out-of-memory exception occurs, and take a heap dump and analyze it to see what objects are present in the heap at the point of failure.
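
If it helps, one way to capture both from inside the calling application is sketched below. It assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class and method names are hypothetical, and the Runnable is a placeholder for the real transform call.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for collecting out-of-memory diagnostics. */
final class HeapDumpSketch {

    /** Creates a scratch directory for dump files. */
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("heap-dumps");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a dump of live objects to a new .hprof file in dir and returns its path. */
    static Path dumpHeap(Path dir) {
        try {
            // The target file must not exist yet and must end in .hprof.
            Path file = dir.resolve("heap-" + System.nanoTime() + ".hprof");
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), true); // true = only reachable objects
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the task; on OutOfMemoryError prints the stack trace, dumps the heap, rethrows. */
    static void runWithDiagnostics(Runnable task, Path dir) {
        try {
            task.run();
        } catch (OutOfMemoryError oom) {
            oom.printStackTrace();
            System.err.println("Heap dump written to " + dumpHeap(dir));
            throw oom;
        }
    }

    public static void main(String[] args) {
        // Placeholder task: the real transformer.transform(...) call would go here.
        runWithDiagnostics(() -> System.out.println("transform would run here"), newTempDir());
    }
}
```

Dumping the heap from a JVM that has just run out of memory can itself fail, so starting the JVM with -XX:+HeapDumpOnOutOfMemoryError is usually the more dependable route; the resulting .hprof file can be opened in a heap analyzer either way.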

Actions #10

Updated by Mohd Shadab over 6 years ago

The PDF has the detailed report, and the stack trace gives this error:

The memory is accumulated in one instance of "net.sf.saxon.tree.tiny.TinyTree"

java.lang.OutOfMemoryError.<init>()V (Unknown Source)
at java.util.Arrays.copyOf([II)[I (Unknown Source)
at net.sf.saxon.tree.tiny.TinyTree.ensureNodeCapacity(S)V (TinyTree.java:228)
at net.sf.saxon.tree.tiny.TinyTree.addNode(SIIII)I (TinyTree.java:338)
at net.sf.saxon.tree.tiny.TinyBuilder.makeTextNode(Ljava/lang/CharSequence;I)I (TinyBuilder.java:405)
at net.sf.saxon.tree.tiny.TinyBuilder.characters(Ljava/lang/CharSequence;Lnet/sf/saxon/expr/parser/Location;I)V (TinyBuilder.java:381)
at net.sf.saxon.event.ProxyReceiver.characters(Ljava/lang/CharSequence;Lnet/sf/saxon/expr/parser/Location;I)V (ProxyReceiver.java:190)

Actions #11

Updated by Michael Kay over 6 years ago

Thanks. These diagnostics make it clear that you're running out of memory while constructing a result tree (possibly the final result tree, possibly a temporary tree) from your stylesheet code. I don't think you've provided enough of your XSLT code to show where this is happening. One possibility is that you are building the stylesheet result as an in-memory tree; another is that you are using xsl:copy or xsl:copy-of to copy large chunks of your (streamed) input to an (unstreamed) output tree.
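
As a rough illustration of the first possibility, the sketch below sends the same transformation result to two kinds of JAXP Result. It uses the JDK's built-in identity transformer as a stand-in, not Saxon, and the class name and sample data are hypothetical. A DOMResult leaves the caller holding the whole result as a tree, whereas a StreamResult serializes it to a writer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

/** Hypothetical demo: serialized result versus in-memory result tree. */
final class ResultTargets {

    /** Builds a small made-up table document with the given number of rows. */
    static String sampleXml(int rows) {
        StringBuilder sb = new StringBuilder("<table>");
        for (int i = 0; i < rows; i++) {
            sb.append("<tr><td>").append(i).append("</td></tr>");
        }
        return sb.append("</table>").toString();
    }

    // A Transformer created without a stylesheet performs an identity transform.
    private static Transformer identity() throws TransformerException {
        return TransformerFactory.newInstance().newTransformer();
    }

    /** Serializes the result to the writer; the caller keeps no tree. */
    static void toStream(String xml, Writer out) {
        try {
            identity().transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Materializes the whole result as a DOM tree held in memory. */
    static Node toTree(String xml) {
        try {
            DOMResult result = new DOMResult();
            identity().transform(new StreamSource(new StringReader(xml)), result);
            return result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts element nodes in the subtree rooted at node. */
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countElements(c);
        }
        return count;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        toStream(sampleXml(3), out);
        System.out.println("serialized length: " + out.toString().length());
        System.out.println("elements held in tree: " + countElements(toTree(sampleXml(3))));
    }
}
```

The tree variant grows with the size of the output, which is the pattern the TinyTree frames in the stack trace point to; which construct in the stylesheet triggers it here is a separate question.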

Actions #12

Updated by Mohd Shadab over 6 years ago

The XSL below works if the StreamSource is a smaller XML file and input-uri is the 6GB file, but it fails when both the StreamSource and input-uri are the same 6GB file. The code being used is in the thread above. No xsl:copy-of is used, and it works in one case but not in the other.

Would it be possible to share the example you tried, so that we can try it as well?

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xsl:param name="input-uri" as="xs:string"/>
<xsl:mode streamable="yes"/>
<xsl:template match="/">
  <html>
  <body>
    <table border="1">
      <xsl:source-document streamable="yes" href="{$input-uri}">
      <xsl:for-each select="osm/node">
      <tr>
        <td><xsl:value-of select="@lat" /></td>
        <td><xsl:value-of select="@lon" /></td>
      </tr>
      </xsl:for-each>
      </xsl:source-document>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

Actions #13

Updated by Michael Kay over 6 years ago

I haven't confirmed this, but I think the problem probably arises from using xsl:source-document within a streamable template rule. I think the output of the xsl:source-document instruction is being accumulated in memory before being written to the final output.

Is there any good reason why you are doing it this way?

Actions #14

Updated by Mohd Shadab over 6 years ago

We are used to passing input as a StreamSource by default for all transformations, but with this OOM error it's clear that when using xsl:source-document the StreamSource should not be the same file as the one passed as a parameter. We are looking to automate XSL generation and run all such transforms by passing a StreamSource, except for this scenario where we need to pass the input stream as a parameter.

Actions #15

Updated by Michael Kay over 6 years ago

  • Status changed from New to Closed
  • Assignee set to Michael Kay

I think the original question has been answered. If you have further questions about streaming please raise a new issue.
