Support #5959

closed

XSLT 3.0 streaming results in OutOfMemoryError

Added by Mark Hansen 11 months ago. Updated 9 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: XSLT 3.0 packages
Sprint/Milestone: -
Start date: 2023-04-06
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 12
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Hello,

I have a large XML file that I am trying to split into smaller files using an XSLT 3.0 stylesheet with streaming, so that the pieces can be processed separately.

Here is my Java code:

 StreamingTransformerFactory streamingTransformerFactory =
     XmlTransformerHelper.createStreamingTransformerFactory();

 Templates streamingTemplates =
     XmlTransformerHelper.createTemplate(
         streamingTransformerFactory,
         getClass()
             .getClassLoader()
             .getResourceAsStream(RESOURCE_URL_XSL_SPLIT_BIG_ONIX_STREAM));

 okFile = new File(targetDirectory.toString(), "dummy.xml");

 streamingTemplates
     .newTransformer()
     .transform(new StreamSource(bookStream), new StreamResult(okFile));

Here is my XSL:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                exclude-result-prefixes="#all"
                xmlns:s="http://www.book.org/book/3.0/short"
                xmlns:saxon="http://saxon.sf.net/"
>
    <xsl:mode streamable="yes" on-no-match="shallow-copy" use-accumulators="#all"/>

    <xsl:output indent="yes"/>

    <xsl:accumulator name="header" as="element()?" initial-value="()" streamable="yes">
        <xsl:accumulator-rule match="s:header" phase="end" saxon:capture="yes" select="."/>
    </xsl:accumulator>

    <xsl:template match="s:Entry">
        <xsl:result-document href="{position()}.xml" method="xml">
            <xsl:element name="{name(ancestor::*[last()])}" namespace="{namespace-uri(ancestor::*[last()])}">
                <xsl:copy-of select="accumulator-before('header')"/>
                <xsl:copy-of select="."/>
            </xsl:element>
        </xsl:result-document>
    </xsl:template>

</xsl:stylesheet>

It works fine for smaller XML files (around 200 MB), but for a 2 GB XML file I get an OutOfMemoryError (see the attached screenshot).

I started my Spring Boot application with the VM arguments "-Xms1G -Xmx3G".

I expected that streaming would keep memory consumption low.

Is that assumption wrong, or am I doing something wrong?


Files

clipboard-202304061011-zsmsr.png (41.2 KB) Mark Hansen, 2023-04-06 10:11
S3ChunkedStream.java (1.79 KB) Mark Hansen, 2023-04-06 13:01
Actions #1

Updated by Michael Kay 11 months ago

I can't see any obvious reason for this.

Is it possible for you to look at a heap dump and identify what objects are taking up all the space?

Otherwise, if you can supply us with some kind of repro (it doesn't have to be a full 2 GB file), we'll try investigating it at this end.
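
For anyone following up on the heap dump suggestion: on HotSpot-based JVMs the option -XX:+HeapDumpOnOutOfMemoryError writes a dump automatically when the error is thrown. A dump can also be triggered from code. The following is only a minimal sketch, assuming a HotSpot-based JVM; the class name HeapDumper and the default file name are hypothetical and not part of the reporter's project.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that writes a heap dump of the running JVM. */
public final class HeapDumper {
    private HeapDumper() {}

    /** Writes a dump of live (reachable) objects to the given .hprof file. */
    public static void dump(Path target) throws IOException {
        // dumpHeap refuses to overwrite an existing file, so remove it first.
        Files.deleteIfExists(target);
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // 'true' restricts the dump to live objects, which keeps it smaller.
        bean.dumpHeap(target.toAbsolutePath().toString(), true);
    }

    public static void main(String[] args) throws IOException {
        // The default file name is an assumption made for this example.
        Path target = Paths.get(args.length > 0 ? args[0] : "transform-oom.hprof");
        dump(target);
        System.out.println("Heap dump written to " + target.toAbsolutePath());
    }
}
```

The resulting .hprof file can be opened in a heap analyzer to see which object types dominate the retained memory.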

Actions #2

Updated by Mark Hansen 11 months ago

When I created a separate test project with the same transformation logic, I noticed that it worked.

The only difference is that in the test project the XML is read from a FileInputStream, whereas in the original application it is read from a LazySequenceInputStream.

I have attached the LazySequenceInputStream class that we use.

Do you have any idea what might cause this, or is it out of scope?

Actions #3

Updated by Michael Kay 11 months ago

I think you need to take a look at the memory behavior of that LazySequenceInputStream. I can't really assess it: it's possible, for example, that getContentLength() or getUserMetaData() reads the whole file into memory.
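
One way to check the stream in isolation is to drain it with a small fixed buffer, without Saxon involved, and watch heap usage while doing so. The sketch below is only an illustration of that idea: the class name StreamMemoryProbe is hypothetical, the ByteArrayInputStream in main is a stand-in for the stream under suspicion, and heap sampling via Runtime is approximate because garbage collection timing affects the numbers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical probe: drains a stream and samples heap usage along the way. */
public final class StreamMemoryProbe {
    private static final long SAMPLE_INTERVAL = 1024 * 1024; // sample every 1 MB read

    /** Outcome of draining a stream: bytes consumed and the highest heap usage sampled. */
    public static final class Result {
        public final long totalBytes;
        public final long peakUsedHeap;

        Result(long totalBytes, long peakUsedHeap) {
            this.totalBytes = totalBytes;
            this.peakUsedHeap = peakUsedHeap;
        }
    }

    private StreamMemoryProbe() {}

    /** Approximate heap currently in use, in bytes. */
    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Reads the stream to the end with a 64 KB buffer, discarding the data. */
    public static Result drain(InputStream in) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        long peak = usedHeap();
        long nextSample = SAMPLE_INTERVAL;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total >= nextSample) {
                peak = Math.max(peak, usedHeap());
                nextSample += SAMPLE_INTERVAL;
            }
        }
        peak = Math.max(peak, usedHeap());
        return new Result(total, peak);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input; replace with the stream under suspicion.
        try (InputStream in = new ByteArrayInputStream(new byte[5 * 1024 * 1024])) {
            Result r = drain(in);
            System.out.printf("read %d bytes, peak used heap about %d MB%n",
                r.totalBytes, r.peakUsedHeap / (1024 * 1024));
        }
    }
}
```

If the sampled peak grows roughly in proportion to the input size even in this Saxon-free loop, that would point at the stream implementation rather than the transformation.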

Actions #4

Updated by Michael Kay 10 months ago

  • Status changed from New to AwaitingInfo
Actions #5

Updated by Michael Kay 9 months ago

  • Status changed from AwaitingInfo to Closed

I'm closing this because we asked for more information and received no response. Please feel free to reopen it, or raise a fresh issue, if you need further help.
