Bug #2631

closed

OutOfMemory running large Schematron file

Added by Alexander Henket about 8 years ago. Updated about 8 years ago.

Status: Closed
Priority: Low
Assignee:
Category: Performance
Sprint/Milestone:
Start date: 2016-02-20
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 9.6
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Hi,

I run Saxon 9.6.0.7J through oXygen 17.1. Regardless of the edition of Saxon, I run out of memory at various stages. The first is when looking up the phases that my Schematron has. The second is when running it. I asked the oXygen guys; they felt there was nothing more they could do and referred me to you.

Note that we generate our Schematron files. Not all of them are as huge as the one you'll find in the attached reproduction zip. Most run without a hitch.

I'd like to know if there's anything you could do to keep the memory footprint down, or whether we could, for example, generate more efficient paths.

Environment: MacBook Pro early 2013, 2.6 GHz i7, 16 GB RAM, with oXygen running on max 4 GB RAM.

Thanks in advance.


Files

schematron_performance.zip (2.79 MB), Alexander Henket, 2016-02-20 07:30
Actions #1

Updated by Alexander Henket about 8 years ago

Note: I run out of memory when I choose phase #ALL, not when I choose any other (smaller) phase. The problem is that #ALL contains things that none of the other phases have and there are just too many phases to try one by one.
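As an aside, the declared phases can be listed outside the editor without loading the whole file into memory. The following is only a minimal sketch, assuming the generated file uses the ISO Schematron namespace; the function name `list_phases` is made up for illustration. It does not show `#ALL`, because that is a reserved phase name meaning "every pattern" rather than a declared `phase` element.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the ISO Schematron namespace.
SCH = "{http://purl.oclc.org/dsdl/schematron}"


def list_phases(source):
    """Return {phase id: [ids of active patterns]} using a streaming parse.

    `source` is a file name or a binary file object. Hypothetical helper,
    not part of any Schematron or Saxon tooling.
    """
    phases = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == SCH + "phase":
            # All <active> children are complete once the phase end tag is seen.
            phases[elem.get("id")] = [
                active.get("pattern") for active in elem.iter(SCH + "active")
            ]
            elem.clear()
        elif elem.tag == SCH + "pattern":
            # Patterns hold the bulk of the rules; drop them as soon as they close.
            elem.clear()
    return phases
```

Calling `list_phases("some-file.sch")` (the file name is a placeholder) gives a quick overview of which patterns each phase activates, which may help when deciding how to split `#ALL` into smaller phases.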

Actions #2

Updated by Michael Kay about 8 years ago

  • Subject changed from OutOfMemory to OutOfMemory running large Schematron file

First, please note that we aren't Schematron experts. Our general rule is that we don't offer support where third-party products are involved, unless they are part of our standard test platform. Where things are reasonably simple open source applications, we're prepared to relax this rule a little, but you need to provide us with detailed instructions as to what we need to download and how you want us to run it. I think there is more than one Schematron processor out there and you haven't even told us which one you are running. Presumably it is one of the processors that converts schematron files to XSLT and then processes the XSLT using Saxon.

Secondly, one of your Schematron files is 45 MB in size. That's a big program in any language: it's about the same size as the source code of Saxon, all in one module. What's the size of the XSLT code that this produces? I'm afraid I'm not in the least surprised that it blows up: Saxon is not designed to handle such enormous stylesheets. I've no idea what the limits are, but I think you need to rethink your design.

Actions #3

Updated by Michael Kay about 8 years ago

  • Status changed from New to AwaitingInfo
Actions #4

Updated by Alexander Henket about 8 years ago

Your reply got me thinking, so I ran the Schematron.com standard conversion from SCH to SVRL to get the stylesheet. To answer that question: the XSLT is 145MB.

I then ran it outside of oXygen on the command line, using the saxon9ee.jar (9.6.0.7J) that ships with oXygen. It took a while, but sure enough 2 MB worth of SVRL came out. No out-of-memory error, but the output is rather useless as it stands: I now need to evaluate every reported path against the base instance again to get back to the problematic fragment.

(java -Xmx4096m -jar /Applications/oxygen/lib/saxon9ee.jar -o:svrlresult.xml xml-JGZ/REPC_EX902120NL_03.xml schematron_svrl/jgz-versturenDossieroverdrachtbericht-02.xsl)
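A first step towards making such disconnected SVRL usable could be to pull out just the failures together with their location paths. The following is a minimal sketch, assuming the output uses the standard SVRL namespace and that each `failed-assert` and `successful-report` carries a `location` attribute and a `text` child; the function name `failed_checks` is made up for illustration.

```python
import xml.etree.ElementTree as ET

SVRL = "{http://purl.oclc.org/dsdl/svrl}"


def failed_checks(svrl_source):
    """Return (location, message) pairs for failed asserts and successful reports.

    `svrl_source` is a file name or a binary file object. Hypothetical helper.
    """
    wanted = (SVRL + "failed-assert", SVRL + "successful-report")
    results = []
    root = ET.parse(svrl_source).getroot()
    for elem in root.iter():
        if elem.tag in wanted:
            text = elem.find(SVRL + "text")
            message = "".join(text.itertext()) if text is not None else ""
            # Collapse runs of whitespace so each message fits on one line.
            results.append((elem.get("location"), " ".join(message.split())))
    return results
```

For example, `for location, message in failed_checks("svrlresult.xml"): print(location, message)` prints one line per problem. The location paths still have to be evaluated against the instance with an XPath processor to reach the offending fragment; this sketch only shortens the list that needs to be looked at.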

The second thing I'm working on is a small rewrite of the Schematron engine we created to build these SCH files, so that new phases become available in oXygen and I don't 'have' to go to #ALL.

Lastly: I've pointed the oXygen guys to this ticket and asked them to re-evaluate what could be done.

So in summary: I think I have enough to work with to make things more manageable. The goal is to stay inside oXygen as disconnected SVRL is not a feasible solution.

It would seem this ticket here may be closed. If the oXygen-guys feel it should be reopened based on their renewed investigation I'm sure we'll get back to it. Thanks.

Actions #5

Updated by Michael Kay about 8 years ago

  • Status changed from AwaitingInfo to Closed
  • Assignee set to Michael Kay

I'm closing this. Running a 145 MB stylesheet is beyond Saxon's capabilities.

Actions #6

Updated by Alexander Henket about 8 years ago

Just for the record: running that 145 MB XSLT from the command line actually does work more or less reasonably well. It's oXygen as the intermediary that chokes. Saxon may not have been designed for it, but it does work.
