Basic Streaming Question - EE Evaluation
Added by Manfred Maier over 6 years ago
Dear all,
I am currently evaluating the EE edition and have some questions. I hope you can verify or correct my assumptions.
Scenario: process large XML documents (up to 5 GB) with XSLT. The idea is to take advantage of the streaming features.
First, this is the Java code I currently use for testing:
EnterpriseConfiguration conf = new EnterpriseConfiguration();
conf.setConfigurationProperty(FeatureKeys.LICENSE_FILE_LOCATION, "M:\\workspace\\cau\\PoC_MM_2\\libs\\saxon-license.lic");
conf.setConfigurationProperty(FeatureKeys.STREAMABILITY, "standard");
//conf.setConfigurationProperty(FeatureKeys.TREE_MODEL_NAME, "tinyTreeCondensed");
log.info(conf.getEditionCode());
Processor entProc = new Processor(conf); // Should be created only once
log.info(entProc.getUnderlyingConfiguration().getEditionCode()); // Shall be EE
XsltCompiler comp = entProc.newXsltCompiler();
XsltExecutable exp = comp.compile(new StreamSource(new FileInputStream("C:\\Temp\\dummyCleared.xslt")));
Xslt30Transformer xslt30Transformer = exp.load30(); // Should be created for each transformation
FileInputStream fis = new FileInputStream("C:\\Temp\\dummyCleared.xml");
StreamSource ss = new StreamSource(fis);
FileOutputStream fos = new FileOutputStream("C:\\Temp\\resultCleared.txt");
Serializer out = entProc.newSerializer(fos);
xslt30Transformer.applyTemplates(ss, out);
xslt30Transformer.getUnderlyingController().clearDocumentPool();
out.close();
fos.close();
fis.close();
The test XML:
XYZ001375-2030-00-0002-1009600 XYZ001375-2030-00-0002-1009600 loc3245
And this is a basic test XSLT:
Now my questions:
- Even though it seems to stream the content, processing a 1.3 GB file still uses 3.5 to 4 GB of RAM. Is this normal?
- After the transformation, the memory is not released; the process does not free it. Am I missing something in the code?
- Does the overall approach seem feasible? I would have expected lower memory consumption. Am I wrong there?
Thanks a lot and warmest regards, Manfred
Replies (5)
RE: Basic Streaming Question - EE Evaluation - Added by Manfred Maier over 6 years ago
Edit:
This is the output for a 2.3 GB test file:
External XSLT processing started...
Saxon-EE 9.8.0.11J from Saxonica
Java version 1.8.0_91
Using license serial number V006824
Stylesheet compilation time: 1.973265s (1973.265ms)
Processing file:/c:/Temp/dummyCleared224.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:/c:/Temp/dummyCleared224.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
RE: Basic Streaming Question - EE Evaluation - Added by Michael Kay over 6 years ago
You've declared mode "s" as streamable, but you're processing the source document using the unnamed mode (which by default isn't streamable).
The -t output tells you that a tree is being built, which means it's not streaming. (The hard bit can be working out why...)
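The check described above (look for a tree-building line in the -t output) could be automated roughly as follows. This is only a sketch: the class and method names are hypothetical, the marker text is copied from the Saxon-EE 9.8 output quoted earlier in this thread and may be worded differently in other versions, and how the trace text is captured is left to the caller. A missing marker does not prove that the run streamed; it only means no tree-building line was seen.

```java
class TraceCheck {

    // Marker text taken from the -t output quoted in this thread (assumed
    // stable for Saxon-EE 9.8; other versions may word it differently).
    static final String TREE_MARKER = "Building tree for";

    // Returns true when the captured trace reports that a tree was built,
    // which means the source document was not streamed.
    static boolean builtTree(String traceOutput) {
        return traceOutput != null && traceOutput.contains(TREE_MARKER);
    }

    public static void main(String[] args) {
        // Two lines from the trace posted above, used as sample input.
        String trace = "Processing file:/c:/Temp/dummyCleared224.xml\n"
                + "Building tree for file:/c:/Temp/dummyCleared224.xml"
                + " using class net.sf.saxon.tree.tiny.TinyBuilder";
        System.out.println(builtTree(trace)
                ? "not streaming: a tree was built"
                : "no tree-building line found");
    }
}
```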
RE: Basic Streaming Question - EE Evaluation - Added by Manfred Maier over 6 years ago
Thanks Michael for your fast feedback.
In the meantime I made some changes, and I can now process the file with a proper XSLT:
Another question concerning the overall handling:
I only found a way to supply a file-based stream as input for the transformation. I would like to use a data stream from a database instead, such as this InputStream:
InputStream contentStream = resultSet.getBinaryStream("ABLOBCOLUMN");
Can this be achieved?
Thanks and best, Manfred
RE: Basic Streaming Question - EE Evaluation - Added by Michael Kay over 6 years ago
Sure, the Saxon API expects a StreamSource or SAXSource. You can create either by wrapping an InputStream, which can be any kind of InputStream.
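A minimal sketch of the wrapping step, using only JDK classes so it runs without Saxon or a database. The class and method names are hypothetical, the in-memory stream is a stand-in for the stream returned by resultSet.getBinaryStream("ABLOBCOLUMN"), and the element names in the sample XML are made up. The JDK identity transformer is used here only to show that the wrapped stream can be read as XML; with Saxon, the same StreamSource would presumably be passed to xslt30Transformer.applyTemplates(ss, out) as in the code from the first post.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InputStreamSourceDemo {

    // Wraps any InputStream (file, socket, database BLOB, ...) as a JAXP Source.
    static StreamSource toSource(InputStream in) {
        return new StreamSource(in);
    }

    // Copies the source through the JDK identity transformer, only to
    // demonstrate that the wrapped stream is consumed as XML.
    static String identityCopy(StreamSource source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not read the wrapped stream as XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for resultSet.getBinaryStream("ABLOBCOLUMN"); sample XML is invented.
        byte[] xml = "<doc><loc>loc3245</loc></doc>".getBytes(StandardCharsets.UTF_8);
        try (InputStream contentStream = new ByteArrayInputStream(xml)) {
            System.out.println(identityCopy(toSource(contentStream)));
        }
    }
}
```

With a real ResultSet, the stream would normally need to be consumed while the ResultSet and its connection are still open, so the transformation should run inside that scope.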
RE: Basic Streaming Question - EE Evaluation - Added by Manfred Maier over 6 years ago
Thanks Michael... seems this is not my day ;). I just saw it, will try it.
Thanks a lot.