Basic Streaming Question - EE Evaluation

Added by Manfred Maier over 6 years ago

Dear all,

I am currently evaluating Saxon-EE and have some questions. I hope you can verify or correct my assumptions.

Scenario: process large XML documents (up to 5 GB) with XSLT. The idea is to take advantage of the streaming features.

First, this is the Java code I currently use for testing:

        // Imports needed by this snippet:
        // import com.saxonica.config.EnterpriseConfiguration;
        // import net.sf.saxon.lib.FeatureKeys;
        // import net.sf.saxon.s9api.*;
        // import javax.xml.transform.stream.StreamSource;
        // import java.io.FileInputStream;
        // import java.io.FileOutputStream;

        EnterpriseConfiguration conf = new EnterpriseConfiguration();
        conf.setConfigurationProperty(FeatureKeys.LICENSE_FILE_LOCATION, "M:\\workspace\\cau\\PoC_MM_2\\libs\\saxon-license.lic");
        conf.setConfigurationProperty(FeatureKeys.STREAMABILITY, "standard");
        //conf.setConfigurationProperty(FeatureKeys.TREE_MODEL_NAME, "tinyTreeCondensed");
        log.info(conf.getEditionCode());

        Processor entProc = new Processor(conf); // Should be created only once
        log.info(entProc.getUnderlyingConfiguration().getEditionCode()); // Should be EE
        XsltCompiler comp = entProc.newXsltCompiler();
        XsltExecutable exp = comp.compile(new StreamSource(new FileInputStream("C:\\Temp\\dummyCleared.xslt")));
        Xslt30Transformer xslt30Transformer = exp.load30(); // Should be created for each transformation
        FileInputStream fis = new FileInputStream("C:\\Temp\\dummyCleared.xml");
        StreamSource ss = new StreamSource(fis);
        FileOutputStream fos = new FileOutputStream("C:\\Temp\\resultCleared.txt");
        Serializer out = entProc.newSerializer(fos);

        // Apply templates to the source document and serialize the result
        xslt30Transformer.applyTemplates(ss, out);
        xslt30Transformer.getUnderlyingController().clearDocumentPool();
        out.close();
        fos.close();
        fis.close();

The test XML:

			
			
			
				
					
						
							XYZ001375-2030-00-0002-1009600
							XYZ001375-2030-00-0002-1009600
						
					
				
				
					
						
							loc3245
							
						
					
				
			

And this is a basic test XSLT:

			
			
				
				

				
					
						
					
				

				
					
						
					
				
			

Now my questions:

  1. Even though the content appears to be streamed, a 1.3 GB file still uses 3.5 to 4 GB of RAM. Is this normal?
  2. After the transformation, the memory is not released; the process does not free it. Am I missing something in the code?
  3. Does the overall approach seem feasible? I would have expected lower memory consumption. Am I wrong there?

Thanks a lot and warmest regards, Manfred


Replies (5)

RE: Basic Streaming Question - EE Evaluation - Added by Manfred Maier over 6 years ago

Edit:

This is the output for a 2.3 GB test file:

        External XSLT processing started...
        Saxon-EE 9.8.0.11J from Saxonica
        Java version 1.8.0_91
        Using license serial number V006824
        Stylesheet compilation time: 1.973265s (1973.265ms)
        Processing file:/c:/Temp/dummyCleared224.xml
        Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
        Building tree for file:/c:/Temp/dummyCleared224.xml using class net.sf.saxon.tree.tiny.TinyBuilder
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

RE: Basic Streaming Question - EE Evaluation - Added by Michael Kay over 6 years ago

You've declared mode "s" as streamable, but you're processing the source document using the unnamed mode (which by default isn't streamable).

The -t output tells you that a tree is being built, which means it's not streaming. (The hard bit can be working out why...)
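The check described above can be automated in a test harness. The following is only a minimal sketch, assuming the -t output has already been captured into a string (how it is captured is left open). The class `TraceCheck` and the method `builtTree` are hypothetical names, and the matched text is simply the "Building tree for" message that appears in the output posted earlier in this thread.

```java
public class TraceCheck {

    // Returns true if the captured -t output contains the message that
    // Saxon printed in this thread when it built an in-memory tree,
    // which indicates that the source document was not streamed.
    static boolean builtTree(String trace) {
        return trace.contains("Building tree for ");
    }

    public static void main(String[] args) {
        // Sample lines taken from the output quoted in this thread
        String trace = "Processing file:/c:/Temp/dummyCleared224.xml\n"
                + "Building tree for file:/c:/Temp/dummyCleared224.xml "
                + "using class net.sf.saxon.tree.tiny.TinyBuilder\n";
        System.out.println(builtTree(trace)); // prints: true
    }
}
```

A check like this only detects the symptom; finding out why the tree is built still requires inspecting the stylesheet and the mode used for the initial template application.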

RE: Basic Streaming Question - EE Evaluation - Added by Manfred Maier over 6 years ago

Thanks Michael for your fast feedback.

In the meantime I made some changes. I can now process the file with a proper XSLT:




	
	

	
		

			
		
	

	
		

			
		
	


Another question concerning the overall handling:

I only found a way to use a file/IO stream as input for the transformation. I would like to use a data stream from a database instead, such as this InputStream:

InputStream contentStream = resultSet.getBinaryStream("ABLOBCOLUMN");

Can this be achieved?

Thanks and best, Manfred

RE: Basic Streaming Question - EE Evaluation - Added by Michael Kay over 6 years ago

Sure, the Saxon API expects a StreamSource or SAXSource. You can create either by wrapping an InputStream, which can be any kind of InputStream.
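As a rough illustration of the wrapping step, here is a minimal sketch that uses only the JDK. A `ByteArrayInputStream` stands in for `resultSet.getBinaryStream("ABLOBCOLUMN")`, and the JDK's built-in identity transformer stands in for the Saxon transformation; the class `InputStreamSourceDemo` and the method `copyThrough` are hypothetical names. With Saxon, the same `StreamSource` would presumably be passed to `applyTemplates(ss, out)` exactly as in the code from the first post.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InputStreamSourceDemo {

    // Wraps an arbitrary InputStream in a StreamSource and copies the XML
    // to a string using the JDK identity transform (a stand-in for Saxon).
    static String copyThrough(InputStream in) throws Exception {
        StreamSource source = new StreamSource(in);
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter result = new StringWriter();
        identity.transform(source, new StreamResult(result));
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in the real application this stream would come from
        // resultSet.getBinaryStream("ABLOBCOLUMN") instead of a byte array.
        byte[] xml = "<doc><item>loc3245</item></doc>".getBytes(StandardCharsets.UTF_8);
        try (InputStream contentStream = new ByteArrayInputStream(xml)) {
            System.out.println(copyThrough(contentStream));
        }
    }
}
```

The `StreamSource` does not care where the bytes come from, so the database stream only needs to stay open until the transformation has finished reading it.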

RE: Basic Streaming Question - EE Evaluation - Added by Manfred Maier over 6 years ago

Thanks Michael... seems this is not my day ;). I just saw it, will try it.

Thanks a lot.
