Ant schemavalidate causes Memory Exception

Added by Micha H over 2 years ago

Hi,

I use an Ant/Saxon-EE environment to transform files and validate the results. This works fine for a few hundred files, but the validation step fails with a memory exception for thousands of files. (Transformations run smoothly regardless of the number of files.)

I am not sure that allocating more memory is the solution: it seems that the more files there are, the more memory I will need. In future I expect to transform and validate tens of thousands of files in a single run. I'm not very proficient in Java or Ant, so I might be missing something...

Error message:

BUILD FAILED
C:[...]\build.xml:64: java.lang.OutOfMemoryError: Compressed class space
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
	at org.apache.tools.ant.AntClassLoader.defineClassFromData(AntClassLoader.java:1153)
	at org.apache.tools.ant.AntClassLoader.getClassFromStream(AntClassLoader.java:1321)
	at org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1373)
	at org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1338)
	at org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1093)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
	at net.sf.saxon.tree.tiny.TinyBuilder.open(TinyBuilder.java:125)
	at net.sf.saxon.event.ReceivingContentHandler.startDocument(ReceivingContentHandler.java:249)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startDocument(AbstractSAXParser.java:293)
	at java.xml/com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.startDocument(XMLDTDValidator.java:622)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.startEntity(XMLDocumentScannerImpl.java:545)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.startDocumentParsing(XMLVersionDetector.java:136)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:874)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
	at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:439)
	at net.sf.saxon.event.Sender.send(Sender.java:168)
	at net.sf.saxon.Configuration.buildDocumentTree(Configuration.java:4202)
	at net.sf.saxon.regex.charclass.Categories.build(Categories.java:124)
	at net.sf.saxon.regex.charclass.Categories.getCategory(Categories.java:197)
	at net.sf.saxon.regex.charclass.Categories.<clinit>(Categories.java:174)
	at net.sf.saxon.regex.RECompiler.escape(RECompiler.java:350)
	at net.sf.saxon.regex.RECompiler.parseTerminal(RECompiler.java:930)
	at net.sf.saxon.regex.RECompiler.piece(RECompiler.java:971)
	at net.sf.saxon.regex.RECompiler.parseBranch(RECompiler.java:1091)
	at net.sf.saxon.regex.RECompiler.parseExpr(RECompiler.java:1141)
	at net.sf.saxon.regex.RECompiler.compile(RECompiler.java:1274)
	at net.sf.saxon.regex.ARegularExpression.<init>(ARegularExpression.java:54)

Thanks for your help.


Replies (5)

RE: Ant schemavalidate causes Memory Exception - Added by Micha H over 2 years ago

The relevant lines from the Ant build file:

	<target name="validate-final">
		<schemavalidate failonerror="no" noNamespaceFile="${xsd-file}" classname="com.saxonica.ee.jaxp.ValidatingReader" classpath="${saxon-jar}">
			<fileset dir="${process-dir}Entries" includes="**/*.xml"/>
		</schemavalidate>
	</target>

RE: Ant schemavalidate causes Memory Exception - Added by Michael Kay over 2 years ago

I'm afraid I don't know what Ant does with a call like this. It's a little surprising that it runs out of memory during Categories.getCategory() because that's reading a Saxon datafile (Unicode character categories) that should only need to be read once. This might suggest that Ant is calling Saxon thousands of times, rather than just calling it once to validate thousands of files. I'll need to try and reproduce it under the debugger to see what's going on.

The other problem here is that if Ant is invoking Saxon inefficiently, there's not all that much we can do about it. We have introduced configuration switches in the past so that Saxon can ignore some of the things Ant asks it to do, but that gets very messy.

RE: Ant schemavalidate causes Memory Exception - Added by Michael Kay over 2 years ago

I've taken a look at how the Ant schemavalidate task operates by running it in the debugger and seeing what calls are made on Saxon.

For each source document to be validated, it makes a fresh call on "new ValidatingReader()", and supplies the source schema by calling setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", fileName).

This means that Saxon is being initialised from scratch for every file processed, and there is no re-use of the compiled schema (let alone resources such as the Unicode character categories).

This is pretty bad news. It's not easy to do anything about it, and I wonder whether it's worth it anyway since Ant is no longer flavour of the month. The only thing we could conceivably do would be to cache schema information across calls in some static cache.

But you're not actually complaining about processing time, you're complaining about memory. And the fact that each call initialises Saxon from scratch means that it's very unlikely to be Saxon that's accumulating resources in memory; it must be Ant that's doing so.

Your next step should probably be to try and get a heap dump to see what objects are accumulating in memory.

Alternatively, avoid using the schemavalidate task, and invoke Saxon's validation API directly, either using the command-line interface or calling a Java application that wraps the s9api API. Either way, you'll find it's quite easy to load the schema once and use it to validate multiple files.
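
As a rough illustration of the "load the schema once, validate many files" pattern, here is a minimal sketch. It deliberately uses the JDK's built-in javax.xml.validation API rather than Saxon's s9api, so that it runs without any Saxon jar; with Saxon-EE the equivalent would presumably go through s9api's SchemaManager and SchemaValidator. The class name BatchValidator and the method validateAll are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: compiles one schema and reuses it for every file.
public class BatchValidator {

    /** Validates every *.xml file under dir; returns the files that failed. */
    public static List<Path> validateAll(Path xsd, Path dir)
            throws SAXException, IOException {
        // Compile the schema exactly once, outside the per-file loop.
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd.toFile());

        // Collect the input files (recursively, like includes="**/*.xml").
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".xml"))
                        .sorted()
                        .collect(Collectors.toList());
        }

        List<Path> invalid = new ArrayList<>();
        Validator validator = schema.newValidator();
        for (Path file : files) {
            validator.reset(); // reuse the same validator for the next file
            try {
                validator.validate(new StreamSource(file.toFile()));
            } catch (SAXException e) {
                // Report and carry on, similar in spirit to failonerror="no".
                System.err.println(file + ": " + e.getMessage());
                invalid.add(file);
            }
        }
        return invalid;
    }

    public static void main(String[] args) throws Exception {
        List<Path> invalid = validateAll(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(invalid.size() + " invalid file(s)");
    }
}
```

Such a class could be run from Ant with a single java task, so that only one JVM-level initialisation and one schema compilation happen per build, however many files are in the directory.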

RE: Ant schemavalidate causes Memory Exception - Added by Michael Kay over 2 years ago

Hmmm. At the point where it's running out of memory, it's calling net.sf.saxon.regex.charclass.Categories.getCategory(), which is reading an XML file. The method getCategory() should only do anything the first time it's called in a JVM (because it keeps the resulting information in static data). I've checked this by amending my test schema to use a regex that requires character categories, and it's only reading this XML file once. It seems rather surprising that you should run out of memory while reading this file (which isn't especially large) if it's only read once during the whole process. I can't explain that.

What's more, while it's reading the categories file it's also creating a TinyTree and the actual failure occurs when calling new TinyTree(): it looks like this is causing classes to be loaded, which surely can only happen if they aren't already loaded, unless there's some complication involving multiple classLoaders. I've no idea, of course, what the custom Ant ClassLoader is doing.
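
One way to test the multiple-class-loader suspicion is to watch how many classes the JVM has loaded as the run progresses. The sketch below uses the standard java.lang.management beans. The class name ClassSpaceProbe is hypothetical, and the pool names "Metaspace" and "Compressed Class Space" are assumed to be those reported by a HotSpot JVM; other JVMs may name their pools differently or not expose them at all.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic helper: reports class-loading counters and the
// usage of the class-metadata memory pools.
public class ClassSpaceProbe {

    public static String snapshot() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        StringBuilder sb = new StringBuilder();
        sb.append("loaded=").append(cl.getLoadedClassCount())
          .append(" totalLoaded=").append(cl.getTotalLoadedClassCount())
          .append(" unloaded=").append(cl.getUnloadedClassCount());

        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumed HotSpot pool names; silently skipped if absent.
            if (name.equals("Metaspace") || name.equals("Compressed Class Space")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    sb.append(" [").append(name).append(" used=")
                      .append(usage.getUsed()).append(" bytes]");
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

If the loaded-class count keeps climbing roughly in step with the number of files validated, that would point towards the same classes being loaded repeatedly by fresh class loaders rather than towards ordinary heap growth.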

Strange.

RE: Ant schemavalidate causes Memory Exception - Added by Michael Kay over 2 years ago

Another way you could explore what's going on is to create your own implementation of ValidatingReader that delegates to the Saxon ValidatingReader; this would enable you to trace what calls are being made, and you could also consider caching the Saxon Configuration in static data. Unfortunately the ValidatingReader doesn't expose the Configuration that it uses, but you could get it by reflection.
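
A delegating reader of this kind might look roughly like the sketch below. It is only an illustration: TracingReader is a hypothetical name, and so that the example runs without Saxon on the classpath it delegates to the JDK's default SAX parser. To trace the real case, newDelegate() would instead return a new com.saxonica.ee.jaxp.ValidatingReader, and the classname attribute of the schemavalidate task would name this wrapper class.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical wrapper: logs the calls made on the reader and forwards them.
public class TracingReader extends XMLFilterImpl {

    // Static, so the count survives across instances within one class loader.
    public static final AtomicInteger INSTANCES = new AtomicInteger();

    // Public no-argument constructor, so the class can be created by name.
    public TracingReader() throws SAXException, ParserConfigurationException {
        super(newDelegate());
        System.err.println("TracingReader instance #" + INSTANCES.incrementAndGet());
    }

    // Assumption: stand-in delegate. Swap in the Saxon ValidatingReader here.
    private static XMLReader newDelegate()
            throws SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        System.err.println("setProperty(" + name + ", " + value + ")");
        super.setProperty(name, value); // forwarded to the delegate
    }

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        System.err.println("parse(" + input.getSystemId() + ")");
        super.parse(input); // forwarded to the delegate
    }
}
```

If the instance counter printed in the log restarts from 1 for each file, that would suggest the wrapper class itself is being reloaded by a new class loader each time, which would fit the "Compressed class space" failure.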
