Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM

Added by Andreas Oetjen 12 months ago

Hi, I'm using the command\Transform.c sample from libsaxon-HEC-windows-v12.1, and ran into the following problem:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; JAXP00010001: 
The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.

(To reproduce see attachment - this is not the real use case, but shows the problem)

How can I disable the secure processing or increase the maximum entity expansion size? I tried to set the environment variable:

set JAVA_TOOL_OPTIONS=-Djdk.xml.entityExpansionLimit=100000000 -Djdk.xml.maxGeneralEntitySizeLimit=100000000

(and several other values), but they do not seem to be passed through to the embedded GraalVM.

Maybe you can give me a short hint on what to do.

Kind regards

Andreas


Replies (4)

RE: Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM - Added by Martin Honnen 12 months ago

With the current 12.1 release I managed to compile your file by setting a parser property, e.g.

from saxonche import *

with PySaxonProcessor(license=False) as proc:
    print(proc.version)

    proc.set_configuration_property('http://saxon.sf.net/feature/parserProperty?uri=http%3A//www.oracle.com/xml/jaxp/properties/entityExpansionLimit', '128000')

    xslt_processor = proc.new_xslt30_processor()

    xdm_doc = proc.parse_xml(xml_file_name='C:/Users/marti/Downloads/entity-expansion-size.xsl')

    xslt_executable = xslt_processor.compile_stylesheet(stylesheet_node=xdm_doc)
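
The property name passed to set_configuration_property above appears to be the Saxon parserProperty prefix followed by the percent-encoded URI of the JAXP property. As a minimal sketch, that name can be assembled with the Python standard library. Note that `parser_property_name` is a hypothetical helper, not part of saxonche, and that encoding being required is an assumption (the C++ call later in this thread passes the URI unencoded).

```python
from urllib.parse import quote

# Prefix and JAXP property URI as they appear in the call above.
SAXON_PARSER_PROPERTY_PREFIX = "http://saxon.sf.net/feature/parserProperty?uri="
ENTITY_EXPANSION_LIMIT = "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit"


def parser_property_name(jaxp_property_uri: str) -> str:
    """Hypothetical helper: build a Saxon parser-property configuration name."""
    # quote() keeps "/" unescaped by default and percent-encodes ":".
    return SAXON_PARSER_PROPERTY_PREFIX + quote(jaxp_property_uri)


print(parser_property_name(ENTITY_EXPANSION_LIMIT))
```

For the entity expansion limit this reproduces the string used in the call above, so the result could be passed straight to proc.set_configuration_property together with the desired limit.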

RE: Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM - Added by Andreas Oetjen 12 months ago

Thanks @Martin Honnen. I finally got it to run, but ran into issues at first.

The following shows the code that works:

SaxonProcessor* processor = new SaxonProcessor(false);
processor->setConfigurationProperty("http://saxon.sf.net/feature/parserProperty?uri=http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit", "0");

Xslt30Processor* trans = processor->newXslt30Processor();

// This works:
XdmNode *res = processor->parseXmlFromFile(xslFile.c_str());
XsltExecutable* exe = trans->compileFromXdmNode(res);
exe->transformFileToFile(sourceFile.c_str(), outputFile.c_str());

What does not work is calling compileFromFile directly instead of going through parseXmlFromFile:

XsltExecutable* exe = trans->compileFromFile(xslFile.c_str());
exe->transformFileToFile(sourceFile.c_str(), outputFile.c_str());

I have a strange feeling that this failure occurs because the configuration property is set too late, i.e. after the entity expansion. To show this, try the following:

Call "setConfigurationProperty" with an "invalid" property:

processor->setConfigurationProperty("http://saxon.sf.net/feature/parserProperty?uri=gurkensalat", "0");

Case 1: use a "simple" xslFile with no large entity expansion --> Error

Selected XML parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
    does not recognize the property gurkensalat

Case 2: use an xslFile with a large entity expansion --> Error

SaxonApiException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; JAXP00010001: 
    The parser has encountered more than "64000" entity expansions in this document; 
    this is the limit imposed by the JDK.

To me, these two different errors clearly show that the entity expansion seems to be performed before any (in this case invalid) configuration property is set on the SAXParser.

RE: Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM - Added by Michael Kay 12 months ago

No, I think it's because properties set using http://saxon.sf.net/feature/parserProperty affect how source documents are parsed, not how stylesheets are parsed. So it will affect parseXmlFromFile (because we don't know we're dealing with a stylesheet here), but not compileFromFile, because when we know we're parsing a stylesheet, we configure the parser for that purpose.
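
Following that explanation, one workaround is the route already reported to work earlier in this thread: parse the stylesheet as an ordinary source document, so the parser property applies, and then compile from the resulting node. Below is a minimal Python sketch that uses only the saxonche calls shown above. The helper name `compile_stylesheet_via_node` is hypothetical, and the default limit of "128000" is simply the value from the earlier reply, not a recommendation.

```python
# Property name exactly as used in the Python reply above.
ENTITY_EXPANSION_LIMIT_PROPERTY = (
    "http://saxon.sf.net/feature/parserProperty"
    "?uri=http%3A//www.oracle.com/xml/jaxp/properties/entityExpansionLimit"
)


def compile_stylesheet_via_node(proc, stylesheet_path, limit="128000"):
    """Hypothetical helper: compile a stylesheet by parsing it as a source document first.

    `proc` is expected to be a saxonche PySaxonProcessor (or any object
    offering the same methods used below).
    """
    # The parser property affects source-document parsing, so set it first.
    proc.set_configuration_property(ENTITY_EXPANSION_LIMIT_PROPERTY, limit)

    # Parse the stylesheet as a plain XML document; the raised limit applies here.
    stylesheet_node = proc.parse_xml(xml_file_name=stylesheet_path)

    # Compile from the already-parsed node instead of from the file.
    xslt_processor = proc.new_xslt30_processor()
    return xslt_processor.compile_stylesheet(stylesheet_node=stylesheet_node)
```

With saxonche installed, this would be called inside a `with PySaxonProcessor(license=False) as proc:` block, for example `compile_stylesheet_via_node(proc, "stylesheet.xsl")`, where the file name is a placeholder. Passing the processor in as a parameter keeps the helper independent of how the processor is created.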
