Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM
Added by Andreas Oetjen over 1 year ago
Hi, I'm using the command\Transform.c sample from libsaxon-HEC-windows-v12.1, and ran into the following problem:
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; JAXP00010001:
The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
(To reproduce see attachment - this is not the real use case, but shows the problem)
How can I disable the secure processing or increase the maximum entity expansion size? I tried to set the environment variable:
set JAVA_TOOL_OPTIONS=-Djdk.xml.entityExpansionLimit=100000000 -Djdk.xml.maxGeneralEntitySizeLimit=100000000
(and several other values), but they do not seem to be propagated to the GraalVM inside.
Maybe you can give me a short hint on what to do.
Kind regards
Andreas
Replies (4)
RE: Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM - Added by Martin Honnen over 1 year ago
With the current release, 12.1, I managed to compile your file by setting a parser property, e.g.:
from saxonche import *

with PySaxonProcessor(license=False) as proc:
    print(proc.version)
    proc.set_configuration_property('http://saxon.sf.net/feature/parserProperty?uri=http%3A//www.oracle.com/xml/jaxp/properties/entityExpansionLimit', '128000')
    xslt_processor = proc.new_xslt30_processor()
    xdm_doc = proc.parse_xml(xml_file_name='C:/Users/marti/Downloads/entity-expansion-size.xsl')
    xslt_executable = xslt_processor.compile_stylesheet(stylesheet_node=xdm_doc)
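The property name above is the Saxon prefix followed by the parser property URI with its colon percent-encoded. As a minimal sketch of how such a name could be assembled, using only the Python standard library (the helper name parser_property_name and the constant are hypothetical, not part of saxonche; the C++ code further down this thread passes the URI without encoding and reports that this works too):

```python
from urllib.parse import quote

# Prefix taken from the property name used in this thread.
SAXON_PARSER_PROPERTY_PREFIX = "http://saxon.sf.net/feature/parserProperty?uri="


def parser_property_name(parser_property_uri: str) -> str:
    """Hypothetical helper: wrap a parser property URI in the Saxon prefix."""
    # quote() leaves "/" untouched by default and percent-encodes ":",
    # which yields the "http%3A//..." form shown above.
    return SAXON_PARSER_PROPERTY_PREFIX + quote(parser_property_uri)


name = parser_property_name(
    "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit"
)
print(name)
```

The resulting string can be passed as the first argument to set_configuration_property.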
RE: Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM - Added by Michael Kay over 1 year ago
Note also that https://saxonica.plan.io/issues/5885 might be relevant.
RE: Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM - Added by Andreas Oetjen over 1 year ago
Thanks @Martin Honnen. I finally got it to run, but ran into issues at first.
The following shows the code that works:
SaxonProcessor* processor = new SaxonProcessor(false);
processor->setConfigurationProperty("http://saxon.sf.net/feature/parserProperty?uri=http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit", "0");
Xslt30Processor* trans = processor->newXslt30Processor();
// This works:
XdmNode *res = processor->parseXmlFromFile(xslFile.c_str());
XsltExecutable* exe = trans->compileFromXdmNode(res);
exe->transformFileToFile(sourceFile.c_str(), outputFile.c_str());
What does not work is if you use compileFromFile directly, instead of parseXmlFromFile:
XsltExecutable* exe = trans->compileFromFile(xslFile.c_str());
exe->transformFileToFile(sourceFile.c_str(), outputFile.c_str());
I have a strange feeling that this failure occurs because the configuration property seems to be set too late, i.e. after the entity expansion. To show this, try the following:
Call "setConfigurationProperty" with an "invalid" property:
processor->setConfigurationProperty("http://saxon.sf.net/feature/parserProperty?uri=gurkensalat", "0");
Case 1: use a "simple" xslFile with no large entity expansion --> Error
Selected XML parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
does not recognize the property gurkensalat
Case 2: use an xslFile with a large entity expansion --> Error
SaxonApiException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; JAXP00010001:
The parser has encountered more than "64000" entity expansions in this document;
this is the limit imposed by the JDK.
To me, these two different errors clearly show that the entity expansion seems to be performed before any (in this case invalid) configuration property is set on the SAXParser.
RE: Entity size limit in libsaxon-HEC-windows/ SaxonC / GraalVM - Added by Michael Kay over 1 year ago
No, I think it's because properties set using http://saxon.sf.net/feature/parserProperty affect how source documents are parsed, not how stylesheets are parsed. So it will affect parseXmlFromFile (because we don't know we're dealing with a stylesheet here), but not compileFromFile, because when we know we're parsing a stylesheet, we configure the parser for that purpose.
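Given that explanation, the workaround reported in this thread is to parse the stylesheet as an ordinary source document, so that the parser property applies, and then compile from the resulting node. A minimal Python sketch of that pattern follows; it uses only the saxonche method names that appear earlier in this thread, while the function name compile_via_parsed_node and its parameters are hypothetical and the default limit is simply the value from the earlier reply:

```python
# Property name as used earlier in this thread.
ENTITY_LIMIT_PROPERTY = (
    "http://saxon.sf.net/feature/parserProperty?uri="
    "http%3A//www.oracle.com/xml/jaxp/properties/entityExpansionLimit"
)


def compile_via_parsed_node(proc, stylesheet_path, limit="128000"):
    """Hypothetical helper: compile a stylesheet through the source-document parser.

    `proc` is expected to behave like a saxonche PySaxonProcessor.
    """
    # The property affects how source documents are parsed, so set it first.
    proc.set_configuration_property(ENTITY_LIMIT_PROPERTY, limit)
    xslt_processor = proc.new_xslt30_processor()
    # Parse the stylesheet as a plain XML document; the raised limit applies here.
    stylesheet_node = proc.parse_xml(xml_file_name=stylesheet_path)
    # Compile from the already-parsed node instead of from the file.
    return xslt_processor.compile_stylesheet(stylesheet_node=stylesheet_node)
```

It would be called inside a "with PySaxonProcessor(license=False) as proc:" block, passing proc and the path to the stylesheet.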