Problem with EntityResolver and document()
Added by Nicolas Houillon over 11 years ago
Hi,
I encountered a really weird problem when using an entity resolver with the parser used to read files loaded with the document() function.
I use document() to load a bunch of files that I want to merge, which worked fine until the URL in their doctype declaration stopped working. As I can't change the original XML files, I wrote an entity resolver and supplied it to the parser used by Saxon with @System.setProperty("javax.xml.parsers.SAXParserFactory", "org.test.TempSAXParserFactory");@.
Now, the weird part is that my entity resolver is called on half of the document() calls, while the other half fail as if it wasn't there at all (first call works, second fails, third works, fourth fails, etc.).
I made an example to illustrate the problem, which I'm attaching to this post. You can run it with @mvn test@.
The expected result would be:
================================== Source document ==================================
xml/1.xml xml/2.xml xml/3.xml xml/4.xml xml/5.xml xml/6.xml
================================== Starting transform ==================================
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/1.xml
systemId : 1
systemId match, using resource dtd
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/2.xml
systemId : 2
systemId match, using resource dtd
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/3.xml
systemId : 3
systemId match, using resource dtd
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/4.xml
systemId : 4
systemId match, using resource dtd
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/5.xml
systemId : 5
systemId match, using resource dtd
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/6.xml
systemId : 6
================================== Transform ended ==================================
================================== Result document ==================================
1 2 3 4 5 6
But I get:
================================== Source document ==================================
xml/1.xml xml/2.xml xml/3.xml xml/4.xml xml/5.xml xml/6.xml
================================== Starting transform ==================================
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/1.xml
systemId : 1
systemId match, using resource dtd
Recoverable error on line 7 of test.xsl:
FODC0002: I/O error reported by XML parser processing file:/home/houillon/workspace/saxon-test/xml/2.xml: /home/houillon/workspace/saxon-test/xml/2 (No such file or directory)
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/3.xml
systemId : 3
systemId match, using resource dtd
Recoverable error on line 7 of test.xsl:
FODC0002: I/O error reported by XML parser processing file:/home/houillon/workspace/saxon-test/xml/4.xml: /home/houillon/workspace/saxon-test/xml/4 (No such file or directory)
Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/5.xml
systemId : 5
systemId match, using resource dtd
Recoverable error on line 7 of test.xsl:
FODC0002: I/O error reported by XML parser processing file:/home/houillon/workspace/saxon-test/xml/6.xml: /home/houillon/workspace/saxon-test/xml/6 (No such file or directory)
================================== Transform ended ==================================
================================== Result document ==================================
1 3 5
I tried with Saxon-HE versions 9.4.0.7 and 9.5.0.2 from the Maven repository, with similar results.
saxon-test.zip (5.47 KB)
Replies (11)
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
Hi Nicolas,
Thanks for supplying the test code. So far I have managed to reproduce the problem you reported. I will now investigate further what is happening.
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
It has turned out that I am getting a different error. Is the DTD correct? I am getting errors like the following:
Error on line 1 column 1
SXXP0003: Error reported by XML parser: The markup declarations contained or pointed to by the document type declaration must be well-formed.
Error
Error reported by XML parser processing file:/home/ond1/work/test/nicolas/xml/5.xml: The markup declarations contained or pointed to by the document type declaration must be well-formed.
Error on line 2 column 3 of test.dtd:
SXXP0003: Error reported by XML parser: The markup in the document preceding the root element must be well-formed.
The cause of this is the doctype declaration in the XML documents.
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
After some initial problems with setting the working directory, I had to replace the following line:
InputSource is = new InputSource(this.getClass().getResourceAsStream("test.dtd"));
with:
InputSource is = new InputSource(new StringReader("test.dtd"));
This was because test.dtd was not being picked up.
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
Also, in the XML document I would expect the following:
Instead of what's currently there:
RE: Problem with EntityResolver and document() - Added by Nicolas Houillon over 11 years ago
I changed the doctype declaration in the XML files to be sure which ones were failing.
As you can see in the log I pasted, there is @systemId : 1@ in places where it works and @/home/houillon/workspace/saxon-test/xml/2@ where it doesn't.
To make it work, the EntityResolver looks for a systemId in the form of a number, so changing the numbers to test.dtd in the XML files would make it fail unless you also change the entity resolver.
RE: Problem with EntityResolver and document() - Added by Nicolas Houillon over 11 years ago
A more appropriate doctype line for the XML files (and closer to what really happens in my program) would use a URL that doesn't point to anything, with the @systemId.matches("\d")@ part in the entity resolver replaced by @systemId.equals(thatURL)@.
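For readers following along, here is a minimal sketch of a resolver that works the way just described: it matches the dead URL exactly and serves a local copy of the DTD. It is not the TempEntityResolver from the attached test case. The class name @LocalDtdResolver@, the fields @deadUrl@ and @dtdText@, and the helper @parse@ are all hypothetical, and it uses the plain SAX @EntityResolver@ interface with the DTD held in a string to stay self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical resolver: serves a local DTD for one unreachable system ID. */
public class LocalDtdResolver implements EntityResolver {
    private final String deadUrl;  // assumed: the URL found in the doctype declaration
    private final String dtdText;  // assumed: the DTD content to serve instead

    public LocalDtdResolver(String deadUrl, String dtdText) {
        this.deadUrl = deadUrl;
        this.dtdText = dtdText;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (deadUrl.equals(systemId)) {
            // Exact match on the system ID: hand back the local DTD text.
            InputSource source = new InputSource(new StringReader(dtdText));
            source.setSystemId(systemId);
            return source;
        }
        // Returning null tells the parser to use its default resolution.
        return null;
    }

    /** Parses the given XML text with this resolver installed on a fresh reader. */
    public static void parse(LocalDtdResolver resolver, String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
    }
}
```

With a resolver like this, a document whose doctype line points at the dead URL should parse without any network access, provided the parser actually consults the resolver on every parse, which is exactly what goes wrong in this thread.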
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
Ok. But I am still getting the following error when I execute the stylesheet:
Error on line 1 column 1 SXXP0003: Error reported by XML parser: The markup declarations contained or pointed to by the document type declaration must be well-formed.
Have I got the correct DTD?
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
Ok. I have finally managed to reproduce the problem reported.
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
Hi,
I have finally managed to track down why we are getting this intermittent behaviour. This is indeed a bug in Saxon. There seems to be some incorrect setting of a reuseParser flag in the @Sender@ class, which resets the entityResolver to null after each iteration of parsing the XML files. I still need to step through the logic, but it seems like we are switching back to the default resolver, after which we set @reuseParser=false@. This allows your implementation of the entityResolver to be picked up again.
I have now created a bug issue which you can follow to keep up to date on when the fix will be made (see: https://saxonica.plan.io/issues/1793). I am sorry I cannot think of a workaround that doesn't involve changing the XML files themselves, which I know you do not have control over. However, I am sure we will resolve this bug shortly. The fix will be available in the next maintenance release, or you can build it yourself.
RE: Problem with EntityResolver and document() - Added by Nicolas Houillon over 11 years ago
Thank you.
RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt over 11 years ago
Just to state that this bug is now resolved and the fix has been committed to Subversion on the 9.4 and 9.5 branches. It will be available in the next maintenance release of each branch. The fix was to maintain the user EntityResolver when the sourceParser is reused.