Problem with EntityResolver and document()

Added by Nicolas Houillon almost 11 years ago

Hi,

I encountered a really weird problem when using an entity resolver with the parser used to read files loaded with the document() function.

I use document() to load a bunch of files that I want to merge, which worked fine until the URL in their doctype declaration stopped working. As I can't change the original XML files, I wrote an entity resolver and supplied it to the parser used by Saxon with @System.setProperty("javax.xml.parsers.SAXParserFactory", "org.test.TempSAXParserFactory");@.

Now, the weird part is that my entity resolver is called on half of the document() calls, while the other half fail as if it wasn't there at all (first call works, second fails, third works, fourth fails, etc.).

I made an example to illustrate the problem, which I'm attaching to this post. You can run it with @mvn test@.

The expected result would be:

==================================
Source document
==================================


   xml/1.xml
   xml/2.xml
   xml/3.xml
   xml/4.xml
   xml/5.xml
   xml/6.xml

==================================
Starting transform
==================================

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/1.xml
systemId : 1

systemId match, using resource dtd

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/2.xml
systemId : 2

systemId match, using resource dtd

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/3.xml
systemId : 3

systemId match, using resource dtd

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/4.xml
systemId : 4

systemId match, using resource dtd

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/5.xml
systemId : 5

systemId match, using resource dtd

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/6.xml
systemId : 6
==================================
Transform ended
==================================
==================================
Result document
==================================


   1
   2
   3
   4
   5
   6

But I get:

==================================
Source document
==================================


   xml/1.xml
   xml/2.xml
   xml/3.xml
   xml/4.xml
   xml/5.xml
   xml/6.xml

==================================
Starting transform
==================================

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/1.xml
systemId : 1

systemId match, using resource dtd

Recoverable error on line 7 of test.xsl:
  FODC0002: I/O error reported by XML parser processing
  file:/home/houillon/workspace/saxon-test/xml/2.xml:
  /home/houillon/workspace/saxon-test/xml/2 (No such file or directory)

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/3.xml
systemId : 3

systemId match, using resource dtd

Recoverable error on line 7 of test.xsl:
  FODC0002: I/O error reported by XML parser processing
  file:/home/houillon/workspace/saxon-test/xml/4.xml:
  /home/houillon/workspace/saxon-test/xml/4 (No such file or directory)

Resolving with TempEntityResolver :
name : null
publicId : null
baseURI : file:/home/houillon/workspace/saxon-test/xml/5.xml
systemId : 5

systemId match, using resource dtd

Recoverable error on line 7 of test.xsl:
  FODC0002: I/O error reported by XML parser processing
  file:/home/houillon/workspace/saxon-test/xml/6.xml:
  /home/houillon/workspace/saxon-test/xml/6 (No such file or directory)
==================================
Transform ended
==================================
==================================
Result document
==================================


   1
   3
   5

I tried with Saxon-HE versions 9.4.0.7 and 9.5.0.2 from the Maven repository, with similar results.


Replies (11)


RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

Hi Nicolas,

Thanks for supplying the test code. So far I have managed to reproduce the problem you reported. I will now investigate further what is happening.

RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

It has turned out that I am getting a different error. Is the DTD correct? I am getting errors like the following:

Error on line 1 column 1 
  SXXP0003: Error reported by XML parser: The markup declarations contained or pointed to by
  the document type declaration must be well-formed.
Error 
  Error reported by XML parser processing file:/home/ond1/work/test/nicolas/xml/5.xml: The
  markup declarations contained or pointed to by the document type declaration must be well-formed.
Error on line 2 column 3 of test.dtd:
  SXXP0003: Error reported by XML parser: The markup in the document preceding the root
  element must be well-formed.

The cause of this is the doctype declaration in the XML documents.

RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

After some initial problems with setting the working directory, I had to replace the following line:


InputSource is = new InputSource(this.getClass().getResourceAsStream("test.dtd"));

with:


InputSource is = new InputSource(new StringReader("test.dtd"));

This was because the test.dtd was not being picked up.
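
A note on the substitution above, for anyone following along: in the standard Java API, @new StringReader("test.dtd")@ hands the parser the eight characters @test.dtd@ as the DTD's content, not the contents of a file with that name. That could plausibly account for the "must be well-formed" errors quoted earlier. The sketch below demonstrates the behaviour and shows one possible way to fail early when a classpath resource is missing. @DtdSourceDemo@, @readAll@ and @fromClasspath@ are hypothetical names used for illustration; they are not part of the attached test project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

public class DtdSourceDemo {

    /** Returns every character offered by the source's character stream (assumes one is set). */
    public static String readAll(InputSource source) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = source.getCharacterStream()) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Wraps a classpath resource, failing loudly instead of passing a null stream to the parser. */
    public static InputSource fromClasspath(Class<?> owner, String resource) {
        InputStream in = owner.getResourceAsStream(resource);
        if (in == null) {
            throw new IllegalStateException("Resource not found on classpath: " + resource);
        }
        return new InputSource(in);
    }

    public static void main(String[] args) {
        // The reader yields the literal string, so this is what a parser would see as the DTD.
        InputSource literal = new InputSource(new StringReader("test.dtd"));
        System.out.println(readAll(literal));
    }
}
```

If the resource really is missing, @fromClasspath@ reports that directly, which may be easier to diagnose than a parser error further down the line.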

RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

Also, in the XML document I would expect the following:



Instead of what's currently there:



RE: Problem with EntityResolver and document() - Added by Nicolas Houillon almost 11 years ago

I changed the doctype declaration in the XML files to be sure which ones were failing.

As you can see in the log I pasted, there is @systemId : 1@ in places where it works and @/home/houillon/workspace/saxon-test/xml/2@ when it doesn't.

To make it work, the EntityResolver looks for a systemId in the form of a number, so changing the numbers to test.dtd in the XML files would make it fail unless you also change the entity resolver.

RE: Problem with EntityResolver and document() - Added by Nicolas Houillon almost 11 years ago

A more appropriate doctype line for the XML files (and closer to what is really happening in my program) would use a URL that doesn't point to anything, with the @systemId.matches("\d")@ part in the entity resolver replaced by @systemId.equals(thatURL)@.
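
A minimal sketch of a resolver along those lines, assuming the four-argument @resolveEntity@ callback of @org.xml.sax.ext.EntityResolver2@ (which matches the name/publicId/baseURI/systemId fields in the log above). The class name @LocalDtdResolver@, the URL in @DEAD_DTD_URL@ and the inline DTD text are placeholders made up for illustration, not taken from the attached project.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical resolver: maps one unreachable DTD URL to a local substitute. */
public class LocalDtdResolver extends DefaultHandler2 {

    // Assumed placeholder for the dead URL found in the DOCTYPE declarations.
    public static final String DEAD_DTD_URL = "http://example.invalid/test.dtd";

    // Stand-in DTD content; a real resolver would more likely stream a bundled DTD file.
    public static final String LOCAL_DTD = "<!ELEMENT test (#PCDATA)>";

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        if (DEAD_DTD_URL.equals(systemId)) {
            InputSource source = new InputSource(new StringReader(LOCAL_DTD));
            source.setSystemId(systemId); // keep the original identifier for error messages
            return source;
        }
        // Returning null asks the parser to resolve the entity itself.
        return null;
    }
}
```

Comparing with @equals@ on the full URL avoids accidentally matching other system identifiers, which a loose pattern such as a single digit could do.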

RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

Ok. But I am still getting the following error when I execute the stylesheet:

Error on line 1 column 1 SXXP0003: Error reported by XML parser: The markup declarations contained or pointed to by the document type declaration must be well-formed.

Have I got the correct DTD?

RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

Ok. I have finally managed to reproduce the problem reported.

RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

Hi,

I have finally managed to track down why we are getting this intermittent behaviour. This is indeed a bug in Saxon. There seems to be some incorrect setting of a reuseParser flag in the @Sender@ class, which resets the entityResolver to null after each iteration of parsing the XML files. I still need to step through the logic, but it seems that we switch back to the default resolver, after which we set @reuseParser=false@. This allows your implementation of the entityResolver to be picked up again.

I have now created a bug issue, which you can follow to keep up to date on when the fix will be made (see: https://saxonica.plan.io/issues/1793). I am sorry, but I cannot think of a workaround that does not involve changing the XML files themselves, which I know you do not have control over. However, I am sure we will resolve this bug shortly, and the fix will be available in the next maintenance release or for building yourself.

RE: Problem with EntityResolver and document() - Added by O'Neil Delpratt almost 11 years ago

Just to confirm: this bug is now resolved, and the fix is committed to Subversion in the 9.4 and 9.5 branches. It will be available in the next maintenance release of each branch. The fix was to retain the user's EntityResolver when the sourceParser is reused.
