I don't understand why Saxon is looking for xhtml-inlpres1.mod in XSLT stylesheet's directory
Added by Kai Weber over 11 years ago
I want to run a series of transformations on some XHTML files from a shell script. I'm calling Saxon HE (I tried the latest 9.4 and the new 9.5) with the line
java -classpath ./bin/saxon9he.jar net.sf.saxon.Transform -s:./temp/META-INF/container.xml -xsl:./lib/insertPageAnchors.xsl -o:./temp/temp.xml -opt:10 -dtd:off outpath=transformed/
However, for every file I get the following error:
FODC0002: I/O error reported by XML parser processing file: [...]/cov01.html: [...]/xhtml-inlpres-1.mod (No such file or directory)
Saxon is looking for this xhtml-inlpres-1.mod file in the directory where my XSLT stylesheet is located.
As this file is not called explicitly in my stylesheet, it must be implicitly handled by Saxon when it is retrieving the DTD. My HTML files have the following declaration:
In my XSL transformation I want to copy all elements, attributes and processing instructions, just adding ids to certain elements. The resulting files, however, contain only the XML declaration and nothing else. The strange thing is: if I use the Saxon HE 9.4 that is built into my oXygen XML editor, everything works as expected. In oXygen I get the desired output and no error messages from the Saxon processor. (Well, nearly as expected: the resulting XHTML files have a version="-//W3C//DTD XHTML 1.1//EN" attribute on the root element, no matter whether the output method is "xml" or "xhtml". I would rather avoid that, as it gives me a validation error.)
I've seen that xhtml-inlpres-1.mod is included in saxon9he.jar. Am I missing something on the classpath, or what else might be the reason that Saxon is looking for this file in the wrong place? Any ideas?
Best regards, Kai Weber
Replies (3)
RE: I don't understand why Saxon is looking for xhtml-inlpres1.mod in XSLT stylesheet's directory - Added by Michael Kay over 11 years ago
I would need to see the files in use to get to the bottom of this. I can't think of any obvious reason.
You appear to be processing an HTML file using the doc function, is that right? I wonder if there's some URI redirection going on that is confusing things.
Saxon of course doesn't itself retrieve the external entities referenced by the DTD; the XML parser does that, using an EntityResolver that Saxon supplies. It's the XML parser, however, that is responsible for determining the base URI (that is, the location in which to look for the file). Ideally it would be the parser's job to hold copies of well-known entities to avoid fetching them from the (unresponsive) W3C site, rather than this being done at the Saxon level. The most common reason for this going wrong is that a file is parsed whose base URI is unknown.
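To illustrate the division of labour described above, here is a minimal Java sketch that uses only the JDK's built-in SAX parser, not Saxon. It registers an EntityResolver that records what the parser asks for and answers with a locally held copy. The class name, the base URI file:///docs/book/cov01.html, the relative system ID xhtml11.dtd and the one-line stand-in DTD are all assumptions made for illustration; Saxon's real resolver is more elaborate.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityResolverSketch {

    // Hypothetical sample document; the relative system ID is an assumption.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"xhtml11.dtd\">"
        + "<html>cover</html>";

    // Hypothetical one-line stand-in for a locally cached DTD.
    static final String LOCAL_DTD = "<!ELEMENT html (#PCDATA)>";

    /** Parses xml with the given base URI and returns the resolver requests seen. */
    static List<String> parse(String xml, String baseUri) throws Exception {
        List<String> requests = new ArrayList<>();

        // The parser calls this for each external entity. By then it has
        // already combined the system ID with the base URI it knows about.
        EntityResolver resolver = (publicId, systemId) -> {
            requests.add(publicId + " | " + systemId);
            InputSource cached = new InputSource(new StringReader(LOCAL_DTD));
            cached.setSystemId(systemId);
            return cached; // serve the local copy instead of fetching anything
        };

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());

        InputSource in = new InputSource(new StringReader(xml));
        in.setSystemId(baseUri); // this is the base URI the parser resolves against
        reader.parse(in);
        return requests;
    }

    public static void main(String[] args) throws Exception {
        for (String r : parse(SAMPLE, "file:///docs/book/cov01.html")) {
            System.out.println(r);
        }
    }
}
```

Changing the base URI passed to parse changes the system ID the resolver is handed, which is one way to see why an unknown or unexpected base URI sends the lookup to the wrong directory.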
The presence of a public ID in the version attribute of the root element looks very odd, and I would be tempted to investigate it; it could be a symptom of some deeper problem.
RE: I don't understand why Saxon is looking for xhtml-inlpres1.mod in XSLT stylesheet's directory - Added by Kai Weber over 11 years ago
Yes, I am using several calls to the document function. I want to process EPUB files. The shell script is supposed to take care of unzipping and zipping, the XSLT of the transformation. As quite a lot of single files are read and included in the process, I thought it might be a good idea to store the needed file paths in variables, using absolute rather than relative paths, which I've built with base-uri(). The document() calls are also nested: the second document() call is made from a template that is itself processed as the result of a document() call.
I've taken your hint about the parser, put xercesImpl.jar explicitly on the classpath and set it as the DOM and SAX parser with -D. But I still get the same error message. So, with my multiple document() calls, it really must be the case that a file is parsed whose base URI is unknown, as you say. How can I make all the base URIs known to the EntityResolver?
I've uploaded an archive with an example of my process and attached it.
The Java call to Saxon is in "pagelist.sh". I put saxon9he.jar and xercesImpl.jar in the /bin directory (not included in the example archive). A sample set of files I want to process is in the /temp folder.
By the way: originally all my document() calls had just one argument. I later added "/" as a second argument, but that didn't change the behavior; the error message is the same with both versions. With my file paths being absolute, I thought the "/" as second argument wouldn't be necessary anyway.
And another note: I've tried this not only as a shell script on Ubuntu Linux, but also as a batch script on a Windows 7 installation. I get the same error message there.
Best regards, Kai Weber
example.zip (2.86 MB)
RE: I don't understand why Saxon is looking for xhtml-inlpres1.mod in XSLT stylesheet's directory - Added by Michael Kay over 11 years ago
Thanks for your efforts in putting this evidence together.
I'm afraid it's a simple typo in Saxon's catalog of internally-held DTD entities. It's mapping a particular public ID to the location xhtml11/xhtml-inlpres-1.mod when it should be w3c/xhtml11/xhtml-inlpres-1.mod.
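The effect of such a catalog typo can be sketched in a few lines of Java. This is not Saxon's actual code: the class, the BUNDLED set standing in for the resources inside the jar, and the two catalog maps are hypothetical, and the file name is spelled as it appears in the error message. The public ID shown is the one the XHTML 1.1 DTD uses for its inline presentation module.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class CatalogLookupSketch {

    static final String INLPRES_PUBLIC_ID =
        "-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN";

    // Hypothetical stand-in for the resources actually packaged in the jar.
    static final Set<String> BUNDLED = Set.of("w3c/xhtml11/xhtml-inlpres-1.mod");

    // Catalog entry with the directory prefix missing (the typo).
    static final Map<String, String> BROKEN =
        Map.of(INLPRES_PUBLIC_ID, "xhtml11/xhtml-inlpres-1.mod");

    // Catalog entry with the intended path.
    static final Map<String, String> FIXED =
        Map.of(INLPRES_PUBLIC_ID, "w3c/xhtml11/xhtml-inlpres-1.mod");

    /** Returns the bundled resource path for a public ID, if the catalog entry points at one. */
    static Optional<String> locate(Map<String, String> catalog, String publicId) {
        String path = catalog.get(publicId);
        return (path != null && BUNDLED.contains(path))
            ? Optional.of(path)
            : Optional.empty(); // caller has no local copy to hand to the parser
    }

    public static void main(String[] args) {
        System.out.println("broken: " + locate(BROKEN, INLPRES_PUBLIC_ID));
        System.out.println("fixed:  " + locate(FIXED, INLPRES_PUBLIC_ID));
    }
}
```

With the broken entry the lookup finds nothing, so no local copy is supplied for that one module even though the file is present in the jar.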
I haven't yet established how this bug got through our tests of this feature.
We'll fix this in our next maintenance release, which is imminent. I can't suggest any easy circumvention - you can disable Saxon's internal DTD cache, but it will then try fetching the DTDs from the W3C web site, which will give you a different kind of problem.
I'll raise the bug on the "Issues" side of this tracker, and you can follow progress there.