Bug #5561
open
misleading error message on failure to find external DTD
Description
If the source file being transformed by a stylesheet contains a DOCTYPE declaration that references a non-existent file, an I/O error is raised. That's fine, but the diagnostic message suggests that it's the source file which doesn't exist, rather than a file it references.
SaxonC-HE 11.3 from Saxonica
source in transformFiletoString=/home/lou/Desktop/LacyWork/outgoing/0101Time/temp.xml
stylsheet=/home/lou/Public/pdf2tei/pt0.xsl
Error
I/O error reported by XML parser processing
/home/lou/Desktop/LacyWork/outgoing/0101Time/temp.xml: No such file or directory. Caused
by java.io.FileNotFoundException: No such file or directory
Simply doing touch [dtdFilename] makes this error go away.
Here, for comparison, is the message produced when the source file really doesn't exist:
SaxonC-HE 11.3 from Saxonica
source in transformFiletoString=/home/lou/Desktop/LacyWork/outgoing/0101Time/tump.xml
stylsheet=/home/lou/Public/pdf2tei/pt0.xsl
Error
I/O error reported by XML parser processing
/home/lou/Desktop/LacyWork/outgoing/0101Time/tump.xml: No such file or directory. Caused
by java.io.FileNotFoundException: No such file or directory
Can you see any difference? Me neither.
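For anyone wanting to reproduce the underlying parser behaviour without the original files, here is a minimal sketch. It uses the XML parser bundled with the JDK rather than Saxon or SaxonC, and the file names temp.xml and missing.dtd are made up for illustration. It assumes the parser's default settings, which normally load the external DTD subset even when not validating. The exact exception message depends on the Java runtime and operating system.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class MissingDtdRepro {

    /**
     * Writes a small document whose DOCTYPE names a DTD that does not exist,
     * parses it, and returns the exception raised (or null if parsing succeeded).
     * The temporary directory is left behind for inspection.
     */
    static Exception parseWithMissingDtd() {
        try {
            Path dir = Files.createTempDirectory("missing-dtd");
            Path source = dir.resolve("temp.xml");   // hypothetical source file
            Files.writeString(source,
                    "<!DOCTYPE root SYSTEM \"missing.dtd\">\n<root/>\n");
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(source.toFile(), new DefaultHandler());
            return null;
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Exception e = parseWithMissingDtd();
        System.out.println(e == null
                ? "no error"
                : e.getClass().getName() + ": " + e.getMessage());
    }
}
```

Whether the name of the missing DTD appears in the output is down to what the runtime puts into the exception message, which is the crux of this report.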
Updated by O'Neil Delpratt over 2 years ago
Do you have the files available to try and reproduce the problem please?
Updated by Michael Kay over 2 years ago
We're constrained here by the information returned to us by the XML parser. In general if an error message comes from the XML parser, our aim should be
(a) to say that it's an error message from the XML parser
(b) to report all the information that the XML parser gives us
(c) to supplement this with information about what we were doing at the time (e.g. parsing a source document or parsing a stylesheet).
(d) occasionally, it may make sense for us to "interpret" the information, for example if we know that a particular message often arises because the source file is empty, we might say something like "This could mean that the source file is empty". But most of the time, this isn't advisable. Apart from anything else, in the general case we don't know or care which SAX parser is in use.
It can be difficult to pack all of this into one message without risking duplication, so we're sometimes more selective.
The question in this case is, do we actually get sufficient information back from the XML parser (other than the URI that couldn't be resolved) to pin-point the cause more precisely?
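As a rough illustration of points (a) to (c) above, the following sketch assembles a message from an exception handed back by a parser. It is not Saxon's actual code: the class ParserErrorReport, the method describe and its parameters are hypothetical names chosen for this example. It also skips a cause whose message merely repeats the one before it, as one possible way to limit duplication.

```java
import java.io.IOException;

public class ParserErrorReport {

    /**
     * Hypothetical helper: builds a one-line message that (a) says the error
     * came from the XML parser, (b) passes on what the parser reported, and
     * (c) adds what we were doing at the time.
     */
    static String describe(String activity, String systemId, Exception fromParser) {
        StringBuilder sb = new StringBuilder();
        // (a) attribute the error to the parser, (c) add our own context
        sb.append(fromParser instanceof IOException ? "I/O error" : "Error")
          .append(" reported by XML parser while ")
          .append(activity).append(' ').append(systemId);
        // (b) reproduce the parser's own message without interpreting it
        String last = fromParser.getMessage();
        if (last != null && !last.isEmpty()) {
            sb.append(": ").append(last);
        }
        // Append causes, skipping any that only repeat the previous message
        for (Throwable c = fromParser.getCause(); c != null; c = c.getCause()) {
            String m = c.getMessage();
            if (m != null && !m.equals(last)) {
                sb.append(". Caused by ").append(c.getClass().getName())
                  .append(": ").append(m);
                last = m;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Exception e = new java.io.FileNotFoundException(
                "/tmp/missing.dtd (No such file or directory)");
        System.out.println(
                describe("parsing source document", "file:/tmp/temp.xml", e));
    }
}
```

Note that the sketch can only be as precise as the exception it is given: if the parser's message omits the name of the missing file, nothing here can restore it.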
Updated by Norm Tovey-Walsh over 2 years ago
It's misleading, but it's probably not technically wrong. It was trying to parse (the external parsed entity that doesn't exist in) temp.xml when it failed to find the file.
I wonder if we should try to report this to Apache since I assume it's a Xerces bug.
Updated by Michael Kay over 2 years ago
~~With SaxonJ 11.3, from the command line, processing a file books-with-dtd.xml that refers to a non-existent external DTD, I'm getting~~
I/O error reported by XML parser processing file:/Users/mike/GitHub/saxon2020/src/test/testdata/books-with-dtd.xml: /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd (No such file or directory): /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd (No such file or directory)
~~So in fact, both the source file (books-with-dtd.xml) and the missing DTD file (booksZZZZ.dtd) are named in the error message. The only problem here is that the second part of the message is repeated.~~
~~I'm not sure why SaxonC-HE should be different -- but it is a different Java VM, so the details of exactly what's found in the exception message might well vary.~~
Updated by Michael Kay over 2 years ago
Sorry, the previous message is misleading. What I observed is what happens in IntelliJ, which is slightly different from what happens in the released product, because there is some diagnostic code included only for C# transpilation. If I remove that code, I get
I/O error reported by XML parser processing file:/Users/mike/GitHub/saxon2020/src/test/testdata/books-with-dtd.xml: /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd (No such file or directory)
which is difficult to improve on. It would be nice if the second half of the message could instead say
File or directory not found: /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd
but that would involve parsing the error message delivered by FileNotFoundException and re-arranging its contents, which feels like a fool's errand.
Apart from this inelegance, the message meets the criteria: we say that the message comes from the XML parser, we say what source file we were parsing, and we reproduce the error information returned to us by the parser.
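To show what that re-arrangement would involve, and why it is fragile, here is a sketch. It assumes the "path (reason)" shape seen in the message above, which is not a documented format; the class and method names are hypothetical. Anything that does not fit the shape, such as the bare "No such file or directory" reported by SaxonC, is returned unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNotFoundMessage {

    // Assumed shape: "<path> (<reason>)". This is an observation about some
    // runtimes, not a guarantee.
    private static final Pattern PATH_THEN_REASON =
            Pattern.compile("^(.*) \\(([^()]*)\\)$");

    /**
     * Rewrites "path (reason)" as "reason: path"; returns the input unchanged
     * when it does not match the assumed shape.
     */
    static String rearrange(String message) {
        if (message == null) {
            return null;
        }
        Matcher m = PATH_THEN_REASON.matcher(message);
        return m.matches() ? m.group(2) + ": " + m.group(1) : message;
    }

    public static void main(String[] args) {
        System.out.println(rearrange(
                "/Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd"
                + " (No such file or directory)"));
        System.out.println(rearrange("No such file or directory"));
    }
}
```

Even when the pattern matches, the reason text comes from the operating system and may be localized, so the result is still not a message fully under our control.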
Updated by Martin Honnen over 2 years ago
SaxonC HE, at least when I tried the case of a missing DTD with the transform command line tool, does indeed fail to mention the DTD file, as the original poster says; it names only the XML file.
So with a directory containing two files, sample1.xml and sheet1.xsl, where sample1.xml has <!DOCTYPE root SYSTEM "sample1.dtd">, both SaxonCS and SaxonJ mention sample1.dtd when running a transformation from the command line with -s:sample1.xml -xsl:sheet1.xsl.
Output, e.g. (SaxonCS):
Error reported by XML parser processing file:///C:/Users/SomeUser/SomePath/missing-dtd-test/sample1.xml: Cannot resolve external DTD subset - public ID = '', system ID = 'sample1.dtd'.
Exiting with code 2
and SaxonJ
I/O error reported by XML parser processing file:/C:/Users/SomeUser/SomePath/missing-dtd-test/./sample1.xml: C:\Users\SomeUser\SomePath\missing-dtd-test\sample1.dtd (Das System kann die angegebene Datei nicht finden)
while SaxonC HE ('C:\Program Files\Saxonica\SaxonC HE 11.3\command\Transform.exe' -s:sample1.xml -xsl:sheet1.xsl) only manages
I/O error reported by XML parser processing file:/C:/Users/SomeUser/SomePath/missing-dtd-test/sample1.xml: Das System kann die angegebene Datei nicht finden
Sorry that part of the error messages is in German; I am not sure which setting to change to get everything in English. I think Michael understands German anyway. For those who don't, "Das System kann die angegebene Datei nicht finden" means roughly "The system cannot find the specified file".
I don't know whether it is a JRE/Excelsior or Saxon problem with SaxonC.
Updated by O'Neil Delpratt over 2 years ago
I think this is a SaxonC problem. I will investigate it further.
Updated by Michael Kay over 2 years ago
- Project changed from Saxon to SaxonC
- Assignee set to O'Neil Delpratt
Transferred to SaxonC project.
Updated by Michael Kay over 2 years ago
- Subject changed from misleading error message to misleading error message on failure to find external DTD
Updated by Michael Kay over 2 years ago
- Related to Bug #5606: Duplicate source error message in exception (Saxon 11) added