Bug #5561

open

misleading error message on failure to find external DTD

Added by Lou Burnard almost 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Low
Category: -
Start date: 2022-06-11
Due date:
% Done: 0%
Estimated time:
Found in version:
Fixed in version:
Platforms:

Description

If the source file being transformed by a stylesheet contains a DOCTYPE statement which references a non-existent file, an I/O error is raised. That's fine, but the diagnostic message suggests that it's the source file which doesn't exist, rather than a file it references.

SaxonC-HE 11.3 from Saxonica
source in transformFiletoString=/home/lou/Desktop/LacyWork/outgoing/0101Time/temp.xml 
stylsheet=/home/lou/Public/pdf2tei/pt0.xsl
Error 
   I/O error reported by XML parser processing
  /home/lou/Desktop/LacyWork/outgoing/0101Time/temp.xml: No such file or directory. Caused
  by java.io.FileNotFoundException: No such file or directory

Simply doing touch [dtdFilename] makes this error go away.

Here, for comparison, is the message produced if the source file really doesn't exist:

SaxonC-HE 11.3 from Saxonica
source in transformFiletoString=/home/lou/Desktop/LacyWork/outgoing/0101Time/tump.xml 
stylsheet=/home/lou/Public/pdf2tei/pt0.xsl
Error 
   I/O error reported by XML parser processing
  /home/lou/Desktop/LacyWork/outgoing/0101Time/tump.xml: No such file or directory. Caused
  by java.io.FileNotFoundException: No such file or directory

Can you see any difference? Me neither.


Related issues

Related to Saxon - Bug #5606: Duplicate source error message in exception (Saxon 11) | Closed | Michael Kay | 2022-07-15

#1

Updated by O'Neil Delpratt almost 2 years ago

Do you have the files available to try and reproduce the problem please?

#2

Updated by Michael Kay almost 2 years ago

We're constrained here by the information returned to us by the XML parser. In general if an error message comes from the XML parser, our aim should be

(a) to say that it's an error message from the XML parser

(b) to report all the information that the XML parser gives us

(c) to supplement this with information about what we were doing at the time (e.g. parsing a source document or parsing a stylesheet).

(d) occasionally, it may make sense for us to "interpret" the information, for example if we know that a particular message often arises because the source file is empty, we might say something like "This could mean that the source file is empty". But most of the time, this isn't advisable. Apart from anything else, in the general case we don't know or care which SAX parser is in use.

It can be difficult to pack all of this into one message without risking duplication, so we're sometimes more selective.
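Here is a rough Python illustration of criteria (a) to (c), with (d) as an optional extra. It is a minimal sketch, not Saxon's code. `format_parser_error` and its parameters are hypothetical, and the output format only mimics the messages quoted in this thread. It also collapses a segment that is repeated back to back, which is one simple way to avoid the duplication mentioned above.

```python
from typing import Optional


def format_parser_error(kind: str, doc_uri: str, parser_message: str,
                        hint: Optional[str] = None) -> str:
    """Hypothetical message builder following criteria (a) to (d).

    kind           -- e.g. "I/O error" or "Error"
    doc_uri        -- the document we were parsing (criterion c)
    parser_message -- everything the parser reported (criterion b)
    hint           -- optional interpretation, used sparingly (criterion d)
    """
    # Collapse back-to-back repeats of the same ": "-separated segment,
    # so text that reaches us twice is reported only once.
    segments = []
    for segment in parser_message.split(": "):
        if not segments or segments[-1] != segment:
            segments.append(segment)
    detail = ": ".join(segments)

    # Criterion (a): say explicitly that the XML parser reported the error.
    message = f"{kind} reported by XML parser processing {doc_uri}: {detail}"
    if hint:
        # Avoid a doubled full stop when the parser text already ends with one.
        message += (" " if message.endswith(".") else ". ") + hint
    return message
```

Deduplicating on `": "` is only a heuristic. It would also merge a segment that legitimately appears twice in a row.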

The question in this case is: do we actually get sufficient information back from the XML parser (other than the URI that couldn't be resolved) to pin-point the cause more precisely?
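One way to see what a parser can tell you is to intercept entity resolution yourself. The sketch below uses Python's expat (`xml.parsers.expat`), not the Java parser that SaxonC uses. It therefore only shows that the information exists at the point where the external DTD subset is requested. The handler receives the system ID and can raise an error that names both the missing DTD and the document that references it. `parse_reporting_missing_dtd` is a hypothetical helper, and it assumes that system IDs are plain file paths rather than URLs.

```python
import os
import xml.parsers.expat as expat


def parse_reporting_missing_dtd(doc_path: str) -> None:
    """Parse doc_path, loading its external DTD subset, and raise an error
    that names the missing DTD rather than only the document."""
    parser = expat.ParserCreate()
    # Ask expat to process parameter entities, including the external DTD subset.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.SetBase(doc_path)

    def external_entity_ref(context, base, system_id, public_id):
        # Assumption: system IDs are relative or absolute file paths, not URLs.
        path = os.path.join(os.path.dirname(base or doc_path), system_id)
        if not os.path.exists(path):
            raise FileNotFoundError(
                f"External entity {system_id!r} (resolved to {path}), "
                f"referenced from {doc_path}, was not found")
        # Parse the external entity (here, the DTD) with a sub-parser.
        sub = parser.ExternalEntityParserCreate(context)
        with open(path, "rb") as f:
            sub.ParseFile(f)
        return 1  # non-zero tells expat to continue with the main document

    parser.ExternalEntityRefHandler = external_entity_ref
    with open(doc_path, "rb") as f:
        parser.ParseFile(f)
```

In Java, the comparable SAX hook is `org.xml.sax.EntityResolver`. Its `resolveEntity(publicId, systemId)` method is also called for the external DTD subset.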

#3

Updated by Michael Kay almost 2 years ago

  • Project changed from SaxonC to Saxon
#4

Updated by Norm Tovey-Walsh almost 2 years ago

It's misleading, but it's probably not technically wrong. It was trying to parse (the external parsed entity that doesn't exist in) temp.xml when it failed to find the file.

I wonder if we should try to report this to Apache since I assume it's a Xerces bug.

#5

Updated by Michael Kay almost 2 years ago

~~With SaxonJ 11.3, from the command line, processing a file books-with-dtd.xml that refers to a non-existent external DTD, I'm getting ~~

I/O error reported by XML parser processing file:/Users/mike/GitHub/saxon2020/src/test/testdata/books-with-dtd.xml: /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd (No such file or directory): /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd (No such file or directory)

~~So in fact, both the source file (books-with-dtd.xml) and the missing DTD file (booksZZZZ.dtd) are named in the error message. The only problem here is that the second part of the message is repeated.

I'm not sure why SaxonC-HE should be different -- but it is a different Java VM, so the details of exactly what's found in the exception message might well vary. ~~

#6

Updated by Michael Kay almost 2 years ago

Sorry, the previous message is misleading. What I observed is what happens in IntelliJ, which is slightly different from what happens in the released product, because there is some diagnostic code included only for C# transpilation. If I remove that code, I get

I/O error reported by XML parser processing file:/Users/mike/GitHub/saxon2020/src/test/testdata/books-with-dtd.xml: /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd (No such file or directory)

which it's difficult to improve. It would be nice if we could improve the second half of the message to say

File or directory not found: /Users/mike/GitHub/saxon2020/src/test/testdata/booksZZZZ.dtd

but that would involve parsing the error message delivered by FileNotFoundException and re-arranging its contents, which feels like a fool's errand.
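The following minimal Python sketch shows why this feels fragile. It assumes the message has the `PATH (reason)` shape seen in the SaxonJ outputs in this thread. `rephrase_not_found` is hypothetical. It has to give up and return `None` when only the bare reason is available, as in the SaxonC output, and the reason text itself is localized by the operating system.

```python
import re
from typing import Optional

# Assumed shape: "<path> (<OS reason>)", possibly with extra spaces before "(".
_PATH_AND_REASON = re.compile(r"^(?P<path>.+?)\s*\((?P<reason>[^()]*)\)\s*$")


def rephrase_not_found(message: str) -> Optional[str]:
    """Try to turn 'PATH (reason)' into 'File or directory not found: PATH'.

    Returns None when the message does not have that shape, e.g. when only
    the bare reason is available, as in the SaxonC output in this thread."""
    m = _PATH_AND_REASON.match(message)
    if m is None:
        return None
    return f"File or directory not found: {m.group('path')}"
```

Paths that themselves contain parentheses, such as a Windows path under `Program Files (x86)`, are another case that a pattern like this would have to survive.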

Apart from this inelegance, the message meets the criteria: we say that the message comes from the XML parser, we say what source file we were parsing, and we reproduce the error information returned to us by the parser.

#7

Updated by Martin Honnen almost 2 years ago

SaxonC HE, at least when I tried the case of a missing DTD with the Transform command-line tool, does, as the original poster says, fail to mention the DTD file itself; it names only the XML file.

So, with a directory containing two files, sample1.xml and sheet1.xsl, where sample1.xml has <!DOCTYPE root SYSTEM "sample1.dtd">, both SaxonCS and SaxonJ mention sample1.dtd in their output when running a transformation from the command line with -s:sample1.xml -xsl:sheet1.xsl, e.g. (SaxonCS):

Error reported by XML parser processing file:///C:/Users/SomeUser/SomePath/missing-dtd-test/sample1.xml: Cannot resolve external DTD subset - public ID = '', system ID = 'sample1.dtd'.
Exiting with code 2

and SaxonJ

I/O error reported by XML parser processing file:/C:/Users/SomeUser/SomePath/missing-dtd-test/./sample1.xml: C:\Users\SomeUser\SomePath\missing-dtd-test\sample1.dtd  (Das System kann die angegebene Datei nicht finden)

while SaxonC HE ('C:\Program Files\Saxonica\SaxonC HE 11.3\command\Transform.exe' -s:sample1.xml -xsl:sheet1.xsl) only manages

I/O error reported by XML parser processing file:/C:/Users/SomeUser/SomePath/missing-dtd-test/sample1.xml: Das System kann die angegebene Datei nicht finden

Sorry that part of the error messages is in German; I am not sure which setting to change to get everything in English. I think Michael understands German anyway. For those who don't, "Das System kann die angegebene Datei nicht finden" is the German Windows text for "The system cannot find the file specified".

I don't know whether this is a JRE/Excelsior problem or a Saxon problem in SaxonC.
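For anyone trying to reproduce this (as asked in #1), here is a small Python sketch that recreates the test case described above. The thread does not give the contents of sheet1.xsl, so the stylesheet here is a hypothetical minimal one. The file sample1.dtd is deliberately not created.

```python
from pathlib import Path


def write_missing_dtd_test_case(directory: Path) -> None:
    """Create sample1.xml (whose DOCTYPE points at a DTD that is deliberately
    NOT created) and a minimal, hypothetical sheet1.xsl in `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "sample1.xml").write_text(
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE root SYSTEM "sample1.dtd">\n'
        '<root/>\n', encoding="utf-8")
    # Assumption: any stylesheet that compiles will do, since the failure
    # should occur while the source document is being parsed.
    (directory / "sheet1.xsl").write_text(
        '<xsl:stylesheet version="3.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
        '  <xsl:template match="/"><out/></xsl:template>\n'
        '</xsl:stylesheet>\n', encoding="utf-8")
```

Running each product from that directory with `-s:sample1.xml -xsl:sheet1.xsl` should then show whether its message names sample1.dtd.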

#8

Updated by O'Neil Delpratt almost 2 years ago

I think this is a SaxonC problem. I will investigate it further.

#9

Updated by Michael Kay almost 2 years ago

  • Project changed from Saxon to SaxonC
  • Assignee set to O'Neil Delpratt

Transferred to SaxonC project.

#10

Updated by Michael Kay almost 2 years ago

  • Subject changed from misleading error message to misleading error message on failure to find external DTD
#11

Updated by Michael Kay almost 2 years ago

  • Related to Bug #5606: Duplicate source error message in exception (Saxon 11) added
