Bug #5952

closed

unparsed-text-available throws an un-catchable IllegalCharsetNameException

Added by Nico Kutscherauer almost 2 years ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date: 2023-04-02
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 11, 12, trunk
Fix Committed on Branch: 11, 12, trunk
Fixed in Maintenance Release:
Platforms: Java

Description

If I try to read the following non-wellformed XML document with the function unparsed-text():

<?xml version="1.0" encoding="foo"?>
<foo/>

I get an IllegalCharsetNameException. It was surprising to me, but understandable. The problem is that I can't catch this error. If I first check with unparsed-text-available() whether the document is readable, I get the same error.

Even with <xsl:try> I cannot prevent the process from being terminated by this exception.

This is not the desired behaviour, is it?

(I tried several versions of Saxon-HE: 10.6, 11.5, and 12.1. All show the same behavior.)

Actions #1

Updated by Michael Kay almost 2 years ago

Thanks for reporting it.

The spec doesn't give a definitive answer on what should be done here, but it certainly shouldn't be an uncatchable error.

Either unparsed-text() should raise FOUT1190 (in which case unparsed-text-available() would return false), or we should ignore the encoding given in the file and continue as if it wasn't there.
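
To illustrate the second option (ignoring an unusable encoding declaration), here is a minimal Java sketch. The class `EncodingFallbackSketch` and the method `charsetOrFallback` are hypothetical names used only for illustration and are not part of Saxon; the choice of UTF-8 as the fallback is likewise an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingFallbackSketch {

    /**
     * Hypothetical helper: resolve a declared encoding name, returning the
     * fallback when the name is missing, syntactically illegal, or unsupported.
     */
    static Charset charsetOrFallback(String declared, Charset fallback) {
        if (declared == null) {
            return fallback;
        }
        try {
            return Charset.forName(declared);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException
            // both extend IllegalArgumentException.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Unusable declarations fall back (UTF-8 is an assumed default here).
        System.out.println(charsetOrFallback("foo", StandardCharsets.UTF_8));
        System.out.println(charsetOrFallback("fo o", StandardCharsets.UTF_8));
        // A supported declaration is honoured.
        System.out.println(charsetOrFallback("ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

The trade-off is that a silent fallback may decode the file with the wrong charset, whereas raising an error makes the bad declaration visible to the stylesheet.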

Actions #2

Updated by Michael Kay almost 2 years ago

  • Status changed from New to In Progress
  • Priority changed from Low to Normal

Unfortunately the QT3 test driver uses its own resolver for unparsed-text() and this masks the error.

Running it from the command line, though, I can reproduce the problem. Almost, anyway: with encoding="foo" I get java.nio.charset.UnsupportedCharsetException, and with encoding="fo o" I get java.nio.charset.IllegalCharsetNameException.
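
The difference between the two exceptions can be seen outside Saxon with a small standalone check against `java.nio.charset.Charset.forName`. The class `CharsetLookupDemo` and its `lookup` method are hypothetical names for this illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetLookupDemo {

    /**
     * Returns the simple class name of the exception thrown by
     * Charset.forName(name), or "ok" if the lookup succeeds.
     */
    static String lookup(String name) {
        try {
            Charset.forName(name);
            return "ok";
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // "foo" is a syntactically legal charset name, but no such charset exists.
        System.out.println(lookup("foo"));
        // "fo o" contains a space, which is not allowed in a charset name.
        System.out.println(lookup("fo o"));
        // UTF-8 is one of the charsets every Java platform must support.
        System.out.println(lookup("UTF-8"));
    }
}
```

Both exception types are unchecked (they extend `IllegalArgumentException`), which is why they can escape code that only handles checked I/O exceptions.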

Actions #3

Updated by Michael Kay almost 2 years ago

The standard resolver invokes ResourceLoader.urlReader(), which returns ResourceLoader.inferEncoding(), which returns "foo". It then calls getReaderForStream(), which calls Charset charset2 = Charset.forName(resourceEncoding); which is where the exception occurs.

The most obvious action here is to catch this exception and throw FOUT1190. However we're on a path here that's used for other things beyond unparsed-text(), so we need to look carefully at the other users of the code to check that this makes sense.

Actions #4

Updated by Michael Kay almost 2 years ago

The following appears to work:

(a) Change ResourceLoader.getReaderFromStream() to catch any unchecked exceptions and throw an UnsupportedEncodingException.

(b) Change StandardUnparsedTextResolver() to catch the UnsupportedEncodingException and turn it into an FOUT1170.
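
The two steps can be sketched roughly as follows. This is not Saxon's actual code: `EncodingErrorSketch`, `DynamicError`, `readerFromStream`, `resolve`, and `available` are hypothetical stand-ins, the sketch catches only `IllegalArgumentException` rather than every unchecked exception, and the error code is passed in as a parameter rather than fixed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class EncodingErrorSketch {

    /** Hypothetical stand-in for a catchable dynamic error with an error code. */
    static class DynamicError extends Exception {
        final String code;

        DynamicError(String code, String message, Throwable cause) {
            super(message, cause);
            this.code = code;
        }
    }

    /** Step (a): convert unchecked charset failures into a checked exception. */
    static Reader readerFromStream(InputStream in, String encoding)
            throws UnsupportedEncodingException {
        try {
            return new InputStreamReader(in, Charset.forName(encoding));
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            UnsupportedEncodingException uee = new UnsupportedEncodingException(encoding);
            uee.initCause(e);
            throw uee;
        }
    }

    /** Step (b): the resolver maps the checked exception to an error code. */
    static Reader resolve(InputStream in, String encoding, String errorCode)
            throws DynamicError {
        try {
            return readerFromStream(in, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new DynamicError(errorCode, "Unsupported encoding: " + e.getMessage(), e);
        }
    }

    /** An availability check can now report false instead of crashing. */
    static boolean available(byte[] data, String encoding, String errorCode) {
        try {
            resolve(new ByteArrayInputStream(data), encoding, errorCode);
            return true;
        } catch (DynamicError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(available(new byte[0], "fo o", "FOUT1190"));
        System.out.println(available(new byte[0], "UTF-8", "FOUT1190"));
    }
}
```

Because the failure now surfaces as an ordinary dynamic error, constructs such as <xsl:try> have something to catch.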

Actions #5

Updated by Michael Kay almost 2 years ago

  • Status changed from In Progress to Resolved
  • Applies to branch 11, 12, trunk added
  • Fix Committed on Branch 11, 12, trunk added
  • Platforms Java added

Actions #6

Updated by O'Neil Delpratt over 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.2 added

Bug fix applied in the Saxon 12.2 maintenance release.

Actions #7

Updated by Debbie Lockett over 1 year ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 11.6 added

Bug fix applied in the Saxon 11.6 maintenance release.

Actions #8

Updated by Nico Kutscherauer 3 months ago

Hi,

Sorry for the late feedback! I noticed just now that there might still be a small issue:

First you wrote:

Michael Kay wrote in #note-1:

[...]

Either unparsed-text() should raise FOUT1190 [...]

Then you wrote:

Michael Kay wrote in #note-4:

[...] turn it into an FOUT1170.

Indeed, Saxon now throws FOUT1170. Was the switch to a different error code intentional, or just a mistake?

At least the QT3TS test case added for this issue expects only FOUT1190.

BR, Nico
