Bug #3491

closed

In error messages Japanese characters are displayed as \uXXXX

Added by Gerben Abbink over 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Diagnostics
Sprint/Milestone: -
Start date: 2017-10-18
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: trunk
Fix Committed on Branch: trunk
Fixed in Maintenance Release:
Platforms:

Description

I run this demo stylesheet on the command line:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/" élève="" 日本="">

</xsl:template>

</xsl:stylesheet>

Saxon reports two errors:

Attribute @élève is not allowed on element xsl:template.

Attribute @\u65e5\u672c is not allowed on element xsl:template.

Why does the second error contain Unicode escape sequences (\uXXXX)? I bet Japanese users would like to see Japanese characters in error messages, and so would I.

Actions #1

Updated by Community Admin over 6 years ago

Tricky one. There are two reasons for this:

(a) On many systems, console output doesn't display the characters correctly. The console can probably display them if configured correctly, but we have no control over that.

(b) Errors like this are often caused by using a character that is visually similar to the required character.

Perhaps the right answer is to display it with the real characters followed by the hex in parentheses.

The other issue is that we are formatting it like this even when the message is written to an XML validation report. Really we should emit validation messages with structural markup and then render the markup according to the output destination.


Actions #2

Updated by Gerben Abbink over 6 years ago

I did some more tests and found this:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/"  日本="">

	<xsl:日本/>

</xsl:template>

</xsl:stylesheet>

From the command prompt, this produces two error messages; one of them is escaped, the other is not:

Static error in xsl:template/@?? on line 3 column 33 of ABC.xsl:

XTSE0090: Attribute @\u65e5\u672c is not allowed on element xsl:template

Static error at xsl:?? on line 4 column 12 of ABC.xsl:

XTSE0010: Unknown XSLT element: ??

Errors were reported during stylesheet compilation

In Java my error listener receives:

Attribute @\u65e5\u672c is not allowed on element xsl:template. <<< Escaped

Unknown XSLT element: 日本. <<< Not escaped

Here too, one message is escaped and the other is not.

I would like to receive the error messages without Unicode escapes.
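
For anyone on a release that still hands escaped text to the error listener, one possible workaround is to decode the escapes before displaying the message. The following is a minimal sketch, not part of Saxon: the class name UnicodeEscapeDecoder and its decode method are made up for illustration, and it assumes the escapes always take the four-hex-digit form seen in the messages above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Saxon class): turns backslash-u escapes with
// four hex digits back into the characters they stand for.
final class UnicodeEscapeDecoder {

    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String message) {
        Matcher m = ESCAPE.matcher(message);
        StringBuilder out = new StringBuilder(message.length());
        int last = 0;
        while (m.find()) {
            // Copy the text before the escape, then the decoded UTF-16 unit.
            out.append(message, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(message, last, message.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in the test input.
        String received = "Attribute @\\u65e5\\u672c is not allowed on element xsl:template.";
        System.out.println(decode(received));
    }
}
```

Calling decode on the message text inside a custom error listener would give the unescaped form. The sketch cannot tell an escape produced by Saxon from a backslash sequence that was already present in the original name, so it is only a stopgap.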

Actions #3

Updated by Michael Kay over 6 years ago

  • Category set to Diagnostics
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Priority changed from Low to Normal
  • Applies to branch trunk added
  • Fix Committed on Branch trunk added

I decided to make no changes on the 9.8 branch (partly for stability: some people, unwisely, parse Saxon's error messages and expect them to be stable).

For the next major release, I have:

(a) moved the code that expands special characters into the StandardErrorListener, so (i) you can bypass it by writing your own ErrorListener, and (ii) it is applied more systematically to all diagnostic messages.

(b) made it configurable: you can set an option in the StandardErrorListener indicating the maximum codepoint that is treated as an ordinary safe character; setting this to x10FFFF suppresses all expansion.

(c) special characters above this threshold are now output as C[xHHHHH] where C is the character itself and HHHHH is its Unicode codepoint value in hexadecimal (if there is no glyph for C you will generally see some kind of substitute character).

(d) the Logger interface, used as the destination for StandardErrorListener messages, now has an interrogative isUnicodeAware(), so the character expansion can also be suppressed by setting this property in the Logger.

I have kept the default threshold at 255. For better or worse, this reflects the maximum codepoint that can be safely output to the Windows console in many locales without manual reconfiguration; and attempting to auto-detect the configuration and adapt to it appears to be a minefield.
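
To make the format in (b) and (c) concrete, here is a rough sketch of the expansion rule. It is not Saxon's actual code: the class name SpecialCharExpander, the method expand, and its parameter name are invented for illustration, and the lowercase, unpadded hex is an assumption, since the description above does not specify case or padding.

```java
// Hypothetical illustration of the C[xHHHHH] rule; not taken from Saxon.
final class SpecialCharExpander {

    // Every code point above maxOrdinaryCodepoint is emitted as the
    // character itself followed by its hex value in square brackets.
    static String expand(String message, int maxOrdinaryCodepoint) {
        StringBuilder out = new StringBuilder(message.length() + 16);
        // Iterate by code point so characters outside the BMP stay intact.
        message.codePoints().forEach(cp -> {
            out.appendCodePoint(cp);
            if (cp > maxOrdinaryCodepoint) {
                out.append("[x").append(Integer.toHexString(cp)).append(']');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String msg = "Attribute @\u65e5\u672c is not allowed on element xsl:template";
        // Threshold 255, the default mentioned above: both characters are expanded.
        System.out.println(expand(msg, 255));
        // Threshold x10FFFF: nothing exceeds it, so the message is unchanged.
        System.out.println(expand(msg, 0x10FFFF));
    }
}
```

With a threshold of 255 the name élève from the original report passes through untouched, because é and è are below the threshold, while 日本 becomes 日[x65e5]本[x672c].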

Actions #4

Updated by O'Neil Delpratt over 5 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 9.9.0.1 added

Bug fix applied in the Saxon 9.9.0.1 major release.
