Bug #2622

closed

Combining use-character-maps and normalization-form="NFC" attributes produces unwanted output

Added by Lancelot Meurillon almost 9 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Serialization
Sprint/Milestone:
Start date: 2016-02-16
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 9.4, 9.5, 9.6, 9.7
Fix Committed on Branch: 9.6, 9.7
Fixed in Maintenance Release:
Platforms:

Description

(Originally posted to XSL-List: The Open Forum on XSL, a mailing list managed by Mulberry Technologies, Inc.)

Dear all,

For various reasons, I need to escape specific characters in the output and also produce Unicode normalized to NFC.

Here is my input:

<inputText>”; ;</inputText> <!-- which is : (U+201D U+003B U+0020 U+003B) -->

Here are the output properties of my stylesheet:

<xsl:output method="xml" version="1.0" encoding="UTF-8"
    indent="yes" omit-xml-declaration="no"
    use-character-maps="unsupported_characters"
    normalization-form="NFC"/>

The character-map definition:

<xsl:character-map name="unsupported_characters">
    <xsl:output-character character="&#8220;" string="&quot;"/>
    <xsl:output-character character="&#8221;" string="&quot;"/>
</xsl:character-map>

With this template:

<xsl:template match="/">
    <shortDescription><xsl:value-of select="inputText"/></shortDescription>
</xsl:template>
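
For anyone trying to reproduce this, the fragments above can be assembled into a single stylesheet. This is only a sketch: the xsl:stylesheet wrapper and version="2.0" are assumptions (the original post shows only the fragments), and the input file in the comment is reconstructed from the code points listed above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed input document (U+201D U+003B U+0020 U+003B):
     <inputText>&#x201D;; ;</inputText> -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialization parameters as reported: character map plus NFC. -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
      indent="yes" omit-xml-declaration="no"
      use-character-maps="unsupported_characters"
      normalization-form="NFC"/>

  <!-- Replace curly double quotes (U+201C, U+201D) with a straight quote. -->
  <xsl:character-map name="unsupported_characters">
    <xsl:output-character character="&#8220;" string="&quot;"/>
    <xsl:output-character character="&#8221;" string="&quot;"/>
  </xsl:character-map>

  <xsl:template match="/">
    <shortDescription><xsl:value-of select="inputText"/></shortDescription>
  </xsl:template>

</xsl:stylesheet>
```

Writing the input character as a numeric reference (&#x201D;) rather than a literal avoids any doubt about what the editor or mail client did to the quote.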

Now the output:

<shortDescription>"; ;</shortDescription> <!-- which is (U+0022  U+037E  U+0020  U+003B) -->

Why is the semicolon (U+003B) immediately after the escaped quote translated into a Greek question mark (U+037E), while the next semicolon is kept?

Or rather, the real question is: why is my semicolon converted into a Greek question mark at all?
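
As a point of comparison, NFC should only ever map in the opposite direction: U+037E has a canonical (singleton) decomposition to U+003B, so normalizing U+037E yields a plain semicolon, and a plain semicolon is already in NFC. A minimal sketch that checks this with the standard XPath 2.0 functions normalize-unicode() and string-to-codepoints(), independently of the serializer; the stylesheet itself is illustrative and not part of the original report:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Runs against any input document; the input content is ignored. -->
  <xsl:template match="/">
    <!-- NFC of U+037E: the singleton decomposition gives U+003B (decimal 59). -->
    <xsl:value-of select="string-to-codepoints(normalize-unicode('&#x37E;', 'NFC'))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- NFC of U+003B: already normalized, so it stays U+003B (decimal 59). -->
    <xsl:value-of select="string-to-codepoints(normalize-unicode(';', 'NFC'))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

A conforming processor should print 59 on both lines. Swapping in another code point gives the same kind of check for any other character.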

Some further observations:

1- If I do not use the character map, the result is:

<shortDescription>”; ;</shortDescription> <!-- which is (U+201D U+003B U+0020 U+003B) -->

2- If I do not normalize the output (that is, without the normalization-form="NFC" attribute), the result is:

<shortDescription>"; ;</shortDescription> <!-- which is (U+0022 U+003B U+0020 U+003B) -->

3- The same behavior occurs with other character combinations:

  • right double quotation mark + K
<inputText>”K</inputText> <!-- which is : (U+201D U+004B) -->
<!-- output -->
<shortDescription>"K</shortDescription> <!-- which is (U+0022  U+212A) -->
  • right double quotation mark + Chinese character
<inputText>”力</inputText> <!-- which is : (U+201D U+529B) -->
<!-- output -->
<shortDescription>"力</shortDescription> <!-- which is (U+0022  U+F98A) -->

4- In addition, Wolfgan L. answered in the same thread:

Even the solitary identity transformation of the semicolon 0x3B

 <xsl:output-character character=";" string=";"/>

results in a translation to U+037E of all semicolons. Seems to be a bug.

SaxonHE 9.6.0.1
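
Since observation 2 shows that the output is correct once normalization-form is removed, one possible interim workaround is to leave that attribute off and normalize the value inside the stylesheet with the standard normalize-unicode() function. This is a sketch only, and it has not been tested against the affected releases. It also behaves differently from the serialization parameter: only the selected value is normalized, and normalization happens before the character map is applied rather than at serialization time.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- normalization-form is deliberately omitted here. -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
      indent="yes" omit-xml-declaration="no"
      use-character-maps="unsupported_characters"/>

  <xsl:character-map name="unsupported_characters">
    <xsl:output-character character="&#8220;" string="&quot;"/>
    <xsl:output-character character="&#8221;" string="&quot;"/>
  </xsl:character-map>

  <xsl:template match="/">
    <!-- Normalize the string value to NFC before it reaches the serializer. -->
    <shortDescription><xsl:value-of select="normalize-unicode(inputText, 'NFC')"/></shortDescription>
  </xsl:template>

</xsl:stylesheet>
```

The replacement strings in the character map are written out as given, so they should themselves already be in NFC (the straight quote used here is).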

Thanks for the help

Lancelot
