Bug #2622
closed
Combining use-character-maps and normalization-form="NFC" attributes produces unwanted output
Description
(Originally posted to XSL-List: The Open Forum on XSL, a mailing list managed by Mulberry Technologies, Inc.)
Dear all,
For various reasons, I need to escape specific characters in the output and also need to produce Unicode normalized to NFC.
Here is my input:
<inputText>”; ;</inputText> <!-- which is: (U+201D U+003B U+0020 U+003B) -->
Here are the output properties of my stylesheet:
<xsl:output method="xml" version="1.0" encoding="UTF-8"
indent="yes" omit-xml-declaration="no"
use-character-maps="unsupported_characters"
normalization-form="NFC"
/>
The character-map definition:
<xsl:character-map name="unsupported_characters">
<xsl:output-character character="“" string="&quot;"/>
<xsl:output-character character="”" string="&quot;"/>
</xsl:character-map>
With this template:
<xsl:template match="/ ">
<shortDescription><xsl:value-of select=" inputText "/></shortDescription>
</xsl:template>
Now the output:
<shortDescription>"; ;</shortDescription> <!-- which is (U+0022 U+037E U+0020 U+003B) -->
Why is the semicolon (U+003B) immediately after the escaped quote translated into a Greek question mark (U+037E), while the next semicolon is kept?
More fundamentally, why is my semicolon converted into a Greek question mark at all?
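For comparison, here is a minimal Python sketch of the result I would expect, written independently of any XSLT processor. It uses the standard `unicodedata` module. The `CHARACTER_MAP` dictionary and the `serialize` and `code_points` helpers are hypothetical stand-ins for the character map and the serializer, not part of Saxon. Applying the mapping before normalizing is a simplification; it does not matter for this input, because every character involved is already in NFC.

```python
import unicodedata

# Hypothetical stand-in for the character map in this report: both curly
# double quotes are replaced by the ASCII quotation mark U+0022.
CHARACTER_MAP = {"\u201c": "\u0022", "\u201d": "\u0022"}


def serialize(text: str) -> str:
    """Apply the character map, then normalize the result to NFC."""
    mapped = "".join(CHARACTER_MAP.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", mapped)


def code_points(text: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The input from this report: U+201D U+003B U+0020 U+003B
source = "\u201d\u003b\u0020\u003b"
print(code_points(serialize(source)))  # prints "U+0022 U+003B U+0020 U+003B"
```

In this sketch both semicolons stay U+003B, which matches the output I get when normalization-form is omitted.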
Further observations:
1- If I do not use the character map, the result is:
<shortDescription>”; ;</shortDescription> <!-- which is (U+201D U+003B U+0020 U+003B) -->
2- If I do not normalize the Unicode (without the normalization-form="NFC" attribute), the result is:
<shortDescription>"; ;</shortDescription> <!-- which is (U+0022 U+003B U+0020 U+003B) -->
3- The same behavior occurs with other character combinations:
- right double quotation mark + K
<inputText>”K</inputText> <!-- which is: (U+201D U+004B) -->
<!-- output -->
<shortDescription>"K</shortDescription> <!-- which is (U+0022 U+212A) -->
- right double quotation mark + Chinese glyph
<inputText>”力</inputText> <!-- which is: (U+201D U+529B) -->
<!-- output -->
<shortDescription>"力</shortDescription> <!-- which is (U+0022 U+F98A) -->
4- In addition, Wolfgan L. answered in the same thread:
"Even the solitary identity transformation of the semicolon 0x3B
<xsl:output-character character=";" string=";"/>
results in a translation to U+037E of all semicolons. Seems to be a bug.
SaxonHE 9.6.0.1"
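One pattern in the three cases above may help narrow this down: each unexpected output character (U+037E, U+212A, U+F98A) is a character that NFC itself replaces with the corresponding input character (U+003B, U+004B, U+529B), so the observed substitution runs in the opposite direction to normalization. The following Python sketch checks this with the standard `unicodedata` module. The `PAIRS` list and the `check_pairs` helper are hypothetical names used only for illustration, and the sketch says nothing about how Saxon is implemented.

```python
import unicodedata

# (character seen in the unexpected output, character present in the input)
PAIRS = [
    ("\u037e", "\u003b"),  # Greek question mark vs. semicolon
    ("\u212a", "\u004b"),  # Kelvin sign vs. Latin capital letter K
    ("\uf98a", "\u529b"),  # CJK compatibility ideograph vs. unified ideograph
]


def check_pairs(pairs):
    """For each pair, report whether NFC maps the observed character to the
    input character, and whether the input character is left unchanged."""
    results = []
    for observed, expected in pairs:
        maps_back = unicodedata.normalize("NFC", observed) == expected
        is_stable = unicodedata.normalize("NFC", expected) == expected
        results.append((f"U+{ord(observed):04X}", f"U+{ord(expected):04X}",
                        maps_back, is_stable))
    return results


for observed, expected, maps_back, is_stable in check_pairs(PAIRS):
    print(observed, "->", expected, "| NFC maps back:", maps_back,
          "| input already NFC:", is_stable)
```

Both checks hold for all three pairs, so none of the unexpected characters can appear in a string that is actually in NFC.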
Thanks for the help
Lancelot