Bug #1588


The fn:normalize-unicode() function produces unexpected results for non-ASCII characters

Added by Philip Fearon over 10 years ago. Updated almost 10 years ago.

Status: Closed
Priority: Normal
Start date: 2012-07-17
% Done: 100%

Description

The expression

string-to-codepoints(normalize-unicode('Eisbär', 'NFKD'))

is returning

69 105 115 98 4192 33536 114

whereas the correct answer is

69 105 115 98 97 776 114

The following XSLT was used in the test:

<xsl:template name="main" match="/"
    xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT">
  <!-- Saxon-CE: append the output to the HTML element with id="main" -->
  <xsl:result-document href="#main" method="ixsl:append-content">
    <xsl:variable name="input" select="'Eisbär'"/>
    <!-- NFKD should split ä (U+00E4) into a (U+0061) + combining diaeresis (U+0308) -->
    <xsl:variable name="normal" select="string-to-codepoints(normalize-unicode($input, 'NFKD'))"/>
    <p>String: <xsl:value-of select="$input"/></p>
    <p>To CodePoints result: <xsl:value-of select="$normal"/></p>
    <p>Expected: 69 105 115 98 97 776 114</p>
    <p>Round-trip codepoints-to-string: <xsl:value-of select="codepoints-to-string($normal)"/></p>
  </xsl:result-document>
</xsl:template>
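The expected value can be cross-checked outside Saxon-CE. The sketch below uses Python's standard unicodedata module purely as an independent reference implementation of NFKD (it has nothing to do with Saxon-CE itself); the helper name nfkd_codepoints is hypothetical and simply mirrors string-to-codepoints(normalize-unicode($s, 'NFKD')). The input is written with an explicit U+00E4 escape so that the precomposed ä is used regardless of file encoding.

```python
import unicodedata


def nfkd_codepoints(s: str) -> list[int]:
    # Hypothetical helper: same result as the XPath expression
    # string-to-codepoints(normalize-unicode($s, 'NFKD'))
    return [ord(ch) for ch in unicodedata.normalize("NFKD", s)]


result = nfkd_codepoints("Eisb\u00e4r")  # "Eisbär" with precomposed ä
print(" ".join(map(str, result)))  # 69 105 115 98 97 776 114
```

Neither 4192 (U+1060) nor 33536 (U+8300) appears in the reference output, which points at the decomposition data rather than at the string-to-codepoints conversion.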
Actions #1

Updated by Philip Fearon over 10 years ago

  • Status changed from In Progress to Resolved

The cause of this issue was in the reading and parsing of the DecompositionTable in normalizationData.xml; this has now been fixed.
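To show why a misread decomposition table corrupts the output, here is a minimal sketch of table-driven NFKD decomposition, assuming a hypothetical in-memory table that maps a code point to its decomposition. The table layout and all names (DECOMPOSITION_TABLE, decompose_cp, canonical_order, nfkd_sketch) are illustrative assumptions, not the format of Saxon-CE's normalizationData.xml. Each character is expanded recursively through the table, then each run of combining marks is stably sorted by canonical combining class; Hangul syllables, which are decomposed algorithmically, are deliberately left out.

```python
import unicodedata

# Hypothetical decomposition table (code point -> mapping). Illustrative only;
# NOT the layout of Saxon-CE's normalizationData.xml.
DECOMPOSITION_TABLE: dict[int, list[int]] = {
    0x00E4: [0x0061, 0x0308],  # ä -> a + COMBINING DIAERESIS
    0x00FC: [0x0075, 0x0308],  # ü -> u + COMBINING DIAERESIS
    0x01D6: [0x00FC, 0x0304],  # ǖ -> ü + COMBINING MACRON (needs recursion)
    0xFB01: [0x0066, 0x0069],  # fi ligature -> f + i (compatibility mapping)
}


def decompose_cp(cp: int) -> list[int]:
    """Recursively expand one code point using the table."""
    mapping = DECOMPOSITION_TABLE.get(cp)
    if mapping is None:
        return [cp]  # no entry: the code point maps to itself
    out: list[int] = []
    for m in mapping:
        out.extend(decompose_cp(m))
    return out


def canonical_order(cps: list[int]) -> list[int]:
    """Stable-sort each run of combining marks by canonical combining class."""
    result = list(cps)
    i = 0
    while i < len(result):
        if unicodedata.combining(chr(result[i])) == 0:
            i += 1  # starter: leave in place
            continue
        j = i
        while j < len(result) and unicodedata.combining(chr(result[j])) != 0:
            j += 1
        result[i:j] = sorted(result[i:j], key=lambda c: unicodedata.combining(chr(c)))
        i = j
    return result


def nfkd_sketch(s: str) -> list[int]:
    cps: list[int] = []
    for ch in s:
        cps.extend(decompose_cp(ord(ch)))
    return canonical_order(cps)


print(nfkd_sketch("Eisb\u00e4r"))  # [69, 105, 115, 98, 97, 776, 114]
```

If the entry for U+00E4 were read with the wrong values, the lookup would faithfully emit whatever code points were stored, which is consistent with the bogus values reported above.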

Actions #2

Updated by O'Neil Delpratt almost 10 years ago

  • Status changed from Resolved to Closed
  • Sprint/Milestone set to Release 1.1
  • % Done changed from 0 to 100
  • Found in version set to 1.0
  • Fixed in version set to 1.1

Bug fixed in the Saxon-CE 1.1 release.
