Bug #757
contains() with an accent-blind collation
Status: closed
Description
SourceForge user: mhkay
When contains() and related substring functions are used
with an accent-blind collation, accents are not ignored
as they should be. For example,

  contains("télé", "tele",
    "http://saxon.sf.net/collation?lang=fr-FR;strength=primary")

returns false instead of true.
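To confirm that the collation itself treats the two strings as equal, which makes the false result a matching problem rather than a property of the collation, the strings can be compared directly with a JDK Collator. This minimal sketch assumes that the Saxon collation URI above corresponds to a JDK French collator (Locale.FRANCE) set to PRIMARY strength. That mapping, and the class name PrimaryStrengthDemo, are illustrative assumptions.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class. Assumption: "lang=fr-FR;strength=primary" maps to
// a JDK French collator at PRIMARY strength.
final class PrimaryStrengthDemo {

    static Collator frenchPrimary() {
        Collator c = Collator.getInstance(Locale.FRANCE);
        c.setStrength(Collator.PRIMARY);   // ignore accent and case differences
        return c;
    }

    public static void main(String[] args) {
        Collator c = frenchPrimary();
        String s1 = "t\u00e9l\u00e9";      // "télé"
        // Whole-string comparison at PRIMARY strength.
        System.out.println(c.compare(s1, "tele"));
        System.out.println(c.equals(s1, "tele"));
    }
}
```

A result of 0 from compare() alongside false from contains() points at the substring matching rather than at the collation.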
The cause is an undocumented behaviour of the JDK
RuleBasedCollator class: with a primary-strength collation
such as this one, the stream of collation elements returned
by CollationElementIterator contains zero values at the
positions where accents occur, and the application (here,
Saxon) is apparently expected to skip these zero values. The
attached file is a new version of
net.sf.saxon.sort.RuleBaseSubstringMatcher, modified to skip
them.
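Skipping zero collation elements before matching can be sketched with the standard java.text API. This is not the attached Saxon class. The class name AccentBlindContains, the helper frenchPrimary(), and the use of Collections.indexOfSubList for the search are illustrative choices. The collator again assumes the fr-FR URI maps to a JDK French collator at PRIMARY strength.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of zero-skipping substring matching (not the attached Saxon class). */
final class AccentBlindContains {

    // Assumption: "lang=fr-FR;strength=primary" behaves like this JDK collator.
    static RuleBasedCollator frenchPrimary() {
        RuleBasedCollator c = (RuleBasedCollator) Collator.getInstance(Locale.FRANCE);
        c.setStrength(Collator.PRIMARY);
        return c;
    }

    // Collation elements of s; when skipIgnorable is true, zero elements
    // (e.g. accents at PRIMARY strength) are dropped.
    static List<Integer> elements(RuleBasedCollator c, String s, boolean skipIgnorable) {
        CollationElementIterator it = c.getCollationElementIterator(s);
        List<Integer> out = new ArrayList<>();
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            if (skipIgnorable && e == 0) continue;
            out.add(e);
        }
        return out;
    }

    // contains(s1, s2): does s2's filtered element sequence occur contiguously in s1's?
    static boolean contains(RuleBasedCollator c, String s1, String s2) {
        return Collections.indexOfSubList(elements(c, s1, true), elements(c, s2, true)) >= 0;
    }

    // Naive variant that keeps the zeros, for contrast with the reported behaviour.
    static boolean containsKeepingZeros(RuleBasedCollator c, String s1, String s2) {
        return Collections.indexOfSubList(elements(c, s1, false), elements(c, s2, false)) >= 0;
    }

    public static void main(String[] args) {
        RuleBasedCollator c = frenchPrimary();
        String s1 = "t\u00e9l\u00e9";   // "télé"
        System.out.println("zero-skipping: " + contains(c, s1, "tele"));
        System.out.println("keeping zeros: " + containsKeepingZeros(c, s1, "tele"));
    }
}
```

containsKeepingZeros() is there only for comparison. Its result depends on which elements the JDK emits for a given string. A linear sublist search is used for clarity, and a production matcher would likely use something more efficient.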
The functions affected are contains, starts-with,
ends-with, substring-before, and substring-after.
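The same zero-skipping approach could extend to the other functions listed. starts-with and ends-with only need a prefix or suffix comparison of the filtered element lists. substring-before and substring-after also need to map the matched elements back to character positions, and this sketch does that with CollationElementIterator.getOffset(). It is a hypothetical illustration: CollationSubstrings, Elem, and the offset bookkeeping are assumptions, not Saxon's code. It also assumes the same JDK French collator at PRIMARY strength.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: the remaining affected functions over zero-skipping collation elements. */
final class CollationSubstrings {

    // Assumption: a JDK French collator at PRIMARY strength stands in for the Saxon collation URI.
    static RuleBasedCollator frenchPrimary() {
        RuleBasedCollator c = (RuleBasedCollator) Collator.getInstance(Locale.FRANCE);
        c.setStrength(Collator.PRIMARY);
        return c;
    }

    // A non-zero collation element and the char range [start, end) of the string it covers.
    // 'end' runs up to the next non-zero element, so ignorable elements that follow
    // (such as an accent) stay attached to the preceding character.
    private static final class Elem {
        final int order;
        final int start;
        int end;
        Elem(int order, int start, int end) { this.order = order; this.start = start; this.end = end; }
    }

    private static List<Elem> elems(RuleBasedCollator c, String s) {
        CollationElementIterator it = c.getCollationElementIterator(s);
        List<Elem> out = new ArrayList<>();
        while (true) {
            int start = it.getOffset();   // offset of the element about to be returned
            int order = it.next();
            if (order == CollationElementIterator.NULLORDER) break;
            if (order == 0) continue;     // ignorable at this strength: skip it
            if (!out.isEmpty()) out.get(out.size() - 1).end = start;
            out.add(new Elem(order, start, s.length()));
        }
        return out;
    }

    private static List<Integer> orders(List<Elem> es) {
        List<Integer> out = new ArrayList<>(es.size());
        for (Elem e : es) out.add(e.order);
        return out;
    }

    static boolean startsWith(RuleBasedCollator c, String s1, String s2) {
        List<Integer> a = orders(elems(c, s1)), b = orders(elems(c, s2));
        return a.size() >= b.size() && a.subList(0, b.size()).equals(b);
    }

    static boolean endsWith(RuleBasedCollator c, String s1, String s2) {
        List<Integer> a = orders(elems(c, s1)), b = orders(elems(c, s2));
        return a.size() >= b.size() && a.subList(a.size() - b.size(), a.size()).equals(b);
    }

    // Per XPath, substring-before(s1, "") is "" and substring-after(s1, "") is s1.
    static String substringBefore(RuleBasedCollator c, String s1, String s2) {
        List<Elem> a = elems(c, s1);
        List<Integer> b = orders(elems(c, s2));
        if (b.isEmpty()) return "";
        int i = Collections.indexOfSubList(orders(a), b);
        return i < 0 ? "" : s1.substring(0, a.get(i).start);
    }

    static String substringAfter(RuleBasedCollator c, String s1, String s2) {
        List<Elem> a = elems(c, s1);
        List<Integer> b = orders(elems(c, s2));
        if (b.isEmpty()) return s1;
        int i = Collections.indexOfSubList(orders(a), b);
        return i < 0 ? "" : s1.substring(a.get(i + b.size() - 1).end);
    }
}
```

Extending each element's end offset up to the next non-ignorable element is a design choice. It keeps a combining accent that follows the last matched letter out of the substring-after result. Expansions and contractions can make the offsets approximate at element boundaries.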