Bug #757


contains() with an accent-blind collation

Added by Anonymous about 18 years ago. Updated about 12 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: XPath conformance
Sprint/Milestone: -
Start date:
Due date:
% Done: 0%
Estimated time:
Legacy ID: sf-1444006
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

SourceForge user: mhkay

When contains() and other similar functions are used with an accent-blind collation, accents are not ignored as they should be. For example,

contains("télé", "tele", "http://saxon.sf.net/collation?lang=fr-FR;strength=primary")

returns false instead of true.

The reason for the problem is an undocumented behaviour of the JDK RuleBasedCollator class: with an accent-blind (primary-strength) collation, the stream of collation elements returned by the CollationElementIterator includes zero values where the accents occur, and the application (i.e. Saxon) is apparently expected to ignore these zero values. The attached file is a new version of net.sf.saxon.sort.RuleBasedSubstringMatcher, modified to skip these zero values.
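The zero values described above can be observed directly with a short Java sketch. It assumes the JDK's own French collator (Collator.getInstance(Locale.FRANCE)) set to primary strength is a fair stand-in for the Saxon collation URI in the example; the class and method names are hypothetical. It contrasts compare(), which treats the strings as equal, with the raw primary weights from CollationElementIterator. The exact weight values depend on the JDK's tables, so it prints them rather than claiming specific numbers, and the accented string uses \u00e9 escapes so the source does not depend on file encoding.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AccentBlindDemo {

    // Assumption: a JDK French collator at PRIMARY strength approximates
    // the accent-blind collation "lang=fr-FR;strength=primary".
    static RuleBasedCollator primaryFrench() {
        RuleBasedCollator c = (RuleBasedCollator) Collator.getInstance(Locale.FRANCE);
        c.setStrength(Collator.PRIMARY);
        return c;
    }

    // Primary weight of every collation element, zeros included (no filtering).
    static List<Integer> rawPrimaries(RuleBasedCollator c, String s) {
        List<Integer> out = new ArrayList<>();
        CollationElementIterator it = c.getCollationElementIterator(s);
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            out.add(CollationElementIterator.primaryOrder(e));
        }
        return out;
    }

    public static void main(String[] args) {
        RuleBasedCollator c = primaryFrench();
        String accented = "t\u00e9l\u00e9"; // "télé"
        // The collator itself considers the strings equal at primary strength...
        System.out.println("compare = " + c.compare(accented, "tele"));
        // ...but the element stream for "télé" carries extra zero weights
        // at the accents, which a naive element-by-element match trips over.
        System.out.println("t\u00e9l\u00e9 -> " + rawPrimaries(c, accented));
        System.out.println("tele -> " + rawPrimaries(c, "tele"));
    }
}
```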

The functions affected are contains, starts-with, ends-with, substring-before, and substring-after.
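A minimal sketch of the workaround in Java: reduce each string to the primary weights of its collation elements, drop the zeros, then do ordinary sequence matching for contains, starts-with and ends-with. This is not the attached RuleBasedSubstringMatcher; the class name AccentBlindMatcher is hypothetical, and the sketch assumes primary strength (it compares primary weights only, so it would be too lenient for a secondary- or tertiary-strength collation). substring-before and substring-after would additionally need to map element positions back to character offsets, for example via CollationElementIterator.getOffset(), which is left out here.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

public class AccentBlindMatcher {

    private final RuleBasedCollator collator;

    AccentBlindMatcher(RuleBasedCollator collator) {
        this.collator = collator;
    }

    // Primary weights of s's collation elements, skipping the zero
    // (ignorable) elements such as accents. Assumes primary strength.
    int[] keys(String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        int[] buf = new int[s.length() * 2 + 4];
        int n = 0;
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            int p = CollationElementIterator.primaryOrder(e);
            if (p == 0) continue; // accent or other ignorable element
            if (n == buf.length) buf = Arrays.copyOf(buf, n * 2);
            buf[n++] = p;
        }
        return Arrays.copyOf(buf, n);
    }

    // First index at which needle occurs in hay, or -1 (naive search).
    private static int indexOf(int[] hay, int[] needle) {
        outer:
        for (int i = 0; i + needle.length <= hay.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    boolean contains(String s, String sub) {
        return indexOf(keys(s), keys(sub)) >= 0;
    }

    boolean startsWith(String s, String prefix) {
        int[] h = keys(s), n = keys(prefix);
        return n.length <= h.length && Arrays.equals(Arrays.copyOf(h, n.length), n);
    }

    boolean endsWith(String s, String suffix) {
        int[] h = keys(s), n = keys(suffix);
        return n.length <= h.length
                && Arrays.equals(Arrays.copyOfRange(h, h.length - n.length, h.length), n);
    }

    public static void main(String[] args) {
        // Assumption: JDK French collator at PRIMARY strength as the accent-blind collation.
        RuleBasedCollator c = (RuleBasedCollator) Collator.getInstance(Locale.FRANCE);
        c.setStrength(Collator.PRIMARY);
        AccentBlindMatcher m = new AccentBlindMatcher(c);
        String tele = "t\u00e9l\u00e9"; // "télé"
        System.out.println(m.contains(tele, "tele"));
        System.out.println(m.startsWith(tele, "te"));
        System.out.println(m.endsWith(tele, "le"));
        System.out.println(m.contains(tele, "telx"));
    }
}
```

Skipping the zero weights before matching is what lets "tele" line up with "télé"; an empty search string yields an empty key array and therefore matches, as XPath contains() requires.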


Files

RuleBasedSubstringMatcher.java (9.56 KB), Anonymous, 2006-03-06 10:26
