Bug #6018
Closed
UnicodeString - indexWhere does not start from the expected position
Start date:
2023-05-05
Due date:
% Done:
100%
Estimated time:
Legacy ID:
Applies to branch:
12, trunk
Fix Committed on Branch:
12, trunk
Fixed in Maintenance Release:
Platforms:
.NET, Java
Description
There is a performance regression when escaping special characters with the UnicodeString class.
In the method writeEscape of the XMLEmitter class,
    while (segstart < clength) {
        // find a maximal sequence of "ordinary" characters
        long found = chars.indexWhere(special, segstart);
that calls the indexWhere method of the UnicodeString class,
    public long indexWhere(IntPredicate predicate, long from) {
        IntIterator iter = codePoints();
        long i = 0;
        while (iter.hasNext()) {
            int ch = iter.next();
            if (i >= from && predicate.test(ch)) {
                return i;
            }
            i++;
        }
        return -1;
    }
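the character sequence is searched from the beginning for each segment, whereas it should be searched starting at the position from. Because every call re-scans the prefix that has already been processed, escaping is O(n^2) in the length of the text, which is painful for large text nodes with many escapable characters (e.g. escaped XML).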
This is a quick suggestion:
    public long indexWhere(IntPredicate predicate, long from) {
        for (long i = from; i < length(); ++i) {
            int ch = codePointAt(i);
            if (predicate.test(ch)) {
                return i;
            }
        }
        return -1;
    }
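To put a number on the difference, here is a rough standalone sketch that counts how many code points each variant visits while walking a text segment by segment. It does not use the real UnicodeString or XMLEmitter classes: a plain int[] stands in for the code point sequence, and the names IndexWhereSketch, indexWhereFromStart, indexWhereFromOffset, scanAll and the visited counter are all made up for illustration.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

public class IndexWhereSketch {
    // Hypothetical counter: number of code points visited across all calls.
    static long visited = 0;

    // Mirrors the reported behaviour: always iterates from position 0.
    static long indexWhereFromStart(int[] cps, IntPredicate predicate, long from) {
        long i = 0;
        for (int ch : cps) {
            visited++;
            if (i >= from && predicate.test(ch)) {
                return i;
            }
            i++;
        }
        return -1;
    }

    // Mirrors the suggested fix: starts directly at 'from'.
    static long indexWhereFromOffset(int[] cps, IntPredicate predicate, long from) {
        for (long i = from; i < cps.length; ++i) {
            visited++;
            if (predicate.test(cps[(int) i])) {
                return i;
            }
        }
        return -1;
    }

    interface Search {
        long find(int[] cps, IntPredicate predicate, long from);
    }

    // Walks the text the way a segment-based escaping loop would and
    // returns the total number of code points visited.
    static long scanAll(int[] cps, IntPredicate special, Search search) {
        visited = 0;
        long segstart = 0;
        while (segstart < cps.length) {
            long found = search.find(cps, special, segstart);
            if (found < 0) {
                break;
            }
            segstart = found + 1;
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] cps = new int[1000];
        Arrays.fill(cps, '<'); // worst case: every character needs escaping
        IntPredicate special = c -> c == '<' || c == '>' || c == '&';
        System.out.println(scanAll(cps, special, IndexWhereSketch::indexWhereFromStart));  // 500500
        System.out.println(scanAll(cps, special, IndexWhereSketch::indexWhereFromOffset)); // 1000
    }
}
```

With 1000 escapable characters the restart-from-zero variant visits 1 + 2 + ... + 1000 = 500500 code points, while the offset variant visits 1000. The sketch assumes constant-time indexed access, which an array gives for free; whether codePointAt is equally cheap for every UnicodeString implementation is not something this example checks.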