Bug #6018

UnicodeString - indexWhere does not start from the expected position

Added by Steven Dürrenmatt over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Low
Assignee:
Category: Performance
Sprint/Milestone: -
Start date: 2023-05-05
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 12, trunk
Fix Committed on Branch: 12, trunk
Fixed in Maintenance Release:
Platforms: .NET, Java

Description

There is a performance regression when escaping special characters with the UnicodeString class.

In the method writeEscape of the XMLEmitter class,

while (segstart < clength) {
    // find a maximal sequence of "ordinary" characters
    long found = chars.indexWhere(special, segstart);

that calls the indexWhere method of the UnicodeString class,

public long indexWhere(IntPredicate predicate, long from) {
    IntIterator iter = codePoints();
    long i = 0;
    while (iter.hasNext()) {
        int ch = iter.next();
        if (i >= from && predicate.test(ch)) {
            return i;
        }
        i++;
    }
    return -1;
}

the character sequence is scanned from the beginning for each segment, whereas the search should start at the position given by from. This makes the escaping loop O(n^2) in the worst case, which is painful for large text nodes with many escapable characters (e.g. escaped XML).

This is a quick suggestion:

public long indexWhere(IntPredicate predicate, long from) {
    for (long i = from; i < length(); ++i) {
        int ch = codePointAt(i);
        if (predicate.test(ch)) {
            return i;
        }
    }
    return -1;
}
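
To illustrate the difference in cost, here is a minimal, self-contained sketch. It does not use Saxon's UnicodeString: a plain int[] of code points stands in for it, and the names IndexWhereSketch, indexWhereFromStart, indexWhereFromOffset, countVisits and the visited counter are hypothetical, introduced only for this illustration. The sketch also assumes that reading a code point by index takes constant time, which holds for an array but may not hold for every UnicodeString implementation.

```java
import java.util.function.IntPredicate;

class IndexWhereSketch {
    // Number of code points examined; used only for illustration.
    static long visited = 0;

    // Mirrors the reported behaviour: every call walks from index 0.
    static long indexWhereFromStart(int[] cps, IntPredicate predicate, long from) {
        for (long i = 0; i < cps.length; i++) {
            visited++;
            if (i >= from && predicate.test(cps[(int) i])) {
                return i;
            }
        }
        return -1;
    }

    // Mirrors the suggested change: the walk begins at 'from'.
    static long indexWhereFromOffset(int[] cps, IntPredicate predicate, long from) {
        for (long i = Math.max(from, 0); i < cps.length; i++) {
            visited++;
            if (predicate.test(cps[(int) i])) {
                return i;
            }
        }
        return -1;
    }

    // A writeEscape-style loop: find each special character in turn and
    // report how many code points were examined in total.
    static long countVisits(int[] cps, IntPredicate special, boolean fromStart) {
        visited = 0;
        long segstart = 0;
        while (segstart < cps.length) {
            long found = fromStart
                    ? indexWhereFromStart(cps, special, segstart)
                    : indexWhereFromOffset(cps, special, segstart);
            if (found < 0) {
                break; // no more special characters
            }
            segstart = found + 1;
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] cps = "<<<<<<<<<<".codePoints().toArray(); // 10 special characters
        IntPredicate special = c -> c == '<' || c == '>' || c == '&';
        System.out.println(countVisits(cps, special, true));  // prints 55
        System.out.println(countVisits(cps, special, false)); // prints 10
    }
}
```

With 10 consecutive special characters, the scan-from-start version examines 1 + 2 + ... + 10 = 55 code points, while the scan-from-offset version examines 10. The gap grows quadratically with the number of escapable characters.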
