Bug #6225
closedSignificant performance problem for case insensitive regex matching under high concurrency
0%
Description
Seen in Saxon-HE v9.9.1-8 (but appears to also be an issue in more recent versions)
Relates to previous fix https://saxonica.plan.io/issues/4449
The above fix added synchronized to the method net.sf.saxon.regex.CaseVariants#getCaseVariants. While this fixes the NPE it results in a significant performance hit when calling this method with high concurrency. We use saxon-HE in a large scale data processing system so may have high numbers of threads concurrently doing XSLT processing. If xpath functions like matches() are used with the case insensitive flag then processing is very very slow with a lot of blocked threads.
The following is taken from a thread dump. We have many threads in this state.
"Data Processor# #10227" #42246 prio=1 os_prio=0 cpu=4294222.87ms elapsed=26283.54s tid=0x00007f59c0167420 nid=0xef63 waiting for monitor entry [0x00007f5798ab7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at net.sf.saxon.regex.CaseVariants.getCaseVariants(CaseVariants.java:92)
- waiting to lock <0x0000000264bf1c00> (a java.lang.Class for net.sf.saxon.regex.CaseVariants)
at net.sf.saxon.regex.REMatcher.equalCaseBlind(REMatcher.java:715)
at net.sf.saxon.regex.Operation$OpAtom.iterateMatches(Operation.java:631)
at net.sf.saxon.regex.REMatcher.checkPreconditions(REMatcher.java:494)
at net.sf.saxon.regex.REMatcher.match(REMatcher.java:426)
at net.sf.saxon.regex.ARegularExpression.containsMatch(ARegularExpression.java:89)
at net.sf.saxon.functions.Matches.call(Matches.java:78)
at net.sf.saxon.functions.Matches.call(Matches.java:24)
at net.sf.saxon.expr.FunctionCall.iterate(FunctionCall.java:532)
at net.sf.saxon.expr.Expression.effectiveBooleanValue(Expression.java:889)
at net.sf.saxon.expr.FilterIterator$NonNumeric.matches(FilterIterator.java:190)
at net.sf.saxon.expr.FilterIterator.getNextMatchingItem(FilterIterator.java:76)
at net.sf.saxon.expr.FilterIterator.next(FilterIterator.java:62)
at net.sf.saxon.om.SequenceIterator.forEachOrFail(SequenceIterator.java:127)
at net.sf.saxon.expr.sort.DocumentOrderIterator.<init>(DocumentOrderIterator.java:51)
at net.sf.saxon.expr.sort.DocumentSorter.iterate(DocumentSorter.java:260)
at net.sf.saxon.expr.parser.Evaluator$7.evaluate(Evaluator.java:239)
at net.sf.saxon.expr.LetExpression.eval(LetExpression.java:532)
at net.sf.saxon.expr.LetExpression.processLeavingTail(LetExpression.java:714)
at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:896)
at net.sf.saxon.expr.LetExpression.processLeavingTail(LetExpression.java:723)
at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:352)
at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:533)
at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:300)
at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:255)
at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:735)
at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:132)
at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:353)
at net.sf.saxon.expr.instruct.Copy.processLeavingTail(Copy.java:431)
at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:352)
at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:533)
at net.sf.saxon.trans.rules.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:71)
at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:495)
at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:746)
at net.sf.saxon.s9api.AbstractXsltTransformer.applyTemplatesToSource(AbstractXsltTransformer.java:347)
at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:349)
at net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:71)
at net.sf.saxon.jaxp.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:173)
The following class is both a test case to demonstrate the problem and a modified version of CaseVariants that fixes the problem (changes marked FIX:). It compares the time taken to repeatedly call getCaseVariants on both versions of the class. Running this main method results in the following output on my pc (with 12cores/24threads).
Round 1
CaseVariants completed in 1314ms
CaseVariantsModified completed in 40ms
Round 2
CaseVariants completed in 1875ms
CaseVariantsModified completed in 10ms
The synchronized method in CaseVariants is a huge performance hit. In CaseVariantsModified I have changed it to use double checked locking to avoid the synchronisation for all but the first few calls.
It would be greatly appreciated if you could patch saxon to fix this issue. Would you also be able to back port the fix into v9?
package test;
import net.sf.saxon.Configuration;
import net.sf.saxon.lib.ParseOptions;
import net.sf.saxon.lib.Validation;
import net.sf.saxon.om.AxisInfo;
import net.sf.saxon.om.NodeInfo;
import net.sf.saxon.pattern.NameTest;
import net.sf.saxon.regex.CaseVariants;
import net.sf.saxon.trans.XPathException;
import net.sf.saxon.tree.iter.AxisIterator;
import net.sf.saxon.type.Type;
import net.sf.saxon.z.IntArraySet;
import net.sf.saxon.z.IntHashMap;
import net.sf.saxon.z.IntToIntHashMap;
import net.sf.saxon.z.IntToIntMap;
import java.io.InputStream;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;
import javax.xml.transform.stream.StreamSource;
public class TestCaseVariants {
public static final int THREADS = 24;
public static final int ITERATIONS = 1_000;
public static void main(String[] args) throws InterruptedException {
// Ensure all the maps have been built, so we are only testing the performance
// of getting variants
CaseVariants.getCaseVariants('a');
CaseVariantsModified.getCaseVariants('a');
final ExecutorService executorService = Executors.newFixedThreadPool(THREADS);
for (int round = 1; round <= 2; round++) {
System.out.println("Round " + round);
runTest("CaseVariants", executorService, CaseVariants::getCaseVariants);
runTest("CaseVariantsModified", executorService, CaseVariantsModified::getCaseVariants);
}
executorService.shutdown();
}
private static void runTest(final String name,
final ExecutorService executorService,
final IntFunction<int[]> func) throws InterruptedException {
final CountDownLatch startAllLatch = new CountDownLatch(1);
final CountDownLatch threadReadyLatch = new CountDownLatch(THREADS);
final CountDownLatch completionLatch = new CountDownLatch(THREADS);
for (int k = 0; k < THREADS; k++) {
executorService.submit(() -> {
threadReadyLatch.countDown();
try {
// All threads start together
startAllLatch.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
for (long j = 0; j < ITERATIONS; j++) {
for (int i = 0; i < 1000; i++) {
func.apply(i);
}
}
completionLatch.countDown();
});
}
final Instant startTime = Instant.now();
// Wait for all threads to be ready
threadReadyLatch.await();
// Release all threads
startAllLatch.countDown();
// Wait for all threads to complete
completionLatch.await();
final Duration duration = Duration.between(startTime, Instant.now());
System.out.println(name + " completed in " + duration.toMillis() + "ms");
}
// --------------------------------------------------------------------------------
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Copyright (c) 2018 Saxonica Limited.
// This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0.
// If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
// This Source Code Form is "Incompatible With Secondary Licenses", as defined by the Mozilla Public License, v. 2.0.
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/**
* Modified copy of {@link net.sf.saxon.regex.CaseVariants}.
*/
private static class CaseVariantsModified {
// Use one hashmap for characters with a single case variant, another for characters with multiple
// case variants, to reduce the number of objects that need to be allocated
// FIX: Make these volatile
private static volatile IntToIntMap monoVariants = null;
private static volatile IntHashMap<int[]> polyVariants = null;
static void build() {
// FIX: Use temporary local variables
final IntToIntMap monoVariants = new IntToIntHashMap(2500);
final IntHashMap<int[]> polyVariants = new IntHashMap<>(100);
InputStream in = Configuration.locateResource("casevariants.xml", new ArrayList<>(), new ArrayList<>());
if (in == null) {
throw new RuntimeException("Unable to read casevariants.xml file");
}
Configuration config = new Configuration();
ParseOptions options = new ParseOptions();
options.setSchemaValidationMode(Validation.SKIP);
options.setDTDValidationMode(Validation.SKIP);
NodeInfo doc;
try {
doc = config.buildDocumentTree(new StreamSource(in, "casevariants.xml"), options).getRootNode();
} catch (XPathException e) {
throw new RuntimeException("Failed to build casevariants.xml", e);
}
AxisIterator iter = doc.iterateAxis(AxisInfo.DESCENDANT, new NameTest(Type.ELEMENT, "", "c", config.getNamePool()));
while (true) {
NodeInfo item = iter.next();
if (item == null) {
break;
}
String code = item.getAttributeValue("", "n");
int icode = Integer.parseInt(code, 16);
String variants = item.getAttributeValue("", "v");
String[] vhex = variants.split(",");
int[] vint = new int[vhex.length];
for (int i=0; i<vhex.length; i++) {
vint[i] = Integer.parseInt(vhex[i], 16);
}
if (vhex.length == 1) {
monoVariants.put(icode, vint[0]);
} else {
polyVariants.put(icode, vint);
}
}
// FIX: Set static class variables at the end
CaseVariantsModified.polyVariants = polyVariants;
// Must set this last as other threads will be checking this value
CaseVariantsModified.monoVariants = monoVariants;
}
/**
* Get the case variants of a character
*
* @param code the character whose case variants are required
* @return the case variants of the character, excluding the character itself
*/
// FIX: Method no longer synchronized
public static int[] getCaseVariants(int code) {
// FIX: Use local variables to avoid additional volatile reads
IntToIntMap monoVariants = CaseVariantsModified.monoVariants;
if (monoVariants == null) {
synchronized (CaseVariantsModified.class) {
monoVariants = CaseVariantsModified.monoVariants;
if (monoVariants == null) {
build();
monoVariants = CaseVariantsModified.monoVariants;
}
}
}
int mono = monoVariants.get(code);
if (mono != monoVariants.getDefaultValue()) {
return new int[]{mono};
} else {
int[] result = polyVariants.get(code);
if (result == null) {
return IntArraySet.EMPTY_INT_ARRAY;
} else {
return result;
}
}
}
/**
* Get the case variants of roman letters (A-Z, a-z), other than the letters A-Z and a-z themselves
*/
/*@NotNull*/ public static int[] ROMAN_VARIANTS = {0x0130, 0x0131, 0x212A, 0x017F};
}
}
Files
Please register to edit this issue