Project

Profile

Help

Bug #6225

closed

Significant performance problem for case insensitive regex matching under high concurrency

Added by at055612 at055612 about 1 year ago. Updated 12 months ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
Performance
Sprint/Milestone:
-
Start date:
2023-10-18
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
10, 11, 12, 9.9, trunk
Fix Committed on Branch:
11, 12
Fixed in Maintenance Release:
Platforms:
.NET, Java

Description

Seen in Saxon-HE v9.9.1-8 (but appears to also be an issue in more recent versions)

Relates to previous fix https://saxonica.plan.io/issues/4449

The above fix added synchronized to the method net.sf.saxon.regex.CaseVariants#getCaseVariants. While this fixes the NPE it results in a significant performance hit when calling this method with high concurrency. We use saxon-HE in a large scale data processing system so may have high numbers of threads concurrently doing XSLT processing. If xpath functions like matches() are used with the case insensitive flag then processing is very very slow with a lot of blocked threads.

The following is taken from a thread dump. We have many threads in this state.

"Data Processor# #10227" #42246 prio=1 os_prio=0 cpu=4294222.87ms elapsed=26283.54s tid=0x00007f59c0167420 nid=0xef63 waiting for monitor entry  [0x00007f5798ab7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at net.sf.saxon.regex.CaseVariants.getCaseVariants(CaseVariants.java:92)
	- waiting to lock <0x0000000264bf1c00> (a java.lang.Class for net.sf.saxon.regex.CaseVariants)
	at net.sf.saxon.regex.REMatcher.equalCaseBlind(REMatcher.java:715)
	at net.sf.saxon.regex.Operation$OpAtom.iterateMatches(Operation.java:631)
	at net.sf.saxon.regex.REMatcher.checkPreconditions(REMatcher.java:494)
	at net.sf.saxon.regex.REMatcher.match(REMatcher.java:426)
	at net.sf.saxon.regex.ARegularExpression.containsMatch(ARegularExpression.java:89)
	at net.sf.saxon.functions.Matches.call(Matches.java:78)
	at net.sf.saxon.functions.Matches.call(Matches.java:24)
	at net.sf.saxon.expr.FunctionCall.iterate(FunctionCall.java:532)
	at net.sf.saxon.expr.Expression.effectiveBooleanValue(Expression.java:889)
	at net.sf.saxon.expr.FilterIterator$NonNumeric.matches(FilterIterator.java:190)
	at net.sf.saxon.expr.FilterIterator.getNextMatchingItem(FilterIterator.java:76)
	at net.sf.saxon.expr.FilterIterator.next(FilterIterator.java:62)
	at net.sf.saxon.om.SequenceIterator.forEachOrFail(SequenceIterator.java:127)
	at net.sf.saxon.expr.sort.DocumentOrderIterator.<init>(DocumentOrderIterator.java:51)
	at net.sf.saxon.expr.sort.DocumentSorter.iterate(DocumentSorter.java:260)
	at net.sf.saxon.expr.parser.Evaluator$7.evaluate(Evaluator.java:239)
	at net.sf.saxon.expr.LetExpression.eval(LetExpression.java:532)
	at net.sf.saxon.expr.LetExpression.processLeavingTail(LetExpression.java:714)
	at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:896)
	at net.sf.saxon.expr.LetExpression.processLeavingTail(LetExpression.java:723)
	at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:352)
	at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:533)
	at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:300)
	at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:255)
	at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:735)
	at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:132)
	at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:353)
	at net.sf.saxon.expr.instruct.Copy.processLeavingTail(Copy.java:431)
	at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:352)
	at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:533)
	at net.sf.saxon.trans.rules.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:71)
	at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:495)
	at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:746)
	at net.sf.saxon.s9api.AbstractXsltTransformer.applyTemplatesToSource(AbstractXsltTransformer.java:347)
	at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:349)
	at net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:71)
	at net.sf.saxon.jaxp.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:173)

The following class is both a test case to demonstrate the problem and a modified version of CaseVariants that fixes the problem (changes marked FIX:). It compares the time taken to repeatedly call getCaseVariants on both versions of the class. Running this main method results in the following output on my pc (with 12cores/24threads).

Round 1
CaseVariants completed in 1314ms
CaseVariantsModified completed in 40ms
Round 2
CaseVariants completed in 1875ms
CaseVariantsModified completed in 10ms

The synchronized method in CaseVariants is a huge performance hit. In CaseVariantsModified I have changed it to use double checked locking to avoid the synchronisation for all but the first few calls.

It would be greatly appreciated if you could patch saxon to fix this issue. Would you also be able to back port the fix into v9?

package test;

import net.sf.saxon.Configuration;
import net.sf.saxon.lib.ParseOptions;
import net.sf.saxon.lib.Validation;
import net.sf.saxon.om.AxisInfo;
import net.sf.saxon.om.NodeInfo;
import net.sf.saxon.pattern.NameTest;
import net.sf.saxon.regex.CaseVariants;
import net.sf.saxon.trans.XPathException;
import net.sf.saxon.tree.iter.AxisIterator;
import net.sf.saxon.type.Type;
import net.sf.saxon.z.IntArraySet;
import net.sf.saxon.z.IntHashMap;
import net.sf.saxon.z.IntToIntHashMap;
import net.sf.saxon.z.IntToIntMap;

import java.io.InputStream;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;
import javax.xml.transform.stream.StreamSource;

public class TestCaseVariants {
    public static final int THREADS = 24;
    public static final int ITERATIONS = 1_000;

    public static void main(String[] args) throws InterruptedException {

        // Ensure all the maps have been built, so we are only testing the performance
        // of getting variants
        CaseVariants.getCaseVariants('a');
        CaseVariantsModified.getCaseVariants('a');

        final ExecutorService executorService = Executors.newFixedThreadPool(THREADS);

        for (int round = 1; round <= 2; round++) {
            System.out.println("Round " + round);
            runTest("CaseVariants", executorService, CaseVariants::getCaseVariants);
            runTest("CaseVariantsModified", executorService, CaseVariantsModified::getCaseVariants);
        }
        executorService.shutdown();
    }

    private static void runTest(final String name,
                                final ExecutorService executorService,
                                final IntFunction<int[]> func) throws InterruptedException {

        final CountDownLatch startAllLatch = new CountDownLatch(1);
        final CountDownLatch threadReadyLatch = new CountDownLatch(THREADS);
        final CountDownLatch completionLatch = new CountDownLatch(THREADS);

        for (int k = 0; k < THREADS; k++) {
            executorService.submit(() -> {
                threadReadyLatch.countDown();
                try {
                    // All threads start together
                    startAllLatch.await();
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
                for (long j = 0; j < ITERATIONS; j++) {
                    for (int i = 0; i < 1000; i++) {
                        func.apply(i);
                    }
                }
                completionLatch.countDown();
            });
        }
        final Instant startTime = Instant.now();
        // Wait for all threads to be ready
        threadReadyLatch.await();
        // Release all threads
        startAllLatch.countDown();
        // Wait for all threads to complete
        completionLatch.await();
        final Duration duration = Duration.between(startTime, Instant.now());
        System.out.println(name + " completed in " + duration.toMillis() + "ms");
    }

    // --------------------------------------------------------------------------------


    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    // Copyright (c) 2018 Saxonica Limited.
    // This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0.
    // If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
    // This Source Code Form is "Incompatible With Secondary Licenses", as defined by the Mozilla Public License, v. 2.0.
    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    /**
     * Modified copy of {@link net.sf.saxon.regex.CaseVariants}.
     */
    private static class CaseVariantsModified {

        // Use one hashmap for characters with a single case variant, another for characters with multiple
        // case variants, to reduce the number of objects that need to be allocated
        // FIX: Make these volatile
        private static volatile IntToIntMap monoVariants = null;
        private static volatile IntHashMap<int[]> polyVariants = null;

        static void build() {

            // FIX: Use temporary local variables
            final IntToIntMap monoVariants = new IntToIntHashMap(2500);
            final IntHashMap<int[]> polyVariants = new IntHashMap<>(100);

            InputStream in = Configuration.locateResource("casevariants.xml", new ArrayList<>(), new ArrayList<>());
            if (in == null) {
                throw new RuntimeException("Unable to read casevariants.xml file");
            }

            Configuration config = new Configuration();
            ParseOptions options = new ParseOptions();
            options.setSchemaValidationMode(Validation.SKIP);
            options.setDTDValidationMode(Validation.SKIP);
            NodeInfo doc;
            try {
                doc = config.buildDocumentTree(new StreamSource(in, "casevariants.xml"), options).getRootNode();
            } catch (XPathException e) {
                throw new RuntimeException("Failed to build casevariants.xml", e);
            }

            AxisIterator iter = doc.iterateAxis(AxisInfo.DESCENDANT, new NameTest(Type.ELEMENT, "", "c", config.getNamePool()));
            while (true) {
                NodeInfo item = iter.next();
                if (item == null) {
                    break;
                }
                String code = item.getAttributeValue("", "n");
                int icode = Integer.parseInt(code, 16);
                String variants = item.getAttributeValue("", "v");
                String[] vhex = variants.split(",");
                int[] vint = new int[vhex.length];
                for (int i=0; i<vhex.length; i++) {
                    vint[i] = Integer.parseInt(vhex[i], 16);
                }
                if (vhex.length == 1) {
                    monoVariants.put(icode, vint[0]);
                } else {
                    polyVariants.put(icode, vint);
                }
            }
            // FIX: Set static class variables at the end
            CaseVariantsModified.polyVariants = polyVariants;
            // Must set this last as other threads will be checking this value
            CaseVariantsModified.monoVariants = monoVariants;
        }

        /**
         * Get the case variants of a character
         *
         * @param code the character whose case variants are required
         * @return the case variants of the character, excluding the character itself
         */

        // FIX: Method no longer synchronized
        public static int[] getCaseVariants(int code) {
            // FIX: Use local variables to avoid additional volatile reads
            IntToIntMap monoVariants = CaseVariantsModified.monoVariants;
            if (monoVariants == null) {
                synchronized (CaseVariantsModified.class) {
                    monoVariants = CaseVariantsModified.monoVariants;
                    if (monoVariants == null) {
                        build();
                        monoVariants = CaseVariantsModified.monoVariants;
                    }
                }
            }
            int mono = monoVariants.get(code);
            if (mono != monoVariants.getDefaultValue()) {
                return new int[]{mono};
            } else {
                int[] result = polyVariants.get(code);
                if (result == null) {
                    return IntArraySet.EMPTY_INT_ARRAY;
                } else {
                    return result;
                }
            }
        }

        /**
         * Get the case variants of roman letters (A-Z, a-z), other than the letters A-Z and a-z themselves
         */

        /*@NotNull*/ public static int[] ROMAN_VARIANTS = {0x0130, 0x0131, 0x212A, 0x017F};
    }
}


Files

CaseVariants.png (95.2 KB) CaseVariants.png VisualVM at055612 at055612, 2023-10-18 18:33
Actions #1

Updated by Michael Kay about 1 year ago

Thanks for reporting this.

I've often discounted reports based on observing that many threads are "blocked", since this often doesn't mean that the thread can't proceed, it means that the scheduler has taken the opportunity to let a different thread take control. If you see many threads in this state it may mean that you have configured the system to run with more threads than is optimal to maximise throughput and minimise response time.

But the fact that you've measured a throughput benefit by changing this does suggest it's something we should consider changing.

For a performance improvement we wouldn't normally issue a patch for earlier versions of the software. And certainly not for 9.9.1.8 which is now well out of its maintenance period.

I would add that we're very happy to do our best to support users of the open source product, especially when they make an effort like you have done to investigate the problem, which benefits the user community generally. However, when it comes down to competing priorities, we do like to ensure that paying customers get first place in the queue.

We probably also need to look at other data used by the regex engine that's initialised on first use.

Actions #2

Updated by Michael Kay about 1 year ago

My results from running your code as supplied:

Round 1
CaseVariants completed in 977ms
CaseVariantsModified completed in 185ms
Round 2
CaseVariants completed in 1031ms
CaseVariantsModified completed in 22ms

which is certainly a huge difference. However, it's not a realistic benchmark because the only thing this code is doing is to repeatedly call the synchronized method, which means it blocks every time. I don't think that would happen for many real workloads. Just as an experiment I added into the loop after each call on CaseVariants an XPath evaluation of matches('ABC', 'abc', 'i') and this changes the numbers to

CaseVariants completed in 11315ms
CaseVariantsModified completed in 10516ms

which reduces the overhead to 8%; and in any kind of real workload I would expect it to be a lot lower than that.

I also tried (purely out of curiosity changing the code so it reads as follows (it's now always executing the synchronised code, but the synchronisation block is smaller):

                IntToIntMap monoVariants = CaseVariantsModified.monoVariants;
                //if (monoVariants == null) {
                    synchronized(CaseVariantsModified.class) {
                        monoVariants = CaseVariantsModified.monoVariants;
                        if (monoVariants == null) {
                            build();
                            monoVariants = CaseVariantsModified.monoVariants;
                        }
                    }
                //}
                int mono = monoVariants.get(code);

and this gave

CaseVariants completed in 11172ms
CaseVariantsModified completed in 10876ms

Since we've got the test case to play with, I'm going to do another experiment using the "initialization on demand" pattern described at https://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom and see how that compares.

Actions #3

Updated by Michael Kay about 1 year ago

The initialization-on-demand approach appears to be essentially indistinguishable.

Including the call on matches():

Round 5
CaseVariants completed in 11698ms
CaseVariantsModified completed in 10745ms
CaseVariantsOnDemand completed in 10784ms
``

I think I might go with that one because it's conceptually cleaner.
Actions #4

Updated by at055612 at055612 about 1 year ago

Thanks for the quick replies. Fully appreciate that we are using the free version and the priority on the paid versions. I didn't think a back port to v9 was an option but it was worth asking. Upgrading saxon to a new major release version carries a lot of risk given the high volume and varied nature of the data we are processing, so we can't do it lightly. We can work around this for now.

I had forgotten about the Initialization-on-demand holder idiom. Agree, that is a much neater solution.

A similar initialisation problem exists in UnicodeBlocks. That is susceptible to a subtle race condition (which I am in the process of raising a bug ticket for) so should also be changed to use the IoD holder idiom in the same way as CaseVariants.

Agree that the test I wrote is not a realistic use case but we have experienced significant performance degradation in two different prod environments when using the i flag. Removing the i flag and changing the regex to achieve the same effective result returned performance to normal. It is possible that there are also other issues in the case insensitive code path. The way synchronized was used in CaseVariants however was a definite hit on performance, no matter how small/large, for no functional benefit.

What cpu are you running the tests with, as the problem should be enhanced with increased numbers of cpu threads/cores?

Actions #5

Updated by Michael Kay about 1 year ago

My figures were on a Mac M1 Pro, 32Gb, MacOS 13.5.2. That claims "Total Number of Cores: 10 (8 performance and 2 efficiency)" which I confess have never fully understood.

Actions #6

Updated by at055612 at055612 about 1 year ago

I ran another little test. This time to compare containsMatch() with and without the i flag.

Results:

Round 1
insensitive:  completed in 7340ms
sensitive  :  completed in 780ms
Round 2
insensitive:  completed in 7273ms
sensitive  :  completed in 730ms

Case insensitive is x10 slower. I attached visualVM to see what was going on and all time appears to be spent in getCaseVariants(). See attached screen grab. Also the thread dump contains lots of these:

"pool-1-thread-6" #28 prio=5 os_prio=0 cpu=29227.68ms elapsed=199.29s tid=0x00007ff9a4ba3b90 nid=0xb0c5 waiting for monitor entry  [0x00007ff8b7afe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at net.sf.saxon.regex.CaseVariants.getCaseVariants(CaseVariants.java:92)
        - waiting to lock <0x0000000414c5f140> (a java.lang.Class for net.sf.saxon.regex.CaseVariants)
        at net.sf.saxon.regex.REMatcher.equalCaseBlind(REMatcher.java:715)
        at net.sf.saxon.regex.REMatcher.match(REMatcher.java:446)
        at net.sf.saxon.regex.ARegularExpression.containsMatch(ARegularExpression.java:89)
        at stroom.pipeline.xslt.TestRegex.lambda$runTest$0(TestRegex.java:58)

This seems to suggest the locking is the issue. Even though the work inside the synchronised method is minimal, all 24 threadsHowever, it would be useful if you could run this test as is, then run it again but with a modified version of CaseVariants that uses the IoD holder idiom. This should hopefully prove the fix.

package stroom.pipeline.xslt;

import net.sf.saxon.Configuration;
import net.sf.saxon.regex.ARegularExpression;
import net.sf.saxon.regex.CaseVariants;
import net.sf.saxon.trans.XPathException;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestRegex {

    public static final int THREADS = 24;
    public static final int ITERATIONS = 1_000_000;

    public static void main(String[] args) throws XPathException, InterruptedException {

        final Configuration configuration = new Configuration();
        // Init the case variants state
        CaseVariants.getCaseVariants('a');
        final ExecutorService executorService = Executors.newFixedThreadPool(THREADS);

        for (int round = 1; round <= 2; round++) {
            System.out.println("Round " + round);

            runTest("insensitive: ", "i", executorService, configuration);
            runTest("sensitive  : ", "", executorService, configuration);
        }
        executorService.shutdown();
    }

    private static void runTest(
            final String name,
            final String flags,
            final ExecutorService executorService,
            final Configuration configuration) throws InterruptedException {

        final CountDownLatch startAllLatch = new CountDownLatch(1);
        final CountDownLatch threadReadyLatch = new CountDownLatch(THREADS);
        final CountDownLatch completionLatch = new CountDownLatch(THREADS);

        // Assuming saxon compiles only once for each pattern in an xslt.
        final ARegularExpression aRegularExpression = compileRegex(flags, configuration);

        for (int k = 0; k < THREADS; k++) {
            executorService.submit(() -> {
                threadReadyLatch.countDown();
                try {
                    // All threads start together
                    startAllLatch.await();
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
                for (long j = 0; j < ITERATIONS; j++) {
                    if (!aRegularExpression.containsMatch("ZZZ abc ZZZ")) {
                        throw new RuntimeException("No match");
                    }
                }
                completionLatch.countDown();
            });
        }
        final Instant startTime = Instant.now();
        // Wait for all threads to be ready
        threadReadyLatch.await();
        // Release all threads
        startAllLatch.countDown();
        // Wait for all threads to complete
        completionLatch.await();
        final Duration duration = Duration.between(startTime, Instant.now());
        System.out.println(name + " completed in " + duration.toMillis() + "ms");
    }

    private static ARegularExpression compileRegex(final String flags,
                                                   final Configuration configuration) {
        final ARegularExpression aRegularExpression;
        try {
            aRegularExpression = new ARegularExpression(
                    "abc",
                    flags,
                    "XP20",
                    null,
                    configuration);
        } catch (XPathException e) {
            throw new RuntimeException(e);
        }
        return aRegularExpression;
    }
}

Actions #7

Updated by at055612 at055612 about 1 year ago

Related to this issue is https://saxonica.plan.io/issues/6226

Actions #8

Updated by Michael Kay about 1 year ago

  • Applies to branch 10, 11, 12, trunk added
  • Fix Committed on Branch 11, 12 added
  • Platforms .NET, Java added

Resolved as described for the 11.x, 12.x and main branches.

Actions #9

Updated by Michael Kay about 1 year ago

  • Status changed from New to Resolved
Actions #10

Updated by O'Neil Delpratt 12 months ago

  • Fixed in Maintenance Release 12.4 added

Bug fix applied in the Saxon 12.4 Maintenance release. Leaving it marked as 'Resolved' until fix applied on Saxon 11.

Please register to edit this issue

Also available in: Atom PDF