Project

Profile

Help

Patch #6540

open

NamePool - Concurrency/Locking behavior improvements

Added by Cristian Vat 8 days ago. Updated 6 days ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
Performance
Sprint/Milestone:
-
Start date:
2024-09-19
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
12
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:
Java

Description

Context:

  • Java 21, Eclipse Temurin/Adoptium
  • SaxonJ 12.5 HE, JAXP/Saxon API, some custom extension functions
  • lots of big documents without any namespaces
  • web application doing various transformations for each request (concurrency depends on users)
  • several transformations using StAXSource
    • using Woodstox 6.7 for StAX
    • using StAX since we actually have some javax.xml.stream.StreamFilters removing elements not needed for some transformations reducing size of TinyTree and speeding up transformation

Locking behavior

I saw net.sf.saxon.om.NamePool#allocateFingerprint on and off in the past years appearing in different profiles especially around lock usage, now finally dug deeper into it and made a patch trying to improve behavior.

Ran lock profiling using async profiler ( https://github.com/async-profiler/async-profiler ) like:

asprof -d 60 -e lock -f locksprof.html PID

Lock stacks around NamePool mostly look like this:

net.sf.saxon.om.NamePool
net/sf/saxon/om/NamePool.allocateFingerprint
net/sf/saxon/om/FingerprintedQName.<init>
net/sf/saxon/pull/StaxBridge.next
net/sf/saxon/pull/PullFilter.next
net/sf/saxon/pull/PullPushTee.next
net/sf/saxon/pull/PullConsumer.consume
net/sf/saxon/pull/PullPushCopier.copy
net/sf/saxon/pull/PullSource.deliver
net/sf/saxon/pull/ActiveStAXSource.deliver
net/sf/saxon/event/Sender.send
net/sf/saxon/Controller.makeSourceTree
net/sf/saxon/s9api/XsltTransformer.transform
net/sf/saxon/jaxp/TransformerImpl.transform

Actual performance impact is a bit hard to measure since it depends on concurrency and there are some timing variations from other application code. So far I ran tests with 7 threads doing continuous requests calling the same transformation on same large document but I'll also try to set up a bigger test with a lot more threads.

I guess one of the issues is also from https://saxonica.plan.io/issues/3366 if that's still valid ?:

Note: on the push() path, the ReceivingContentHandler maintains a local cache of allocated name codes, but there is no equivalent on the pull path.

Another specialty is lack of namespaces so it's going through net.sf.saxon.om.NoNamespaceName#obtainFingerprint which calls allocateFingerprint with NamespaceUri.NULL uri.

All measurements/observations are after stylesheets were compiled and a few transformations already ran on same documents, so the NamePool already contains all names that will be seen.

Patch summary/performance

Attached patch NamePool_locking_behavior_improvements.patch that I'm testing now in my environment.

Hard to measure exact performance impact, I'll try to see if I manage to test in production with a lot of requests but so far results from a test environment being called continuously by 7 threads doing transformations:

  • NamePool completely disappeared from async profiler lock profiling traces
    • funnily enough now I see Expression.getSlotsUsed on FilterExpression in lock monitoring...it's also synchronized, maybe something for another time...
  • ran async profiler in CPU profiling mode (asprof -d 60 -e cpu -f cpuflame.html PID before/after and there FingerprintedQName.<init> went from 3.9% to 1.3% of total traces going through TransformerImpl.transform
  • no significant differences in total transformation time (might be same 1-2% but hard to measure reliably at that point, maybe more visible when I can bump up the threads a lot more in later tests)

Patch details

Removed synchronized from allocateFingerprint method and moved to a synchronized block. The reserved/Saxon/StandardNames checks do not need synchronization since they're simple checks or StandardNames where the lookup map is pre-populated in a static block. From a memory point of view there's an extra allocation of StructuredQName since that's now outside synchronized block but that's just a wrapper on existing parameters doesn't seem to allocate anything else in construction.

Added fingerprintGuard as a private guard object on which to lock instead of the NamePool instance. NamePool can be retrieved from Configuration or other places so this avoids external code accidentally locking on it and causing other problems.

There are various places in Saxon where allocateFingerprint is called with NamespaceUri.NULL as parameter and while NamespaceUri#isReserved has a check against null uri it doesn't check against NamespaceUri.NULL so I added it as condition to the if block.

Duplicated the initial qNameToInteger.getOrDefault also inside the synchronized block. In the happy path if the fingerprint is already allocated for the qName this completely avoids any locking (as per ConcurrentHashMap javadoc retrieval operations do not entail locking). If a new fingerprint needs to be allocated there is an extra concurrent map get but that seems like it could be worth it. If new fingerprint creation is a hot path it's probably going to hit max fingerprint.


Files

Actions #1

Updated by Michael Kay 7 days ago

Thanks for your work on this which we will study carefully.

Since the redesign of the NamePool in 9.7 to use ConcurrentHashMap we have had no reports of locking problems on the NamePool, but I think the distinguishing characteristic of your test workload is the use of a Pull parser and StaxBridge. From a quick look it seems to me that although there is some caching of element names, there is no caching of attribute names and this seems to be where you are getting the contention. So my first thought would be that we should improve the name caching in StaxBridge to reduce the number of calls on the NamePool.

I haven't studied your proposed changes to the NamePool locking strategy but it's an area where I'm quite hesitant to make changes because of the difficulty of ensuring that it's truly thread-safe. We've been badly bitten by such effects in the past, and it's very difficult to devise really rigorous tests.

Please note that the terms and conditions for this site say that you give us a license to use any code that you post here, but to avoid any doubt (some of our customers are quite picky about this!) I would be grateful if you would confirm this explicitly.

Actions #2

Updated by Cristian Vat 7 days ago

Yes I saw the note on the front page.

I hereby explicitly give Saxonica Ltd license to use the code attached in any manner deemed necessary including but not limited to usages in test suite or using the code in future versions of the product.

Regarding performance/locking: yeah it's hard to measure. It only became more visible now because I was running a stress-test sending 2-3X normal amount of traffic to a single instance. Reviewed other usages in my code-base and I did encounter it before in document processing pipeline where I'm just throwing 20-24 threads at processing (there I see a year ago I've moved from single Templates to a ThreadLocal to speed it up). I'll see if I can write a JMH benchmark that only directly calls allocateFingerprint to avoid variations from other code.

Regarding correctness/bugs: hard to prove definitively but I tried to make the changes minimal. Everything that changes state is still in a proper synchronized block, everything outside should not change any state (there's StandardNames.getFingerprint which accesses a HashMap but it's fully populated during class loading and then only used for reads, so it should be safe).

I had an idea about using .computeIfAbsent on qNameToInteger but I gave up on it since:

  • ConcurrentMap interface javadocs don't say anything explicit
  • ConcurrentHashMap implementation javadocs explicitly say it blocks and 'function is applied at most once per key'
  • I'm not sure what the C# transpilation would generate or what the behavior there is, whereas synchronized blocks are more standard

Trying to think of a real test for correctness this is the best I could come up with now:

  • create a NamePool
  • have an array of X uri+local pairs
  • create N threads, start them, have each thread shuffle an array of ui+local pairs then wait on a countdown latch
  • release the latch so all threads call allocateFingerprint for some amount of iterations on their randomly ordered arrays
  • wait for threads to finish and call allocateFingerprint with a new unseen name
    • it should have a fingerprint 1024 + X (reserved+X pairs in our test array)

Then we could run this on:

  • current NamePool
  • NamePool with just removing 'synchronized' on method testing if it fails
  • NamePool with my patch applied

(That would be similar to what I can see for #3159, #6225 and #6076, just looked for existing tests using threads and CountDownLatch. I see there's also nativetests.ThreadedNamePoolTest but that mentions scalability and I can't figure out if it also checks correctness... )

Does that sound like a good test or do you have any other recommendations?

Actions #3

Updated by Michael Kay 7 days ago

The main change you made to restrict the scope of the locking in allocateFingerprint() certainly looks safe and I'll probably incorporate it in some form.

I think the much more significant thing you've identified is the need to fix name caching in StaxBridge. I'm also conscious that we don't do much testing of the pull parser paths in SaxonJ - though the code is heavily exercised in SaxonCS because we're always using a pull parser in that case.

Actions #4

Updated by Cristian Vat 6 days ago

Managed to create a test inspired from UnsafeXsltBug3159:

public class NamePoolCorrectness6540 {

    @Test
    public void testNamePoolThreaded() throws InterruptedException {

        NamePool namePool = new NamePool();

        String[] localNames = new String[]{"one", "two", "three", "four", "five", "six", "seven", "eight", "nine"};
        final int threadSize = 30;
        final CountDownLatch latch = new CountDownLatch(threadSize);
        final ThreadFactory threadFactory = Executors.defaultThreadFactory();
        final Thread[] threads = new Thread[threadSize];
        for (int i = 0; i < threadSize; ++i) {
            threads[i] = threadFactory.newThread(new NamePoolWorker(namePool, localNames, latch));
        }
        for (int i = 0; i < threadSize; ++i) {
            threads[i].start();
        }
        for (int i = 0; i < threadSize; ++i) {
            threads[i].join();
        }
        int lastFingerprint = namePool.allocateFingerprint(NamespaceUri.NULL, "unseen-name");
        Assertions.assertEquals(1024 + localNames.length, lastFingerprint, "Missing or duplicated NamePool fingerprint");
    }

    private static class NamePoolWorker implements Runnable {

        private final NamePool namePool;
        private final CountDownLatch latch;
        private final String[] localNamesShuffled;

        public NamePoolWorker(NamePool namePool, String[] localNames, CountDownLatch latch) {
            this.namePool = namePool;
            List<String> names = Arrays.asList(localNames);
            Collections.shuffle(names);
            this.localNamesShuffled = names.toArray(new String[0]);
            this.latch = latch;
        }

        @Override
        public void run() {
            try {
                latch.countDown();
                latch.await();

                for (int i = 0; i < 1000; i++) {
                    for (String localName : localNamesShuffled) {
                        namePool.allocateFingerprint(NamespaceUri.NULL, localName);
                    }
                }
            } catch (Throwable t) {
                t.printStackTrace(System.out);
            }
        }
    }

}

Ran it (in my own code since I had problems directly in Saxon due to EE replacements... ) against current NamePool, patched NamePool and current NamePool with synchronized removed from method declaration.

Ran about a dozen times. Both current and patched NamePool runs were fine. NamePool with synchronized removed consistently failed to my surprise. Was expecting at least some runs to just randomly succeed.

Assertion failures were not the same values each time, from some runs:

org.opentest4j.AssertionFailedError: Missing or duplicated NamePool fingerprint ==> expected: <1033> but was: <1035>
org.opentest4j.AssertionFailedError: Missing or duplicated NamePool fingerprint ==> expected: <1033> but was: <1035>
org.opentest4j.AssertionFailedError: Missing or duplicated NamePool fingerprint ==> expected: <1033> but was: <1040>
org.opentest4j.AssertionFailedError: Missing or duplicated NamePool fingerprint ==> expected: <1033> but was: <1038>
org.opentest4j.AssertionFailedError: Missing or duplicated NamePool fingerprint ==> expected: <1033> but was: <1034>

In one run I put a breakpoint on the assertion and inspected how NamePool fingerprints looked internally:

result = {NamePool@3211} 
 qNameToInteger = {ConcurrentHashMap@3233}  size = 10
  {StructuredQName@3249} "nine" -> {Integer@3250} 1027
  {StructuredQName@3251} "three" -> {Integer@3252} 1031
  {StructuredQName@3253} "two" -> {Integer@3254} 1026
  {StructuredQName@3255} "four" -> {Integer@3256} 1035
  {StructuredQName@3257} "eight" -> {Integer@3258} 1030
  {StructuredQName@3259} "six" -> {Integer@3260} 1025
  {StructuredQName@3261} "unseen-name" -> {Integer@3262} 1039
  {StructuredQName@3263} "seven" -> {Integer@3264} 1024
  {StructuredQName@3265} "five" -> {Integer@3266} 1028
  {StructuredQName@3267} "one" -> {Integer@3268} 1036

1024,1025,1026,1027,1028 then just skipping over to 1030,1031, skipping again to 1035,1036 and then 1039...

Actions #5

Updated by Michael Kay 6 days ago

I have updated both StaxBridge in SaxonJ and the equivalent DotNetPullProvider in SaxonCS to maintain a local name cache for the duration of parsing, in the same way that ReceivingContentHandler does for a push parser. This should greatly reduce the number of calls on the allocateFingerprint method in pull-parser workloads.

though the code is heavily exercised in SaxonCS because we're always using a pull parser in that case.

No, in fact SaxonJ and SaxonCS have different code paths here, though they are very similar.

Actions #6

Updated by Michael Kay 6 days ago

Can I ask what the rationale is for synchronizing on a separate object rather than on the NamePool itself?

Actions #7

Updated by Cristian Vat 6 days ago

The NamePool instance itself can be accessed externally such as via NamePool namePool = configuration.getNamePool()

If some external non-Saxon code gets the reference and does a synchronized(namePool) somewhere (accidentally even) it could completely block everything since it locks on the same object. By using a private guard object to lock on it avoids this and makes sure no external code can influence internal NamePool locking behavior.

Please register to edit this issue

Also available in: Atom PDF