Project

Profile

Help

Patch #6540

closed

NamePool - Concurrency/Locking behavior improvements

Added by Cristian Vat 2 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Low
Assignee:
-
Category:
Performance
Sprint/Milestone:
-
Start date:
2024-09-19
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
12, trunk
Fix Committed on Branch:
trunk
Fixed in Maintenance Release:
Platforms:
.NET, Java

Description

Context:

  • Java 21, Eclipse Temurin/Adoptium
  • SaxonJ 12.5 HE, JAXP/Saxon API, some custom extension functions
  • lots of big documents without any namespaces
  • web application doing various transformations for each request (concurrency depends on users)
  • several transformations using StAXSource
    • using Woodstox 6.7 for StAX
    • using StAX since we actually have some javax.xml.stream.StreamFilters removing elements not needed for some transformations reducing size of TinyTree and speeding up transformation

Locking behavior

I saw net.sf.saxon.om.NamePool#allocateFingerprint on and off in the past years appearing in different profiles especially around lock usage, now finally dug deeper into it and made a patch trying to improve behavior.

Ran lock profiling using async profiler ( https://github.com/async-profiler/async-profiler ) like:

asprof -d 60 -e lock -f locksprof.html PID

Lock stacks around NamePool mostly look like this:

net.sf.saxon.om.NamePool
net/sf/saxon/om/NamePool.allocateFingerprint
net/sf/saxon/om/FingerprintedQName.<init>
net/sf/saxon/pull/StaxBridge.next
net/sf/saxon/pull/PullFilter.next
net/sf/saxon/pull/PullPushTee.next
net/sf/saxon/pull/PullConsumer.consume
net/sf/saxon/pull/PullPushCopier.copy
net/sf/saxon/pull/PullSource.deliver
net/sf/saxon/pull/ActiveStAXSource.deliver
net/sf/saxon/event/Sender.send
net/sf/saxon/Controller.makeSourceTree
net/sf/saxon/s9api/XsltTransformer.transform
net/sf/saxon/jaxp/TransformerImpl.transform

Actual performance impact is a bit hard to measure since it depends on concurrency and there are some timing variations from other application code. So far I ran tests with 7 threads doing continuous requests calling the same transformation on same large document but I'll also try to set up a bigger test with a lot more threads.

I guess one of the issues is also from https://saxonica.plan.io/issues/3366 if that's still valid ?:

Note: on the push() path, the ReceivingContentHandler maintains a local cache of allocated name codes, but there is no equivalent on the pull path.

Another specialty is lack of namespaces so it's going through net.sf.saxon.om.NoNamespaceName#obtainFingerprint which calls allocateFingerprint with NamespaceUri.NULL uri.

All measurements/observations are after stylesheets were compiled and a few transformations already ran on same documents, so the NamePool already contains all names that will be seen.

Patch summary/performance

Attached patch NamePool_locking_behavior_improvements.patch that I'm testing now in my environment.

Hard to measure exact performance impact, I'll try to see if I manage to test in production with a lot of requests but so far results from a test environment being called continuously by 7 threads doing transformations:

  • NamePool completely disappeared from async profiler lock profiling traces
    • funnily enough now I see Expression.getSlotsUsed on FilterExpression in lock monitoring...it's also synchronized, maybe something for another time...
  • ran async profiler in CPU profiling mode (asprof -d 60 -e cpu -f cpuflame.html PID before/after and there FingerprintedQName.<init> went from 3.9% to 1.3% of total traces going through TransformerImpl.transform
  • no significant differences in total transformation time (might be same 1-2% but hard to measure reliably at that point, maybe more visible when I can bump up the threads a lot more in later tests)

Patch details

Removed synchronized from allocateFingerprint method and moved to a synchronized block. The reserved/Saxon/StandardNames checks do not need synchronization since they're simple checks or StandardNames where the lookup map is pre-populated in a static block. From a memory point of view there's an extra allocation of StructuredQName since that's now outside synchronized block but that's just a wrapper on existing parameters doesn't seem to allocate anything else in construction.

Added fingerprintGuard as a private guard object on which to lock instead of the NamePool instance. NamePool can be retrieved from Configuration or other places so this avoids external code accidentally locking on it and causing other problems.

There are various places in Saxon where allocateFingerprint is called with NamespaceUri.NULL as parameter and while NamespaceUri#isReserved has a check against null uri it doesn't check against NamespaceUri.NULL so I added it as condition to the if block.

Duplicated the initial qNameToInteger.getOrDefault also inside the synchronized block. In the happy path if the fingerprint is already allocated for the qName this completely avoids any locking (as per ConcurrentHashMap javadoc retrieval operations do not entail locking). If a new fingerprint needs to be allocated there is an extra concurrent map get but that seems like it could be worth it. If new fingerprint creation is a hot path it's probably going to hit max fingerprint.


Files

Please register to edit this issue

Also available in: Atom PDF