Project

Profile

Help

Bug #3011

closed

Perfomance of shallow copy with namespaces

Added by Michael Kay over 7 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2016-10-28
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
9.7
Fix Committed on Branch:
trunk
Fixed in Maintenance Release:
Platforms:

Description

This problem from Jorge Williams was unfortunately tagged on to the end of the unrelated issue #2747, I am belatedly giving it its own entry as it has no connection with 2747 other than the fact that both are about performance.

The test case has a source document with hundreds of namespace declarations, and the processing style makes heavy use of shallow copy (xsl:copy). When xsl:copy is applied to an element, by default all the in-scope namespaces are copied. This searches the ancestors of the target node, copying each namespace down the pipeline individually by a call on Receiver.namespace(). Eventually these namespaces hit a NamespaceReducer, whose task is to eliminate redundant namespace declarations, and it finds that nearly all the namespaces are redundant because they have already been declared on the parent element.

For this particular test case, 680 million namespace bindings are being processed by the ComplexContentOutputter, and it is taking nearly 7 minutes; by adding copy-namespaces=no to the xsl:copy instructions, this is reduced to 9 seconds. The performance got substantially worse between 9.3 and 9.4 when integer namespace codes were removed (this was done to reduce contention on the NamePool, and resulted in no performance regression for less unusuall scenarios).

Actions #1

Updated by Michael Kay over 7 years ago

Fixing this will require a design change and it won't be a quick fix. But I think there might be a solution along the following lines.

Firstly, introduce a class such as NamespaceBindingSet representing an immutable set of namespace bindings (possibly a map from prefixes to URIs, but probably not implemented as a general-purpose map). There could be multiple implementations, and the existing NamespaceBinding class could be used to represent a singleton set.

Change the Receiver interface so that the namespace() method accepts a NamespaceBindingSet rather than a NamespaceBinding - that is, it allows multiple namespaces to be passed with a single call.

Change the NodeInfo interface so that instead of (or perhaps as well as) the current getDeclaredNamespaces() (which returns namespace declarations and undeclarations on one particular element), it returns a NamespaceBindingSet containing all the in-scope namespaces for one element.

For the TinyTree and LinkedTree, we could implement this as follows. Let's assume that only a very small proportion of elements contain any namespace declarations. For the vast majority of elements, the NamespaceBindingSet is therefore the same object as for the parent element; the tree as a whole only contains as many NamespaceBindingSet objects as there are elements with namespace declarations, and this means we can probably afford to hold namespaces redundantly across multiple NamespaceBindingSets. However, a NamespaceBindingSet object could also be implemented as a delta relative to another NamespaceBindingSet.

Now xsl:copy for an element can simply pass the relevant NamespaceBindingSet straight down the Receiver pipeline until it hits a NamespaceReducer. The NamespaceReducer should be able to quickly discover that the same NamespaceBindingSet was used by the parent element, and therefore the whole thing can be ignored without picking it apart. Only where the namespaces are different between the child element and the parent is there any need to do any work.

The same approach should also benefit literal result elements, if we make sure that the compiled stylesheet uses common NamespaceBindingSets for literal result elements with the same set of in-scope namespaces.

It's not worth doing any of this if it only benefits pathological cases with 100 namespaces declared. But I think there could also be benefits for routine workloads, say copying a document with five namespaces all declared on the root element.

The main case where the current NodeInfo.getDeclaredNamespaces() is used (without also getting the ancestor namespaces) is for deep copy (xsl:copy-of) where at each level we only pass the namespace differences down the pipeline. We need to make sure there is no regression for this path. One way to do this is if the NamespaceBindingSet is actually maintained as a delta. If the NamespaceReducer calls some method to find the delta between the child namespaces and the parent namespaces, this can have a very efficient implementation for the common case where the NamespaceBindingSet for the child is actually implemented as a delta from the parent.

Actions #2

Updated by Michael Kay over 7 years ago

I have experimentally implemented something along these lines on the 9.8 branch. Initial result is that it cuts the transformation time for this test case from 7m to 30s. I need to to further testing to establish that it is functionally correct, and that it doesn't cause performance regression elsewhere. The change is too disruptive to consider implementing on the 9.7 branch.

The basic idea is that

(a) the Receiver.namespaces() method is changed to accept a NamespaceBindingSet

(b) NamespaceBinding implements NamespaceBindingSet

(c) there's a new implementation of NamespaceBindingSet called InScopeNamespaces which contains all the in-scope namespaces for an element

(d) the ComplexContentOutputter keeps a stack and if the namespaces output for a parent element are the same InScopeNamespaces as those output for a child element, the namespaces on the child element are ignored.

(e) the logic for xsl:copy (and xsl:copy-of), when copying the namespaces of a source element to the output pipeline, generates an InScopeNamespaces object for the nearest ancestor-or-self element that includes any namespace declarations.

Actions #3

Updated by Michael Kay over 7 years ago

Unfortunately I'm seeing an increase in elapsed time for an identity transform of the 100Mb XMark source file (which has no namespaces) from 3842ms to 4603ms.

At first sight I can't see how this can be due to this change. With copy-namespaces="no" I see a similar regression from 3732ms to 4564ms. (This is comparing the 9.7 branch with the 9.8 code). Incidentally both versions of the code fail to take advantage of the fact that for the TinyTree, we have a flag at the level of the document as a whole telling is that there are no namespaces anywhere.

Seems I was running 9.7 under JDK 1.8 and 9.8 under JDK 1.6, switching both to JDK 1.8 improves the 9.8 time to 4059ms, but still slower.

With -repeat:20 I get a time for 9.7 = 2367ms, 9.8 = 2333ms so it seems the only difference is in Java warm-up time.

Switching back to copy-namespaces="yes", the figures are 9.7 = 2415ms, 9.8 = 2423ms - not a significant difference.

I've added (in 9.8) a check for the TinyTree "usesNamespaces" flag and this brings the 9.8 time down to 2341ms. With bytecode generation enabled, we get a further small drop to 2270ms.

However, the 9.3 time under the same conditions is 2107ms, so we could do better...

Actions #4

Updated by Michael Kay over 7 years ago

Now running some tests doing an identity copy on a 71Mb file with two namespaces on the root element (a default namespace and the XSI namespace)

9.8 with bytecode enabled and -repeat:20 takes 2368ms

9.7 with bytecode enabled and -repeat:20 takes 2478ms

9.3 with -repeat:20 takes 1909ms

So the general conclusion is that the new optimisation is beneficial, but it doesn't reclaim all the regression that has occurred since 9.3. How much of this can be attributed to the dropping of NamePool namespace codes is hard to say, given that we also see a regression on a copy that doesn't use namespaces.

I also ran a deep-copy on the same file using the query ".". Times for this were:

9.3: 455ms

9.7: 734ms

9.8: 951ms

Ouch.

Actions #5

Updated by Michael Kay over 7 years ago

  • Status changed from New to Resolved
  • Applies to branch 9.8 added
  • Fix Committed on Branch 9.8 added

For anything taking under a second, -repeat:50 is safer than -repeat:20. But this only accentuates the difference:

9.7 - 566ms

9.8 - 916ms

Looking at the Java execution profile, 9.7 is spending something like 93% of the time in the UTF8Writer class. In 9.7 that accounts for only 40% of the time, with 11.6% in TinyElement.copy(), 10.4% in FingerprintedQName., 10.4% in XMLEmitter.startElement, 8.6%/5.7% in NamePool.getURI()/getLocalName().

I found one difference - on the 9.8 branch the index from namecodes to prefixes is no longer in the NamePool, it is now in the TinyTree. Changed the code to avoid a hash lookup for prefix code 0 (which always maps to the default namespace), but this made no difference, in fact it increased the time to 964ms.

In the course of investigation I found bug 3013, affecting whitespace compression, but it is present in all these releases and appears to have no noticeable effect on execution time. (However, I also took more measurements while investigating this, and this made it clear quite how much variability there is in the Java timings, even with -repeat:50).

What is clear is that the changes made to improve namespace handling for xsl:copy (the subject of this bug entry) do not appear to be relevant. I confirmed this by reverting these changes on the 9.8 branch and measuring again.

So I'm going to commit the shallow-copy changes and close this bug, and raise the regression in deep copy performance as a separate issue.

Actions #6

Updated by O'Neil Delpratt almost 7 years ago

  • Fix Committed on Branch trunk added
  • Fix Committed on Branch deleted (9.8)
Actions #7

Updated by O'Neil Delpratt almost 7 years ago

  • Applies to branch 9.7 added
  • Applies to branch deleted (9.8)
Actions #8

Updated by Michael Kay almost 7 years ago

  • Status changed from Resolved to Closed

Please register to edit this issue

Also available in: Atom PDF