Feature #6395

closed

Feature request for reduced memory usage when compiling a main xquery on a compiler with prior calls to compileLibrary

Added by Joshua Maurice 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: Low
Assignee:
Category: Performance
Sprint/Milestone: -
Start date: 2024-04-15
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 11
Fix Committed on Branch: trunk
Fixed in Maintenance Release:
Platforms: .NET, Java

Description

What am I doing?

For test #1,

  • I create the JVM with -Xmx2g.
  • I create a single XQueryCompiler object.
  • I am applying a workaround for bug "https://saxonica.plan.io/issues/6394" (which involves subclassing two Saxonica internal classes, com.saxonica.expr.QueryLibraryImpl and com.saxonica.ee.optim.StaticQueryContextEE).
  • Using that same XQueryCompiler object, in an endless loop, I call compile() on a trivial xquery string without any "import module" statements until I get an OutOfMemoryError. (I keep the XQueryExecutable objects in a java.util.List held in a local variable, to prevent garbage collection.) Here is an example of the trivial main xquery string:
xquery version "3.0" encoding "utf-8";
'foo'
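The test #1 loop can be sketched as a small, generic harness. This is a minimal sketch, not the attached TestSaxonFeature.java. The names RetainUntilOom and run are hypothetical. A maxCount cap has been added so that the harness can also be exercised without exhausting the heap. The factory stands in for the compile() call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical harness mirroring the test #1 loop: create objects with a
 * factory, keep every one strongly reachable so the GC cannot reclaim it,
 * and count how many were created before the heap ran out.
 */
final class RetainUntilOom {

    /**
     * Calls factory until it has produced maxCount objects or an
     * OutOfMemoryError is thrown, whichever comes first.
     *
     * @return the number of objects created and retained
     */
    static long run(Supplier<?> factory, long maxCount) {
        List<Object> retained = new ArrayList<>();
        long created = 0;
        try {
            while (created < maxCount) {
                retained.add(factory.get());
                created++;
            }
        } catch (OutOfMemoryError oom) {
            // Heap exhausted: stop creating objects. The retained list becomes
            // unreachable when this method returns, so the heap can be reclaimed.
        }
        return created;
    }

    public static void main(String[] args) {
        // Stand-in factory; in the repro it would wrap compiler.compile(query).
        long n = run(() -> new byte[64], 10_000);
        System.out.println("created " + n + " objects");
    }
}
```

In the repro, the factory would wrap compiler.compile(query). Because compile() declares the checked SaxonApiException, the lambda would have to rethrow it as an unchecked exception. Catching OutOfMemoryError is normally avoided, but here it only serves to end the measurement.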

For test #2,

  • I do the same thing as test #1, except I add half a dozen unused "import module" statements to the main xquery string. The imported modules are not used by the main xquery string. The imported modules have many function definitions and variable definitions (none of which are used by the main xquery string).

For test #3,

  • I do the same thing as test #1, except I use the single XQueryCompiler object to call compileLibrary() on half a dozen very large xquery xq files before calling compile() in the endless loop.

For test #4,

  • I do the same thing as test #1, except I add half a dozen unused "import module" statements (just like test #2), and I call compileLibrary() on half a dozen very large xquery xq files (just like test #3), including the imported modules, before calling compile() in the endless loop.
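The four test variants differ along two independent switches. The following sketch makes those switches explicit. The enum and field names are hypothetical and are not taken from the attached file. The only detail carried over from the report is that a test number from 1 to 4 selects the variant.

```java
/**
 * Hypothetical encoding of the four test variants: each variant is a
 * combination of "unused imports in the main query" and "libraries
 * precompiled with compileLibrary() before the compile() loop".
 */
enum TestVariant {
    TEST_1(false, false), // trivial main query only
    TEST_2(true, false),  // unused imports, no compileLibrary()
    TEST_3(false, true),  // compileLibrary(), no imports in the main query
    TEST_4(true, true);   // both (the production-like case)

    /** Main query contains half a dozen unused "import module" declarations. */
    final boolean unusedImportsInMainQuery;
    /** compileLibrary() is called on the large library modules before the loop. */
    final boolean precompileLibraries;

    TestVariant(boolean unusedImportsInMainQuery, boolean precompileLibraries) {
        this.unusedImportsInMainQuery = unusedImportsInMainQuery;
        this.precompileLibraries = precompileLibraries;
    }

    /** Maps the 1-based test number (1..4) to its variant. */
    static TestVariant fromNumber(int test) {
        if (test < 1 || test > 4) {
            throw new IllegalArgumentException("test must be 1..4, got " + test);
        }
        return values()[test - 1];
    }

    public static void main(String[] args) {
        for (TestVariant v : values()) {
            System.out.println(v + ": imports=" + v.unusedImportsInMainQuery
                    + ", precompiled=" + v.precompileLibraries);
        }
    }
}
```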

What do I observe?

Test #1:

  • 684,947 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 651,404 XQueryExecutable objects created in 8.1 seconds.

Test #2:

  • 859 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 808 XQueryExecutable objects created in 44.5 seconds.

Test #3:

  • 481,377 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 457,865 XQueryExecutable objects created in 8.2 seconds.

Test #4:

  • 53,979 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 51,083 XQueryExecutable objects created in 9.0 seconds.
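One rough way to compare the four results is to divide the maximum heap by the number of XQueryExecutable objects created before the OutOfMemoryError. This sketch assumes that the whole -Xmx2g heap (taken as 2 * 1024^3 bytes) is attributable to the retained executables. That assumption overstates the per-executable cost, especially in tests #3 and #4, where the compiled libraries also occupy the heap, so the numbers below are upper bounds. The class name is hypothetical.

```java
/**
 * Rough upper bound on heap per retained XQueryExecutable:
 * (max heap) / (executables created before OutOfMemoryError).
 * Assumes the entire heap is attributed to the executables.
 */
final class HeapPerExecutable {
    static final long MAX_HEAP_BYTES = 2L * 1024 * 1024 * 1024; // -Xmx2g

    static long bytesPerExecutable(long executablesBeforeOom) {
        return MAX_HEAP_BYTES / executablesBeforeOom; // integer division, rounds down
    }

    public static void main(String[] args) {
        long[] counts = {684_947, 859, 481_377, 53_979}; // tests #1..#4 as reported
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("test #%d: ~%,d bytes per executable%n",
                    i + 1, bytesPerExecutable(counts[i]));
        }
    }
}
```

By this estimate, a test #4 executable costs roughly 9 times as much heap as a test #3 executable (about 39,800 versus 4,500 bytes). A test #2 executable costs roughly 63 times as much as a test #4 executable.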

What do I want?

Test case #4 represents Informatica's production use case. We have our own concept of a job. Our jobs typically contain a dozen xqueries, and sometimes many more. Customers create job specs, including the main xquery strings, and can publish them. During publishing, job specs are compiled into execution plans. During that compilation, we want to precompile each main xquery string into an XQueryExecutable object. We then store that object as part of the compiled infa-job-execution-plan in a memory cache, in order to reduce the execution time of execute-job-requests on our published job specs. This means that the memory usage of XQueryExecutable objects is very important, and it needs to be as small as possible. Smaller XQueryExecutable objects mean we can fit more of them into memory. That means we can keep more of our compiled published job execution plans in memory, which in turn means we can give better performance for more of our customers' jobs.
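The caching scheme described above can be sketched as a size-bounded, least-recently-used map from job spec to compiled plan. This is a hypothetical illustration, not Informatica's implementation. The class name PlanCache, the key type, and the eviction policy are all assumptions. A real plan would carry the precompiled XQueryExecutable objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical in-memory cache of compiled job execution plans, bounded by
 * entry count and evicting the least recently used plan when full.
 */
final class PlanCache<K, P> {
    private final Map<K, P> plans;

    PlanCache(int maxPlans) {
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the "eldest" entry is the least recently used plan.
        this.plans = new LinkedHashMap<K, P>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, P> eldest) {
                return size() > maxPlans; // evict once the bound is exceeded
            }
        };
    }

    /** Returns the cached plan, or null if it was never cached or was evicted. */
    P get(K jobSpecId) {
        return plans.get(jobSpecId);
    }

    void put(K jobSpecId, P plan) {
        plans.put(jobSpecId, plan);
    }

    int size() {
        return plans.size();
    }
}
```

With a fixed heap budget, the usable maxPlans is roughly the budget divided by the size of one plan. That is why smaller XQueryExecutable objects translate directly into more resident plans.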

With prior compileLibrary() calls on the imported modules, the observed compile time of a main xquery string with compile() is nearly the same with and without unused import modules (compare tests #3 and #4). This wall-clock performance is sufficient.

However, the memory usage of individual XQueryExecutable objects is much higher than ideal (compare tests #2 and #4, and to a lesser extent compare tests #1 and #4). It would be much better for us if the common library information that was created with compileLibrary() could be stored in a single common place instead of being duplicated into each XQueryExecutable object.
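One way to picture the requested change is a model in which library modules compiled once live in a single immutable object, and every compiled main query holds only a reference to that object. The types below are hypothetical and are not Saxon classes. They only illustrate why the per-query cost would then stop growing with the size of the imported modules.

```java
import java.util.Map;

/**
 * Hypothetical model of the requested design: precompiled library
 * definitions are stored once and shared by reference. Not Saxon's API.
 */
final class SharedLibraryModel {

    /** Immutable bundle of definitions from the precompiled library modules. */
    record CompiledLibraries(Map<String, String> functionsByName,
                             Map<String, String> variablesByName) {
        CompiledLibraries {
            // Defensive immutable copies, made once when the libraries are compiled.
            functionsByName = Map.copyOf(functionsByName);
            variablesByName = Map.copyOf(variablesByName);
        }
    }

    /** A compiled main query: its own body plus a pointer to the shared libraries. */
    record MainExecutable(String body, CompiledLibraries libraries) { }

    static MainExecutable compileMain(String body, CompiledLibraries shared) {
        // No per-compile copy of library definitions: the size of this object
        // does not depend on how many functions or variables the libraries define.
        return new MainExecutable(body, shared);
    }
}
```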

Admittedly, this higher memory usage may be caused by the workaround that this test uses for bug "https://saxonica.plan.io/issues/2429". However, based on my knowledge of the Saxon internals I needed to write the workaround, I believe that our workaround is not responsible for the extra memory usage.

It appears that every call to compile() results in one call to QueryLibraryImpl.link for each imported module. It appears(?) that each compile() call creates a new top-module QueryLibrary and imports the function definitions and variable definitions of the imported modules into that new top module, with or without prior calls to compileLibrary(). This appears to be a shallow copy (at least for function definition objects, but maybe not for variable definition objects). That probably explains the large difference between tests #2 and #4, i.e. without and with compileLibrary(). However, the space occupied by the new containers for the shallow copies possibly(?) accounts for the difference between tests #3 and #4, i.e. without and with unused import module statements in the main xquery.
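The linking behavior described above, as understood from reading Saxon internals, can be modelled in a few lines: every link builds a fresh container and copies into it references to each imported definition. This is a hypothetical model, not Saxon's QueryLibraryImpl. It shows that a shallow copy shares the definition objects but still allocates one container entry per imported definition on every compile.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of per-compile linking: a new top-level map is built
 * for every compiled main query, and references to all imported definitions
 * are copied into it. Definitions are shared; map entries are not.
 */
final class ShallowLinkModel {

    /** Stand-in for a function definition in a precompiled library module. */
    static final class FunctionDef {
        final String name;

        FunctionDef(String name) {
            this.name = name;
        }
    }

    /** Copies references to all imported definitions into a new per-query map. */
    static Map<String, FunctionDef> link(Iterable<Map<String, FunctionDef>> importedModules) {
        Map<String, FunctionDef> topModule = new HashMap<>();
        for (Map<String, FunctionDef> module : importedModules) {
            topModule.putAll(module); // new entries, same FunctionDef objects
        }
        return topModule;
    }
}
```

In this model each link() allocates one map entry per imported definition even when the main query uses none of them. Memory per compiled query therefore grows with the size of the imported modules rather than with the size of the main query.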

I am not sure if there is an easy fix in the current design to reduce this memory usage.

This is a polite feature request to make the memory usage of test #2 be similar to test #4, and also to make the memory usage of tests #2 and #4 be closer to test #1.

Other Notes

A partial, incomplete repro case is attached. Set the "private static int test" field to a value from 1 to 4 to run the corresponding test case. For legal reasons, I am unable to attach a full repro case, because it involves proprietary xquery xq file contents. For the same legal reasons, I am not posting the workaround code for "https://saxonica.plan.io/issues/6394". I am also not including the code that creates and registers the approximately 50 ExtensionFunctionDefinition objects with the Saxon Processor. I expect that you can reproduce similar results with any large xq libraries, even without ExtensionFunctionDefinitions.

This is on Saxon, Enterprise Edition, release 11.5.


Files

TestSaxonFeature.java (7.98 KB), Joshua Maurice, 2024-04-15 21:15
