
Feature #6395 (closed)

Feature request for reduced memory usage when compiling a main xquery on a compiler with prior calls to compileLibrary

Added by Joshua Maurice 14 days ago. Updated 3 days ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2024-04-15
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
11
Fix Committed on Branch:
trunk
Fixed in Maintenance Release:
Platforms:
.NET, Java

Description

What am I doing?

For test #1,

  • I create the JVM with -Xmx2g.
  • I create a single XQueryCompiler object.
  • I am applying a workaround for bug "https://saxonica.plan.io/issues/6394" (which involves subclassing two Saxonica internal classes, com.saxonica.expr.QueryLibraryImpl and com.saxonica.ee.optim.StaticQueryContextEE).
  • Using that same XQueryCompiler object, in an endless loop, I call compile() on a trivial xquery string without any "import module" statements until I get an OutOfMemoryError. (I store the XQueryExecutable objects in a local java.util.List to prevent garbage collection.) Here is an example of the trivial main xquery string (a minimal Java sketch of the loop follows it):
xquery version "3.0" encoding "utf-8";
'foo'
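
Below is a minimal, hedged sketch of the test #1 loop using standard s9api calls (Processor, XQueryCompiler.compile); the class name is mine, and the loop is deliberately unbounded so it eventually throws OutOfMemoryError:

import net.sf.saxon.s9api.*;
import java.util.ArrayList;
import java.util.List;

public class CompileUntilOom {
    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(true);              // licensed (EE) processor
        XQueryCompiler compiler = processor.newXQueryCompiler();
        String query = "xquery version \"3.0\" encoding \"utf-8\";\n'foo'";
        List<XQueryExecutable> executables = new ArrayList<>(); // keep references so nothing is collected
        while (true) {
            executables.add(compiler.compile(query));           // eventually throws OutOfMemoryError
        }
    }
}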

For test #2,

  • I do the same thing as test #1, except I add half a dozen unused "import module" statements to the main xquery string. The imported modules are not used by the main xquery, but they contain many function definitions and variable definitions.

For test #3,

  • I do the same thing as test #1, except I use the single XQueryCompiler object to call compileLibrary() on half a dozen very large xquery .xq files before calling compile() in the endless loop.

For test #4,

  • I do the same thing as test #1, except I add half a dozen unused "import module" statements (just like test #2) and I call compileLibrary() on half a dozen very large xquery .xq files (just like test #3), including the imported modules, before calling compile() in the endless loop.

What do I observe?

Test #1:

  • 684,947 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 651,404 XQueryExecutable objects created in 8.1 seconds.

Test #2:

  • 859 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 808 XQueryExecutable objects created in 44.5 seconds.

Test #3:

  • 481,377 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 457,865 XQueryExecutable objects created in 8.2 seconds.

Test #4:

  • 53,979 XQueryExecutable objects before OutOfMemoryError.
  • Before the first observable garbage collector pause: 51,083 XQueryExecutable objects created in 9.0 seconds.

What do I want?

Test case #4 represents Informatica's production use case. We have our own concept of a job. Our jobs typically contain a dozen xqueries, and sometimes many more. Customers create job specs that include the main xquery strings. Customers can publish them, and job specs are compiled into execution plans during publishing. During compilation, we want to precompile each main xquery string into an XQueryExecutable object and store that XQueryExecutable object as part of the compiled infa-job-execution-plan inside a memory cache, in order to reduce the execution time of execute-job requests on our published job specs. This means that the memory usage of the XQueryExecutable objects is very important; it needs to be as small as possible. Smaller XQueryExecutable objects mean we can fit more XQueryExecutable objects into memory, which means we can fit more of our compiled published job execution plans into memory, which means we can give better performance for more of our customers' jobs.

With prior compileLibrary calls on the imported module, the observed compile-time of a main xquery string with compile() is nearly the same with and without unused import modules (comparing tests #2 and #4). This wallclock runtime performance is sufficient.

However, the memory usage of individual XQueryExecutable objects is much higher than ideal (compare tests #2 and #4, and to a lesser extent compare tests #1 and #4). It would be much better for us if the common library information that was created with compileLibrary() could be stored in a single common place instead of being duplicated into each XQueryExecutable object.

Admittedly, this higher memory usage may be caused by the workaround that this test applies for bug "https://saxonica.plan.io/issues/6394". However, based on the knowledge of Saxon internals I gained while writing the workaround, I believe the workaround is not responsible for the extra memory usage.

It appears that for every call to compile(), there will be one call to QueryLibraryImpl.link for each imported module. It appears(?) that each compile() call needs to create a new top-module QueryLibrary and import the function definitions and variable definitions of imported modules into the new top-module, with or without prior calls to compileLibrary(). This appears to be a shallow copy (at least for function definition objects, but maybe not for variable definition objects), which probably explains the large difference between tests #2 and #4, aka with and without prior compileLibrary() calls. However, the space occupied by the new containers of the shallow copies possibly(?) accounts for the difference between tests #3 and #4, aka with and without unused "import module" statements in the main xquery.

I am not sure if there is an easy fix in the current design to reduce this memory usage.

This is a polite feature request to make the memory usage of test #2 similar to that of test #4, and also to make the memory usage of tests #2 and #4 closer to that of test #1.

Other Notes

A partial, incomplete repro case is attached. Change the value of the "private static int test" field from 1 to 4 to run each of the 4 described test cases. For legal reasons I am unable to attach a full repro case, because it involves proprietary xquery .xq file contents. For the same legal reasons, I am not posting the workaround code for "https://saxonica.plan.io/issues/6394". I am also not including the code to create and register the approximately 50 ExtensionFunctionDefinition objects with the Saxon Processor. I expect that you can reproduce similar results for any large .xq libraries without ExtensionFunctionDefinitions.

This is on Saxon, Enterprise Edition, release 11.5.


Files

TestSaxonFeature.java (7.98 KB) Joshua Maurice, 2024-04-15 21:15
Actions #1

Updated by Joshua Maurice 14 days ago

Additionally, I want to mention that if the central reason this cannot be (easily) optimized is xquery reflection, e.g. fn:function-lookup(), then I still request a feature that checks the main xquery string and the imported modules for uses of xquery reflection and, if there are none, applies the optimization. One should not pay performance penalties for xquery reflection when the main xquery string and the xquery libraries do not use it.

PS: I am not sure if this would even be possible given the existence of ExtensionFunctionDefinition objects in our processor which are referenced by our xquery libraries. In that case, perhaps we could have a feature that says "we promise that we're not using xquery reflection, so please optimize assuming we are not", i.e. a feature to disable xquery reflection and then optimize memory usage by removing unused (shallow-copied) function definitions and variable definitions.

Actions #2

Updated by Michael Kay 14 days ago

You are correct that currently a main query is linked with its imported modules in such a way that if many queries import the same modules, some information from those modules will be replicated in each query. It's not a vast amount of data -- essentially an index of global function names and variable names -- but I can see that it builds up.

I think it's hard to avoid this for variables, because the way we manage global variables is to create a single pool of global variables each with distinct slot numbers. (Hard but not impossible, because the XSLT package mechanism does create a separate bindery for each package, and we could translate that design to the XQuery situation.) For functions, I think it's probably rather easier: the "function library" in the main module (which is essentially an index of functions) could contain a reference to the function library in the library module, rather than containing copies of its entries. We would need to change the way that we check for conflicts, because currently the checking for conflicts is done as part of the same operation as the building of the combined index, but that feels do-able. It would mean that the search for a function name takes a fraction longer, but that would be noticeable only if someone imports a very large number of library modules.

The internal design is greatly complicated by the fact that the XQuery compiler can encounter a reference to a function before it encounters its declaration, and by the rules for (a) cyclic imports of modules, and (b) dynamic features such as function-lookup and load-xquery-module. So none of this is easy.

Actions #3

Updated by Michael Kay 13 days ago

I'm doing something similar to your test 4, importing a library with 10000 function declarations, and it's consuming about 2Mb for each query compilation. If I increase it to 20000 functions, that goes up to around 3Mb. So the overhead is around 1Mb per 10,000 functions, or 100 bytes per function, which is consistent with the theory that it's just adding an entry into an index.

By way of an experiment, I now tried the following:

  • compile one query that imports the 20000-function library module.
  • get a handle on its FunctionLibrary by doing FunctionLibrary first = exec.getUnderlyingCompiledQuery().getMainModule().getFunctionLibrary();
  • add this function library as an extension function library to the XQuery compiler by doing: ((StaticQueryContextEE)compiler.getUnderlyingStaticContext()).setExtensionFunctionLibrary(first);
  • Compile further queries without doing an import module (they access the functions as extension functions)

This works, and there is much smaller memory growth (about 4K bytes per query compilation).

This might provide the basis for a workaround.
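
For reference, here is an untested sketch assembling the steps above into one sequence. The internal calls (getUnderlyingCompiledQuery, getMainModule, getFunctionLibrary, setExtensionFunctionLibrary) are exactly those quoted in this comment; the library URI "urn:library" and lib:some-function() are placeholders, and the sketch assumes the library module has already been made resolvable (e.g. via a prior compileLibrary call):

import com.saxonica.ee.optim.StaticQueryContextEE;
import net.sf.saxon.functions.FunctionLibrary;
import net.sf.saxon.s9api.*;

public class SharedLibraryWorkaround {
    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(true);
        XQueryCompiler compiler = processor.newXQueryCompiler();

        // 1. Compile one seed query that imports the large library module.
        XQueryExecutable seed = compiler.compile(
                "import module namespace lib = \"urn:library\"; 'seed'");

        // 2. Get a handle on that query's function library.
        FunctionLibrary first =
                seed.getUnderlyingCompiledQuery().getMainModule().getFunctionLibrary();

        // 3. Register it as an extension function library on the compiler.
        ((StaticQueryContextEE) compiler.getUnderlyingStaticContext())
                .setExtensionFunctionLibrary(first);

        // 4. Further queries can call lib:* functions without any "import module".
        XQueryExecutable next = compiler.compile(
                "declare namespace lib = \"urn:library\"; lib:some-function()");
    }
}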

The main difficulty with productising this is that it bypasses the checks on duplicated function names. Duplicates won't be detected, and the query will bind to whichever function is found first. However, it should be possible to change the implementation of "import module" so that the checking for duplicates uses a transient index of functions, which can be discarded once checking is complete, and then add a reference to the library module's function library rather than adding each function to the index.

Another possibility here is to reduce the cost of the check for duplicates by building into each library module a Bloom filter identifying the function names present; it's then a very fast check to confirm that there are no duplicates, followed by a more lengthy check if the fast check fails.
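
To illustrate that idea (this is not Saxon code), here is a sketch using Guava's BloomFilter, where each library module would carry a filter over keys such as "namespace#local-name#arity" and the exact duplicate check would run only when the filter reports a possible hit:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Set;

class DuplicateFunctionCheck {
    // Built once per library module over its function-name keys.
    static BloomFilter<String> buildFilter(Set<String> functionKeys) {
        BloomFilter<String> filter =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), functionKeys.size());
        functionKeys.forEach(filter::put);
        return filter;
    }

    // Fast path: if every key is "definitely absent", there can be no duplicate;
    // a possible hit triggers the slower exact comparison of the two indexes.
    static boolean mayConflict(BloomFilter<String> libraryFilter, Set<String> otherModuleKeys) {
        return otherModuleKeys.stream().anyMatch(libraryFilter::mightContain);
    }
}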

Actions #4

Updated by Michael Kay 13 days ago

I'm coming to the conclusion that NOT maintaining the global index of functions at the level of a query executable has too many potential adverse consequences, on the internal structure of the product, on performance, and on diagnostics.

I'm wondering about an alternative approach, which would treat as a special case the execution of a main query module that consists solely of an "import module" plus a query expression - no other declarations at the top level. In this case it ought to be possible for the query expression to be compiled and executed with a static context that is simply a reference to the library module, meaning that there should be very little overhead in running different queries that take all their context from the same (set of) library modules. I'm not sure at the moment whether this would be done using some new API, some new language feature in the query, or a straight optimization detecting queries that fall into this pattern; or perhaps some combination of the three.

Actions #5

Updated by Joshua Maurice 13 days ago

What about my proposal?

Ignoring ExtensionFunctionDefinition, it seems like an obvious optimization opportunity: detect when the main query and the imported xquery libs don't use xquery reflection, and then do dependency analysis and remove unused functions and variables from the XQueryExecutable object.

For ExtensionFunctionDefinition objects, include an opt-in flag on the individual ExtensionFunctionDefinition object that says "I promise not to use xquery reflection".

Actions #6

Updated by Michael Kay 13 days ago

I'll keep that idea in mind; but I don't want to do an extra optimization pass unconditionally that very few users will benefit from, and I'm always reluctant to add more configuration switches that very few users will discover.

I've got another idea.

Currently, as I understand it, you're compiling a large number of queries (let's say 1000) each of which uses the same imported libraries; and you're saving each of these compiled queries as an XQueryExecutable, presumably because it's going to be executed repeatedly with different parameters.

Now suppose instead that we could compile just one query, let's call it the master query, whose effect is to evaluate one of 1000 different XQuery expressions passed as a parameter? Conceptually this would do something like the xsl:evaluate instruction in XSLT - execute a supplied expression with full access to the functions in the static context of the caller.

But of course, you don't want to compile each of the 1000 expressions each time it is used, you want to compile each one once and then use it repeatedly.

I think it might be possible to achieve this effect today by use of extension functions.

Suppose the master query does:

import module namespace lib = "urn:library";
declare variable $query external;
saxon:compile-function($query)

where $query takes the form of an inline function expression, and saxon:compile-function (unlike the current saxon:compile-query) is defined to supply the static context of the caller to the query being compiled (in particular, the in-scope functions).

Then for each of the 1000 queries, instead of compiling a new XQueryExecutable, we would call the master query once supplying the query as a parameter; the call would return an XdmFunctionItem, and wherever the application currently invokes an XQueryEvaluator constructed from the XQueryExecutable, it would instead call the XdmFunctionItem with whatever run-time parameters are required.

I believe (needs confirmation) that it should be possible to implement the proposed saxon:compile-function() today as an integrated extension function, without any product changes, though possibly requiring some fairly deep delving into internal APIs. It's certainly possible if the query is restricted to XPath rather than XQuery syntax; XQuery might be a bit more difficult. I'll do some experiments to explore the possibility.
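
As a hedged sketch of what the client side might look like if such a saxon:compile-function extension existed (the extension function, the "urn:library" module, and the customer query text are all hypothetical; only the surrounding s9api calls are standard):

import net.sf.saxon.s9api.*;

public class MasterQueryClient {
    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(true);
        XQueryCompiler compiler = processor.newXQueryCompiler();

        // Compile the master query once.
        XQueryExecutable master = compiler.compile(
                "import module namespace lib = \"urn:library\";\n"
              + "declare variable $query external;\n"
              + "saxon:compile-function($query)");            // hypothetical extension function

        // For each customer query, run the master query once to get a reusable function item.
        XQueryEvaluator eval = master.load();
        eval.setExternalVariable(new QName("query"),
                new XdmAtomicValue("function () { lib:do-something() }"));  // placeholder query text
        XdmFunctionItem compiled = (XdmFunctionItem) eval.evaluateSingle();

        // Invoke the compiled function repeatedly with whatever run-time arguments are needed.
        XdmValue result = compiled.call(processor);
    }
}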

Actions #7

Updated by Michael Kay 12 days ago

I think it might be possible to implement an XQueryExecutable.condense() method that eliminates data that has been retained only for debugging or for dynamic evaluation. This won't actually drop unused functions in library modules (because library modules might be used again for compiling another query), it will only remove them from the index held in the top-level module. The main difficulty will be testing that error paths don't crash if they try to access this information when producing diagnostics.

There's a risk that such a method could encourage flawed expectations. As proposed, it will only make a tiny difference to the memory used by one query; the benefit only comes in your use case where you have hundreds of queries sharing the same library modules. I'm conscious that I want to provide something that solves your particular problem but is also of general utility to a wider user base.

Actions #8

Updated by Joshua Maurice 12 days ago

Currently, as I understand it, you're compiling a large number of queries (let's say 1000) each of which uses the same imported libraries; and you're saving each of these compiled queries as an XQueryExecutable, presumably because it's going to be executed repeatedly with different parameters.

Correct. Although I might put my dream as closer to 100,000 or 1,000,000 XQueryExecutable objects per JVM, assuming many relatively small / simple main xquery strings.

There's a risk that such a method could encourage flawed expectations. As proposed, it will only make a tiny difference to the memory used by one query; the benefit only comes in your use case where you have hundreds of queries sharing the same library modules. I'm conscious that I want to provide something that solves your particular problem but is also of general utility to a wider user base.

I understand. I marked this as "low" priority originally, and I understand it is a feature request, and I understand that it should be done a "correct" way instead of a hacky way.

XQueryExecutable.condense()

I like that design too.

Actions #9

Updated by Michael Kay 12 days ago

many relatively small / simple main xquery strings

Would it be acceptable to restrict these to be XPath expressions rather than XQuery expressions?

The XPathCompiler offers

XPathCompiler.addXsltFunctionLibrary (XsltPackage libraryPackage)

and it would be fairly straightforward to supplement this with XPathCompiler.addXQueryFunctionLibrary (XQueryExecutable libraryPackage).

Actions #10

Updated by Joshua Maurice 7 days ago

These xqueries come from our customers. They could contain anything; they would not be limited to just xpath.

Actions #11

Updated by Michael Kay 3 days ago

For the next major release, I have implemented an experimental optimization:

If an XQuery main module contains no function declarations, and contains a single module import, and the imported module is a precompiled library module, then we don't build a new function index for the main module, rather we reuse the function library already present in the precompiled library module.

Extending this to multiple imports isn't quite so straightforward. And unfortunately we can't just import a library that's the union of several libraries because function imports aren't transitive. We do know that the functions in different libraries are in different namespaces and therefore cannot conflict with each other. Oddly, however, XQuery allows a function in a main module to be in a namespace that is the same as that of an imported module, and this is a possible source of name conflicts.

This suggests that the library of user-defined functions should perhaps be a two-level structure organised first by namespace and then by local-name/arity, rather than being a single-level structure organised by namespace/local-name/arity as at present. It should then be possible to arrange that when a namespace is imported, we simply add an entry to the top-level index for that namespace, and only reconstruct the index for the namespace in the event that functions for that namespace come from more than one place.

Perhaps we could implement this structure using immutable maps. With a module import, we would start by adding a reference to the function library of the imported module to the importing module. If an additional function in the same namespace is encountered, we would make a non-destructive addition to this map, so the map in the main module and the map in the library module have now diverged, but still with reuse of parts that are common between the two.

Actions #12

Updated by Michael Kay 3 days ago

  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Fix Committed on Branch trunk added
  • Platforms .NET added

For the next major release, I have redesigned the XQueryFunctionLibrary class which holds all the functions available in a module. This now uses a three-level index structure, first by namespace, then by local name, then by arity. The structure makes use of immutable maps, which enables an importing module to make a virtual copy of the library in an imported module and then add additional functions to its copy without affecting the original. This has the effect that a main module containing a number of module imports, and no additional function declarations, will simply contain a new top-level namespace index, which will point to the unmodified namespace-specific function libraries in the imported modules.
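
By way of illustration only (these are not Saxon's actual classes), the described structure can be sketched as a three-level index in which an importing module reuses the imported module's per-namespace maps by reference and copies a level only when it adds its own declarations; in Saxon the non-destructive addition is done with immutable maps rather than the copy-on-write shown here:

import java.util.HashMap;
import java.util.Map;

final class FunctionIndexSketch {
    // namespace URI -> local name -> arity -> compiled function (Object here for brevity)
    private final Map<String, Map<String, Map<Integer, Object>>> byNamespace = new HashMap<>();

    // Module import: reference the library's namespace-level maps; nothing is copied.
    // (Conflict handling when two sources contribute to one namespace is omitted here.)
    void importLibrary(FunctionIndexSketch library) {
        byNamespace.putAll(library.byNamespace);
    }

    // Declaring a function in the importing module copies only the affected namespace level,
    // so the library module's own index is never modified.
    void declare(String namespace, String localName, int arity, Object function) {
        Map<String, Map<Integer, Object>> names =
                new HashMap<>(byNamespace.getOrDefault(namespace, Map.of()));
        Map<Integer, Object> arities = new HashMap<>(names.getOrDefault(localName, Map.of()));
        arities.put(arity, function);
        names.put(localName, arities);
        byNamespace.put(namespace, names);
    }

    Object lookup(String namespace, String localName, int arity) {
        return byNamespace.getOrDefault(namespace, Map.of())
                          .getOrDefault(localName, Map.of())
                          .get(arity);
    }
}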
