Project

Profile

Help

Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor

Added by Anna Daniau about 1 year ago

I'm using Saxon/C in a C/C++ program, but when creating multiple SaxonProcessor, XPathProcessor XdmValue, or XdmNode objects, I'm not sure how to properly delete them to avoid memory leaks. Calling their normal destructors doesn't seem to be enough.

Here is a sample program:

#include "SaxonProcessor.h"
#include "XPathProcessor.h"
#include "XdmNode.h"
#include "XdmValue.h"

int main()
{
        SaxonProcessor *saxonProcessor = new SaxonProcessor(false);

        while (true) {
                XdmNode *node = saxonProcessor->parseXmlFromString("<a/>");

                XPathProcessor *xpathProcessor = saxonProcessor->newXPathProcessor();
                xpathProcessor->setContextItem(node);

                XdmValue *value = xpathProcessor->evaluate("/");
                
                delete value;
                delete xpathProcessor;
                delete node;
        }

        delete saxonProcessor;

        SaxonProcessor::release();
}

When I run this program, the memory usage just continues to grow and grow. What needs to be done to properly clean things up at the end of this loop, so that there are no memory leaks?

Thank you and apologies in advance if I'm just missing something obvious here.


Replies (8)

Please register to reply

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by O'Neil Delpratt about 1 year ago

Hi Anna,

Before I investigate this further looking at you code I see while (true) { ..... I would expect this loop to run for ever or until it runs out of memory, hence the increase in memory.

How many times you intend to run the while loop?

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by O'Neil Delpratt about 1 year ago

Looking at this again. I think the code above should not grow in memory given that all variables are deleted for each loop.

I suspect the setContextItem(node) is stopping the node from deleting. I am investigating this further.

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by O'Neil Delpratt about 1 year ago

Hi Anna,

Thanks for the repo. I have managed to run the C++ code and I am experiencing the same memory increase. I will investigate this further and issoluate where the memory is not being released.

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by Anna Daniau about 1 year ago

Hi O'Neil,

Thank you for your response! It sounds like you've already been able to replicate the issue, but for more context, here is the actual program I am working on: https://github.com/kibook/s1kd-tools/tree/master/tools/s1kd-brexcheck. Essentially, this program evaluates a set of XPath expressions on a set of XML documents, and determines which documents contain nodes matched by those expressions. I am using Saxon/C specifically to add support for XPath 2.0 expressions, as the main library I used, libxml, only supports XPath 1.0.

I did notice through some trial-and-error that manually calling XdmNode::decrementRefCount before deleting node lessened the amount the memory grew by each loop, which may be related to what you mentioned about setContextItem. However, it didn't eliminate the issue completely, and I am unsure if this is the "right" way to go about ensuring the memory is cleaned up.

It seems like there may be multiple points where the memory is leaking, so maybe it would help if I broke these into smaller sample programs:

Sample #1 - XPathProcessor:

#include "SaxonProcessor.h"
#include "XPathProcessor.h"

int main()
{
        SaxonProcessor *saxonProcessor = new SaxonProcessor(false);

        while (true) {
                XPathProcessor *xpathProcessor = saxonProcessor->newXPathProcessor();

                delete xpathProcessor;
        }

        delete saxonProcessor;

        SaxonProcessor::release();
}

Sample #2 - XdmNode:

#include "SaxonProcessor.h"
#include "XdmNode.h"

int main()
{
        SaxonProcessor *saxonProcessor = new SaxonProcessor(false);

        while (true) {
                XdmNode *node = saxonProcessor->parseXmlFromString("<a/>");

                delete node;
        }

        delete saxonProcessor;

        SaxonProcessor::release();
}

Both of the above also seem to exhibit a memory leak, even when not using setContextItem .

Thank you again for looking into this! While the actual program doesn't run indefinitely like the while (true) { of these sample programs, the set of XML documents being parsed can easily be in the thousands, and unfortunately these leaks tend to cause the program to slow down and fail after only a few dozen (depending on how much memory is available on the computer it is run on).

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by O'Neil Delpratt about 1 year ago

Hi Anna,

thanks for breaking down the memory issue into separate samples. That is helpful.

Firstly, you are on the right path of investigation of calling the decrementRefCount. Ideally you should not need to call it yourself. I was looking at this whole area of Xdm objects and their refCount. I will investigate it further, maybe there is a bug in the logic. There is some subtle memory management going on here to keep alive the Xdm object when it referenced in more than one place, i.e. in a XPath/XSLT/XQuery processor or even when used in Python or PHP extension.

How have you been measuring memory usage? I initially started with top then I setup some memory instruments using some snippets of code from stackoverflow

See the code snippet I used for measuring physical memory (in KB):


#include "stdlib.h"
#include "stdio.h"
#include "string.h"


int parseLine(char* line) {

  int i = strlen(line);
  const char* p = line;
  while(*p < '0' || *p > '9') p++;
  line[i-3] = '\0';
  i = atoi(p);
}


int getPValue() {
  FILE* file = fopen("/proc/self/status", "r");
  int result = -1;
  char line[128];

  while (fgets(line, 128, file) != NULL) {
    if (strncmp(line, "VmRSS:", 6) == 0) {
      result = parseLine(line);
      break;
    }

  }
  fclose(file);
  return result;

}

Sample #1:

Physical Memory before loop =56824
Physical Memory (in loop) = 60684
Physical Memory (in loop) = 60684
Physical Memory (in loop) = 60684
Physical Memory (in loop) = 60684
Physical Memory (in loop) = 60684

Between each loop the memory is not increasing, which is good but it is not getting released either. So worth looking at. Maybe there is some underlying data hanging around between the SaxonPrcoessor and XPathProcessor object that is not being deleted.

Sample #2:

Total Physical Memory used = 60684
Physical Memory (in loop) = 68484
Physical Memory (in loop) = 68608
Physical Memory (in loop) = 68672
Physical Memory (in loop) = 68732
Physical Memory (in loop) = 68960
Physical Memory (in loop) = 68960
Physical Memory (in loop) = 68960
Physical Memory (in loop) = 68960
Physical Memory (in loop) = 68960
Physical Memory (in loop) = 68968

SaxonProc: JVM finalized calling !
physical memory=68968

The XdmNode object created in each loop run is clearing showing memory leak after each delete is called.

I will being investigating this issue further, but it may have to hold off until next week.

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by Anna Daniau about 1 year ago

I was also using top to watch the memory usage, but now I've added in the code you posted to get more precise measurements. Thanks! Normally, I use valgrind to check for memory leaks, but I haven't been able to get it to work with the JET runtime that Saxon/C uses.

Regarding Sample #1, I am seeing the memory usage stay constant for a few iterations of the loop like you, but eventually it continues to grow. I added a test so that it only prints the memory usage when it changes to highlight this:

#include "SaxonProcessor.h"
#include "XdmNode.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int parseLine(char *line)
{
        int i = strlen(line);
        const char *p = line;
        while (*p < '0' || *p > '9') p++;
        line[i - 3] = '\0';
        i = atoi(p);
        return i;
}

int getValue()
{
        FILE *file = fopen("/proc/self/status", "r");
        int result = -1;
        char line[128];

        while (fgets(line, 128, file) != NULL) {
                if (strncmp(line, "VmRSS:", 6) == 0) {
                        result = parseLine(line);
                        break;
                }
        }

        fclose(file);
        return result;
}

int main()
{
        SaxonProcessor *saxonProcessor = new SaxonProcessor(false);
        int lastMem = -1;
        int curMem = getValue();

        std::cout << "Physical memory before loop = " << curMem << std::endl;

        for (int i = 1; ; ++i) {
                XPathProcessor *xpathProcessor = saxonProcessor->newXPathProcessor();

                delete xpathProcessor;

                curMem = getValue();

                if (curMem != lastMem) {
                        std::cout << "Physical memory at iteration " << i << " = " << curMem << std::endl;
                        lastMem = curMem;
                }
        }

        delete saxonProcessor;

        SaxonProcessor::release();
}

Here is a snippet of the output:

Physical memory before loop = 35232
Physical memory at iteration 1 = 36404
Physical memory at iteration 2 = 36428
Physical memory at iteration 31 = 36440
Physical memory at iteration 78 = 36696
Physical memory at iteration 127 = 36700
Physical memory at iteration 212 = 36684
Physical memory at iteration 213 = 36940
Physical memory at iteration 255 = 36948
Physical memory at iteration 323 = 36944
Physical memory at iteration 324 = 37200
Physical memory at iteration 398 = 37456

I'm not sure if this is a limitation of the method used to measure the memory usage (does /proc/self/status only update on certain intervals?), or if XPathProcessor only sometimes fails to be fully deleted.

Let me know if I can provide any more information, or if there's any other tests you'd like me to run. Thank you!

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by O'Neil Delpratt about 1 year ago

Hi,

Sorry for the delay in getting back to you on this issue.

I have finally managed to spend some time on this investigation. I went down several avenues trying to solve where the memory leaking was coming from. I actually don't think there is a bug within Saxon/C, but that it is VM garbage collection (GC) issue or rather the GC has not taken place at that time.

Firstly, I looked at our use of XdmItem reference counting, which is our own memory management within the C++ code. I found it to be working as expected. Although I mentioned in one of my previous posts that the calling program should set the status of when the XdmItem is no longer used. I will look to tidy this up for the C++ users as this should happen seamlessly as in PHP and Python.

Secondly, we should be using where appropriate the JNI global references as opposed the local reference to hold on to objects created that needs to be kept beyond the method which created the object. Therefore avoiding any potential issues when GC happens. There is a lot discussion on JNI local vs global references. I tidied up how we use it in the parseXmlFromString method . At this point I feel that I was clutching at straws. For example, when we call parseXmlFromString in the C++ code, internally the callback to the Java code returns a jobject which wrap within the C++ class XdmItem. This jobject we should set it as a global reference to prevent the object being deleted when the GC is called. This means we have to handle the deleting of the global reference when we call the XdmItem destructor. However, this did not improve the memory issue.

The real cause is the GC not as yet taken place. Excelsior Jet tool is the the VM to handle the call-backs to the cross compiled Java code. As with java applications there is a threshold as to when garbage collection will happen. Interestingly, I forced the garbage collection within the C++ code and the physical memory is now the same between the iterations. See code below which calls the GC via JNI (only for experimental use):

 jclass    systemClass    = NULL;
 jmethodID systemGCMethod = NULL;
 // ...
 // Take out the trash.
 systemClass    = SaxonProcessor::sxn_environ->env->FindClass("java/lang/System");
 systemGCMethod = SaxonProcessor::sxn_environ->env->GetStaticMethodID(systemClass, "gc", "()V");
 SaxonProcessor::sxn_environ->env->CallStaticVoidMethod(systemClass , systemGCMethod);

I don't recommend calling the gc() method within an application as there is a performance cost. Also I don't expect there to be problem with memory usage due to Excelsior Jet doing a good job of clearing the GC based upon their internal ratios which we have set to the default for optimal performance. There is alway trade-off with memory and performance.

Going forward we should add some details of memory management and the GC in the documentation. Also make the JNI code somewhat cleaner with the use of global references. This will happen in the next major release.

RE: Deleting XdmValue, XdmNode, XPathProcessor and SaxonProcessor - Added by Anna Daniau about 1 year ago

Thank you very much for looking into this so thoroughly!

Interestingly, I forced the garbage collection within the C++ code and the physical memory is now the same between the iterations.

Calling the gc() method manually as you propose definitely had an effect in reducing how much the memory usage increased by. However, perhaps I am not using it correctly, as the memory usage still seems to go up, just more slowly. Here is a full sample program:

#include "SaxonProcessor.h"
#include "XdmNode.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int parseLine(char *line)
{
        int i = strlen(line);
        const char *p = line;
        while (*p < '0' || *p > '9') p++;
        line[i - 3] = '\0';
        i = atoi(p);
        return i;
}

int getValue()
{
        FILE *file = fopen("/proc/self/status", "r");
        int result = -1;
        char line[128];

        while (fgets(line, 128, file) != NULL) {
                if (strncmp(line, "VmRSS:", 6) == 0) {
                        result = parseLine(line);
                        break;
                }
        }

        fclose(file);
        return result;
}

int main(int argc, char **argv)
{
        SaxonProcessor *saxonProcessor = new SaxonProcessor(false);
        int lastMem = -1;
        int curMem = getValue();
        bool gc = argc > 1 && strcmp(argv[1], "-gc") == 0;

        std::cout << "Physical memory before loop = " << curMem << std::endl;

        for (int i = 1; ; ++i) {
                XdmNode *node = saxonProcessor->parseXmlFromString("<a/>");

                delete node;

                curMem = getValue();

                if (curMem != lastMem) {
                        std::cout << "Physical memory at iteration " << i << " = " << curMem << " (" << (curMem >= lastMem ? "+" : "") << curMem - lastMem << ")" << std::endl;
                        lastMem = curMem;
                }

                if (gc) {
                        jclass    systemClass    = NULL;
                        jmethodID systemGCMethod = NULL;
                        systemClass    = SaxonProcessor::sxn_environ->env->FindClass("java/lang/System");
                        systemGCMethod = SaxonProcessor::sxn_environ->env->GetStaticMethodID(systemClass, "gc", "()V");
                        SaxonProcessor::sxn_environ->env->CallStaticVoidMethod(systemClass , systemGCMethod);
                }
        }

        delete saxonProcessor;

        SaxonProcessor::release();
}

When run with ./a.out:

Physical memory before loop = 37564
...
Physical memory at iteration 17158 = 212460 (+256)
Physical memory at iteration 17177 = 212716 (+256)
Physical memory at iteration 17194 = 212972 (+256)
Physical memory at iteration 17211 = 213228 (+256)
Physical memory at iteration 17234 = 213484 (+256)
Physical memory at iteration 17252 = 213740 (+256)
Physical memory at iteration 17270 = 213996 (+256)
Physical memory at iteration 17287 = 214252 (+256)
Physical memory at iteration 17307 = 214508 (+256)
Physical memory at iteration 17324 = 214764 (+256)
Physical memory at iteration 17345 = 215020 (+256)
Physical memory at iteration 17362 = 215276 (+256)
Physical memory at iteration 17379 = 215532 (+256)
Physical memory at iteration 17398 = 215788 (+256)
Physical memory at iteration 17406 = 215792 (+4)
Physical memory at iteration 17419 = 216048 (+256)
Physical memory at iteration 17440 = 216304 (+256)
Physical memory at iteration 17458 = 216560 (+256)
Physical memory at iteration 17475 = 216816 (+256)
Physical memory at iteration 17493 = 217072 (+256)
Physical memory at iteration 17514 = 217328 (+256)
Physical memory at iteration 17532 = 217584 (+256)
Physical memory at iteration 17547 = 217840 (+256)

When run with ./a.out -gc:

Physical memory before loop = 36700
...
Physical memory at iteration 17461 = 75072 (-64)
Physical memory at iteration 17462 = 75040 (-32)
Physical memory at iteration 17463 = 75024 (-16)
Physical memory at iteration 17469 = 75152 (+128)
Physical memory at iteration 17470 = 75088 (-64)
Physical memory at iteration 17471 = 75056 (-32)
Physical memory at iteration 17472 = 75040 (-16)
Physical memory at iteration 17474 = 75168 (+128)
Physical memory at iteration 17475 = 75104 (-64)
Physical memory at iteration 17476 = 75088 (-16)
Physical memory at iteration 17477 = 75072 (-16)
Physical memory at iteration 17503 = 75200 (+128)
Physical memory at iteration 17504 = 75136 (-64)
Physical memory at iteration 17505 = 75120 (-16)
Physical memory at iteration 17506 = 75104 (-16)
Physical memory at iteration 17511 = 75232 (+128)
Physical memory at iteration 17512 = 75168 (-64)
Physical memory at iteration 17513 = 75136 (-32)
Physical memory at iteration 17514 = 75120 (-16)
Physical memory at iteration 17515 = 75248 (+128)
Physical memory at iteration 17516 = 75184 (-64)
Physical memory at iteration 17517 = 75152 (-32)
Physical memory at iteration 17518 = 75136 (-16)
Physical memory at iteration 17536 = 75264 (+128)
Physical memory at iteration 17537 = 75200 (-64)
Physical memory at iteration 17538 = 75168 (-32)
Physical memory at iteration 17539 = 75152 (-16)
Physical memory at iteration 17545 = 75280 (+128)
Physical memory at iteration 17546 = 75216 (-64)
Physical memory at iteration 17547 = 75184 (-32)

So, when calling gc() within the loop, it does free up some memory, but apparently not as much as it allocates, as the overall usage continues to slowly increase.

I will look to tidy this up for the C++ users as this should happen seamlessly as in PHP and Python.

Going forward we should add some details of memory management and the GC in the documentation.

Excellent, thank you!

    (1-8/8)

    Please register to reply