Bug #6306

closed

Saxon in python is not releasing memory

Added by Vijay S 4 months ago. Updated 3 months ago.

Status:
Closed
Priority:
High
Category:
Python API
Start date:
2023-12-26
Due date:
% Done:

100%

Estimated time:
Found in version:
12.4.1
Fixed in version:
12.4.2
Platforms:

Description

Saxon in python is not releasing memory.

I am trying to use saxonche in Python for XSLT 2.0 transformations.

I call the code below in a for loop in my Django project. It produces the expected output, but it consumes a lot of memory that is never released, which causes the container to restart. I have searched the Saxonica documentation but could not find the root cause or a fix for this issue.

from saxonche import *

def xslt_process(xsl_file_path, dom):
    with PySaxonProcessor() as proc:
        xslt_process = proc.new_xslt30_processor()
        document = proc.parse_xml(xml_text=dom)  # Load the XML document
        stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
        # Apply the transformation
        input_xslt = stylesheet.transform_to_string(xdm_node=document)

        # cleaning string which we got from XSLT
        input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
        input_xslt = input_xslt.replace("<output1>", '').strip()
        input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
        input_xslt = [x.strip() for x in input_xslt.split(",")]
        return input_xslt

Files

saxonc_memory_issue.py (1.45 KB) saxonc_memory_issue.py Vijay S, 2023-12-26 14:21
Actions #1

Updated by Martin Honnen 4 months ago

Is there any reason you need to create 10000 PySaxonProcessor objects and compile the same stylesheet that many times?

The usual recommendation with Saxon is to use a single (PySaxon)Processor object. And of course you can compile a stylesheet once into a PyXsltExecutable and that could/should be reused.

Note that I can't tell whether you might also have run into a memory leak in SaxonC. For now I am only pointing out that part of your code uses Saxon(C) very inefficiently, and you should certainly see lower memory consumption if you make the suggested changes.
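The recommendation in this comment can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name build_transformer is hypothetical, and it assumes proc is a single long-lived PySaxonProcessor created once by the caller. It uses only the calls that already appear in the reported code.

```python
def build_transformer(proc, xsl_file_path):
    """Compile the stylesheet once and return a function that reuses it.

    `proc` is assumed to be one shared PySaxonProcessor instance.
    `build_transformer` is a hypothetical helper name.
    """
    xslt_proc = proc.new_xslt30_processor()
    # Compiled once here; the resulting executable is reused for every call.
    executable = xslt_proc.compile_stylesheet(stylesheet_file=xsl_file_path)

    def transform(dom):
        # Only parsing and transformation happen per document.
        document = proc.parse_xml(xml_text=dom)
        return executable.transform_to_string(xdm_node=document)

    return transform
```

With saxonche installed, the caller would create proc = PySaxonProcessor() once at start-up, call transform = build_transformer(proc, "test.xsl") once, and then call transform(dom) inside the loop.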

Actions #2

Updated by O'Neil Delpratt 4 months ago

  • Project changed from Saxon to SaxonC
  • Category deleted (XSLT 3.0 packages)
Actions #3

Updated by O'Neil Delpratt 4 months ago

  • Description updated (diff)
Actions #4

Updated by O'Neil Delpratt 4 months ago

  • Category set to Python API
  • Status changed from New to In Progress
  • Found in version set to 12.4.1

As Martin correctly stated in comment #1, you should reuse the PySaxonProcessor object.

  1. As a first experiment I ran your code as it is, and the memory usage was around 2GB.
  2. I then moved the proc variable holding the PySaxonProcessor object outside of the function, and the memory usage went down to 800MB.

See my modified code below:

import os, psutil
from saxoncee import *

print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
proc = PySaxonProcessor()
xslt_processi = proc.new_xslt30_processor()

def xslt_process(xslt_process, xsl_file_path, dom):
    # xslt_process = proc.new_xslt30_processor()
    document = proc.parse_xml(xml_text=dom)  # Load the XML document
    stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
    # Apply the transformation
    input_xslt = stylesheet.transform_to_string(xdm_node=document)

    # cleaning string which we got from XSLT
    input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
    input_xslt = input_xslt.replace("<output1>", '').strip()
    input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
    input_xslt = [x.strip() for x in input_xslt.split(",")]
    print(input_xslt)


xsl_str = '''<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='3.0'> <xsl:param name='values' select='(2,3,4)' /><xsl:output method='xml'
indent='yes' /><xsl:template match='*'><output><xsl:value-of select='//person[1]'/><xsl:for-each select='$values' ><out><xsl:value-of select='. *
3'/></out></xsl:for-each></output></xsl:template></xsl:stylesheet>'''
with open('test.xsl', "w") as xsl_file:
    xsl_file.write(xsl_str.strip())

dom = "<doc><item>text1</item><item>text2</item><item>text3</item></doc>"
file = "test.xsl"

for i in range(10000):
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    xslt_process(xslt_processi, file, dom)
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)

Still investigating why the memory is not going down.

Actions #5

Updated by O'Neil Delpratt 4 months ago

To run with psutil you will need:

pip install psutil
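Where installing psutil is not an option, a rough standard-library alternative on Unix-like systems is the resource module. This is a sketch only, and the helper name peak_rss_mib is hypothetical. It reports the peak resident set size rather than the current value that psutil's memory_info().rss gives, so it can show growth across the loop but never a decrease.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


print(peak_rss_mib())
```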
Actions #6

Updated by O'Neil Delpratt 4 months ago

Update:

I think I have pinned down the cause of the memory leak. When I reuse the PyXsltExecutable (which I know you do not want to do), the memory usage ends at around 67MB. This is much better and is what I would expect. As a side note, I am not 100% sure how Python manages garbage collection, but I will look into it in more detail.

But I have a suspicion that there is a memory leak in the PyXsltExecutable class that will need investigating.
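For callers that work with more than one stylesheet, reuse of the PyXsltExecutable could be arranged with a small cache keyed by stylesheet path. The following is a minimal sketch under that assumption, not part of SaxonC: the class name StylesheetCache is hypothetical, and proc is assumed to be a single shared PySaxonProcessor.

```python
class StylesheetCache:
    """Compile each stylesheet path at most once and reuse the executable.

    Hypothetical helper; `proc` is assumed to be one shared PySaxonProcessor.
    """

    def __init__(self, proc):
        self._proc = proc
        self._xslt_proc = proc.new_xslt30_processor()
        self._executables = {}  # stylesheet path -> compiled executable

    def transform(self, xsl_file_path, dom):
        executable = self._executables.get(xsl_file_path)
        if executable is None:
            # First use of this stylesheet: compile it and remember the result.
            executable = self._xslt_proc.compile_stylesheet(
                stylesheet_file=xsl_file_path
            )
            self._executables[xsl_file_path] = executable
        document = self._proc.parse_xml(xml_text=dom)
        return executable.transform_to_string(xdm_node=document)
```

The cache holds every compiled executable for the lifetime of the object, so it suits a bounded set of stylesheets; a workload with an unbounded number of distinct stylesheets would need some eviction policy.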

Actions #7

Updated by O'Neil Delpratt 4 months ago

  • % Done changed from 0 to 80

Just to confirm: the problem is that the Java object for the XsltExecutable is not being released when it is finished with, so it cannot be garbage collected. I have applied a patch in the C++ code, specifically in the destructor of the XsltExecutable.

I ran the example code above (from comment #4) and we are now using 60MB at the end of the python script.

I will do some more experiments before I mark this bug as resolved.

Actions #8

Updated by O'Neil Delpratt 3 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

The bug is fixed, and the fix will be available in the next maintenance release.

Actions #9

Updated by O'Neil Delpratt 3 months ago

  • Status changed from Resolved to Closed
  • Fixed in version set to 12.4.2

Fix applied in the SaxonC 12.4.2 maintenance release.
