
Bug #6306

closed

Saxon in python is not releasing memory

Added by Vijay S 10 months ago. Updated 9 months ago.

Status: Closed
Priority: High
Category: Python API
Start date: 2023-12-26
% Done: 100%
Found in version: 12.4.1
Fixed in version: 12.4.2

Description

Saxon in python is not releasing memory.

I am trying to use saxonche in Python for an XSLT 2.0 transformation.

I am using the code below in a for loop in my Django project. It produces the expected output, but it consumes a lot of memory that is never released, so the container keeps getting restarted. I have searched the Saxonica documentation but have not been able to find the root cause or a fix.

from saxonche import PySaxonProcessor


def run_xslt(xsl_file_path, dom):
    # Hypothetical wrapper name; it is called once per iteration of the for loop,
    # so every call creates a new processor and recompiles the stylesheet.
    with PySaxonProcessor() as proc:
        xslt_process = proc.new_xslt30_processor()
        document = proc.parse_xml(xml_text=dom)  # Load the XML document
        stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
        # Apply the transformation
        input_xslt = stylesheet.transform_to_string(xdm_node=document)

    # Clean up the string returned by the XSLT
    input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
    input_xslt = input_xslt.replace("<output1>", '').strip()
    input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
    input_xslt = [x.strip() for x in input_xslt.split(",")]
    return input_xslt
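The string clean-up at the end of the snippet above does not depend on Saxon, so it can be factored into a standalone function and tested on its own. This is a minimal sketch; `clean_output` and `XML_DECLARATION` are hypothetical names, and it assumes the stylesheet wraps a comma-separated list in a single `<output1>` element, as the original replace calls imply.

```python
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'


def clean_output(raw):
    """Turn the serialized XSLT result into a list of trimmed values.

    Hypothetical helper mirroring the original clean-up steps: drop
    newlines, the <output1> wrapper element and the XML declaration,
    then split the remaining text on commas.
    """
    text = raw.replace("\n", "")
    text = text.replace("<output1>", "").replace("</output1>", "")
    text = text.replace(XML_DECLARATION, "").strip()
    return [value.strip() for value in text.split(",")]
```

For example, `clean_output('<?xml version="1.0" encoding="UTF-8"?>\n<output1>a, b,\n c</output1>\n')` returns `['a', 'b', 'c']`.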

Files

saxonc_memory_issue.py (1.45 KB), added by Vijay S, 2023-12-26 14:21
#1

Updated by Martin Honnen 10 months ago

Is there any reason you need to create 10000 PySaxonProcessor objects and compile the same stylesheet that many times?

The usual recommendation with Saxon is to use a single (PySaxon)Processor object. You can also compile a stylesheet once into a PyXsltExecutable, which could (and should) then be reused.

Note that I can't tell whether you might also have run into a memory leak in SaxonC; for now I am only pointing out that part of your code is a highly inefficient use of Saxon(C), and you should certainly see lower memory consumption if you make the suggested changes.
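The recommendation above (one processor, one compiled stylesheet, many transformations) can be sketched as a small class. This is an illustrative sketch, not code from the thread: `StylesheetRunner` is a hypothetical name, and only the saxonche calls already used in this issue (`PySaxonProcessor`, `new_xslt30_processor`, `compile_stylesheet`, `parse_xml`, `transform_to_string`) are assumed to exist.

```python
class StylesheetRunner:
    """Compile one stylesheet once and reuse it for many documents.

    Hypothetical helper: holds a single PySaxonProcessor and a single
    compiled PyXsltExecutable for the lifetime of the object.
    """

    def __init__(self, xsl_file_path):
        # Imported here so this module can be loaded without saxonche installed.
        from saxonche import PySaxonProcessor

        self._proc = PySaxonProcessor()
        xslt30 = self._proc.new_xslt30_processor()
        # Compile once; the resulting PyXsltExecutable is reused by transform().
        self._executable = xslt30.compile_stylesheet(stylesheet_file=xsl_file_path)

    def transform(self, xml_text):
        """Parse one XML string and return the serialized transformation result."""
        document = self._proc.parse_xml(xml_text=xml_text)
        return self._executable.transform_to_string(xdm_node=document)
```

In a Django project, one instance could be created at module level (for example `RUNNER = StylesheetRunner(xsl_file_path)`) and `RUNNER.transform(dom)` called inside the loop; where exactly to create it depends on the project layout, which the thread does not show.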

#2

Updated by O'Neil Delpratt 10 months ago

  • Project changed from Saxon to SaxonC
  • Category deleted (XSLT 3.0 packages)
#3

Updated by O'Neil Delpratt 10 months ago

  • Description updated (diff)
#4

Updated by O'Neil Delpratt 10 months ago

  • Category set to Python API
  • Status changed from New to In Progress
  • Found in version set to 12.4.1

As Martin correctly stated in comment #1, you should reuse the PySaxonProcessor object.

  1. As a first experiment, I ran your code as it is, and memory usage was around 2GB.
  2. I then moved the PySaxonProcessor object (the proc variable) outside the function, and memory usage went down to 800MB.

See my modified code below:

import os

import psutil
from saxoncee import PySaxonProcessor

print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
# One processor and one XSLT 3.0 processor, created once at module level
proc = PySaxonProcessor()
xslt_processi = proc.new_xslt30_processor()


def xslt_process(xslt_process, xsl_file_path, dom):
    # xslt_process = proc.new_xslt30_processor()
    document = proc.parse_xml(xml_text=dom)  # Load the XML document
    # The stylesheet is still compiled on every call
    stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
    # Apply the transformation
    input_xslt = stylesheet.transform_to_string(xdm_node=document)

    # Clean up the string returned by the XSLT
    input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
    input_xslt = input_xslt.replace("<output1>", '').strip()
    input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
    input_xslt = [x.strip() for x in input_xslt.split(",")]
    print(input_xslt)


xsl_str = '''<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='3.0'> <xsl:param name='values' select='(2,3,4)' /><xsl:output method='xml'
indent='yes' /><xsl:template match='*'><output><xsl:value-of select='//person[1]'/><xsl:for-each select='$values' ><out><xsl:value-of select='. *
3'/></out></xsl:for-each></output></xsl:template></xsl:stylesheet>'''
with open('test.xsl', "w") as xsl_file:
    xsl_file.write(xsl_str.strip())

dom = "<doc><item>text1</item><item>text2</item><item>text3</item></doc>"
file = "test.xsl"

# Print the resident set size (MB) before and after each of 10000 transformations
for i in range(10000):
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    xslt_process(xslt_processi, file, dom)
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)

Still investigating why the memory is not going down.

#5

Updated by O'Neil Delpratt 10 months ago

To run with psutil you will need:

pip install psutil
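If adding psutil to the container is inconvenient, the same resident-set-size figure can be read on Linux from `/proc/self/statm` using only the standard library (psutil's Linux implementation reads that file too). This is a sketch under the assumption that the container runs Linux; `current_rss_mib` and `track_rss` are hypothetical helper names, and sampling every 1000 iterations is an arbitrary choice.

```python
import os


def current_rss_mib():
    """Return this process's resident set size in MiB (Linux only).

    The second field of /proc/self/statm is the number of resident pages.
    """
    with open("/proc/self/statm") as statm:
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1024 ** 2


def track_rss(step, iterations, every=1000):
    """Call step() repeatedly, sampling RSS before the loop and every `every` calls."""
    samples = [current_rss_mib()]
    for i in range(1, iterations + 1):
        step()
        if i % every == 0:
            samples.append(current_rss_mib())
    return samples
```

A steadily rising list from something like `track_rss(lambda: xslt_process(xslt_processi, file, dom), 10000)` points to a leak, whereas a plateau suggests memory is being reused.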
#6

Updated by O'Neil Delpratt 10 months ago

Update:

I think I have pinned down the cause of the memory leak. When I reuse the PyXsltExecutable (which I know you do not want to do), memory usage ends at around 67MB. This is much better and is what I would expect. As a side note, I am not 100% sure how Python manages garbage collection here, but I will look into it in more detail.

However, I suspect there is a memory leak in the PyXsltExecutable class that will need investigating.
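The thread does not say why recompiling is preferred. If the reason is that different calls use different stylesheets, one option is to compile each stylesheet path at most once and keep the resulting executable. This is a sketch under that assumption; `ExecutableCache` is a hypothetical class, and the compile function is injected so the cache itself has no Saxon dependency.

```python
class ExecutableCache:
    """Compile each stylesheet path at most once and hand back the result.

    compile_fn is injected; with saxonche it could be
        lambda path: xslt30.compile_stylesheet(stylesheet_file=path)
    where xslt30 comes from a single, long-lived PySaxonProcessor.
    """

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._executables = {}

    def get(self, path):
        executable = self._executables.get(path)
        if executable is None:
            # First request for this path: compile and remember it.
            executable = self._compile(path)
            self._executables[path] = executable
        return executable
```

The cache grows by one entry per distinct stylesheet path, so it suits a fixed set of stylesheets rather than arbitrary user-supplied ones.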

#7

Updated by O'Neil Delpratt 10 months ago

  • % Done changed from 0 to 80

Just to confirm: the problem is that the Java object for the XsltExecutable is not being released when it is finished with, so it cannot be garbage collected. I have applied a patch to the C++ code, specifically to the destructor of the XsltExecutable class.

I ran the example code above (from comment #4), and it now uses 60MB at the end of the Python script.

I will do some more experiments before I mark this bug as resolved.
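The failure mode described in comment #7, a wrapper that goes away without releasing the Java-side object it refers to, can be illustrated with a toy model. This is a simplified illustration only: `HandleTable`, `Executable` and `simulate` are hypothetical stand-ins, not SaxonC classes, and SaxonC's real internals are not described in this thread. The model relies on CPython's reference counting, which runs `__del__` as soon as the last reference to an object disappears.

```python
class HandleTable:
    """Stand-in for the Java side: records which objects are still referenced."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def acquire(self):
        # Hand out a new handle; the object behind it stays alive while listed.
        self._next_id += 1
        self.live.add(self._next_id)
        return self._next_id

    def release(self, handle):
        # Once released, the Java-side object could be garbage collected.
        self.live.discard(handle)


class Executable:
    """Wrapper owning one handle, standing in for a compiled stylesheet."""

    def __init__(self, table, release_in_destructor=True):
        self._table = table
        self._handle = table.acquire()
        self._release_in_destructor = release_in_destructor

    def __del__(self):
        # release_in_destructor=False models the behaviour before the fix:
        # the wrapper disappears but its handle stays in the table.
        if self._release_in_destructor:
            self._table.release(self._handle)


def simulate(iterations, release_in_destructor):
    """Create and drop one wrapper per iteration; return how many handles remain."""
    table = HandleTable()
    for _ in range(iterations):
        Executable(table, release_in_destructor)  # dropped immediately
    return len(table.live)
```

`simulate(1000, release_in_destructor=False)` returns 1000, while `simulate(1000, release_in_destructor=True)` returns 0.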

#8

Updated by O'Neil Delpratt 9 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

Bug fixed and available for the next maintenance release.

#9

Updated by O'Neil Delpratt 9 months ago

  • Status changed from Resolved to Closed
  • Fixed in version set to 12.4.2

Fix applied in SaxonC 12.4.2 maintenance release
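Because the fix shipped in the 12.4.2 maintenance release, a deployment can check which release is installed. This is a minimal sketch using `importlib.metadata`; the helper names are hypothetical, and it assumes the PyPI package version (`saxonche` here, or `saxoncpe`/`saxoncee` for the other editions) matches the SaxonC release number. Reusing one processor and one compiled executable, as suggested in comment #1, still avoids recompiling the stylesheet on every call after upgrading.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (12, 4, 2)  # "Fixed in version" on this issue


def parse_version(text):
    """Turn '12.4.2' into (12, 4, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_executable_leak_fix(package="saxonche"):
    """True/False for an installed package, or None if it is not installed."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None
    return parse_version(installed) >= FIXED_IN
```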
