Bug #6306
Closed
Saxon in python is not releasing memory
100%
Description
Saxon in Python is not releasing memory.
I am trying to use saxonche in Python for XSLT 2.0 transformations.
I am using the code below in a for loop in my Django project. It produces the expected output, but it consumes a lot of memory and the memory is never released, which causes the container to restart. I have searched the Saxonica documentation but have been unable to find the root cause or a fix.
from saxonche import *

# The enclosing function is not shown in the original report;
# its name and signature here are illustrative only.
def run_transform(dom, xsl_file_path):
    with PySaxonProcessor() as proc:
        xslt_process = proc.new_xslt30_processor()
        document = proc.parse_xml(xml_text=dom)  # Load the XML document
        stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
        # Apply the transformation
        input_xslt = stylesheet.transform_to_string(xdm_node=document)
        # Clean the string returned by the transformation
        input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
        input_xslt = input_xslt.replace("<output1>", '').strip()
        input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
        input_xslt = [x.strip() for x in input_xslt.split(",")]
        return input_xslt
Files
Updated by Martin Honnen 12 months ago
Is there any reason you need to create 10000 PySaxonProcessor objects and compile the same stylesheet that many times?
The usual recommendation with Saxon is to use a single (PySaxon)Processor object. And of course you can compile a stylesheet once into a PyXsltExecutable and that could/should be reused.
Note that I can't tell whether you might also have run into a memory leak in SaxonC. I am only pointing out that part of your code uses Saxon(C) very inefficiently, and you should certainly see lower memory consumption if you make the suggested changes.
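A minimal sketch of the reuse pattern recommended above: one processor, and one compiled executable per stylesheet path. The names `StylesheetCache` and `transform_all` are hypothetical and not part of saxonche; the saxonche calls used are the ones already shown in the report, plus `license=False`, which is assumed to be appropriate for an unlicensed saxonche install.

```python
class StylesheetCache:
    """Compile each stylesheet once and return the same executable afterwards."""

    def __init__(self, xslt_processor):
        # xslt_processor is expected to be the object returned by
        # PySaxonProcessor.new_xslt30_processor().
        self._xslt_processor = xslt_processor
        self._executables = {}

    def get(self, xsl_file_path):
        executable = self._executables.get(xsl_file_path)
        if executable is None:
            # First request for this path: compile and remember the result.
            executable = self._xslt_processor.compile_stylesheet(
                stylesheet_file=xsl_file_path
            )
            self._executables[xsl_file_path] = executable
        return executable


def transform_all(documents, xsl_file_path):
    """Transform many XML strings with one processor and one compiled stylesheet."""
    # Imported lazily so the cache above can be used and tested without saxonche.
    from saxonche import PySaxonProcessor

    results = []
    with PySaxonProcessor(license=False) as proc:
        cache = StylesheetCache(proc.new_xslt30_processor())
        for xml_text in documents:
            document = proc.parse_xml(xml_text=xml_text)
            executable = cache.get(xsl_file_path)  # compiled on first use only
            results.append(executable.transform_to_string(xdm_node=document))
    return results
```

In a long-running service such as a Django application, the processor and the cache would more likely live at module level than inside a `with` block, so that they survive across requests.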
Updated by O'Neil Delpratt 12 months ago
- Project changed from Saxon to SaxonC
- Category deleted (XSLT 3.0 packages)
Updated by O'Neil Delpratt 12 months ago
- Category set to Python API
- Status changed from New to In Progress
- Found in version set to 12.4.1
As Martin correctly stated in comment #1, you should reuse the PySaxonProcessor object.
- As a first experiment I ran your code as it is, and the memory usage was around 2GB.
- I then moved the proc variable holding the PySaxonProcessor object outside of the function, and the memory usage went down to 800MB.
See my modified code below:
import os
import psutil
from saxoncee import *

# Resident set size (RSS) in MiB before any Saxon objects exist
print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)

# Create the processor once, outside the function, so it is reused
proc = PySaxonProcessor()
xslt_processi = proc.new_xslt30_processor()

# Note: the first parameter shadows the function name inside the body
def xslt_process(xslt_process, xsl_file_path, dom):
    # xslt_process = proc.new_xslt30_processor()
    document = proc.parse_xml(xml_text=dom)  # Load the XML document
    stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
    # Apply the transformation
    input_xslt = stylesheet.transform_to_string(xdm_node=document)
    # Clean the string returned by the transformation
    input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
    input_xslt = input_xslt.replace("<output1>", '').strip()
    input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
    input_xslt = [x.strip() for x in input_xslt.split(",")]
    print(input_xslt)

xsl_str = '''<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='3.0'> <xsl:param name='values' select='(2,3,4)' /><xsl:output method='xml'
indent='yes' /><xsl:template match='*'><output><xsl:value-of select='//person[1]'/><xsl:for-each select='$values' ><out><xsl:value-of select='. *
3'/></out></xsl:for-each></output></xsl:template></xsl:stylesheet>'''

with open('test.xsl', "w") as xsl_file:
    xsl_file.write(xsl_str.strip())

dom = "<doc><item>text1</item><item>text2</item><item>text3</item></doc>"
file = "test.xsl"

for i in range(10000):
    # RSS in MiB before and after each transformation
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    xslt_process(xslt_processi, file, dom)
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)

# Final RSS in MiB after all iterations
print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
Still investigating why the memory is not going down.
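The script above prints RSS before and after every call, which produces a lot of output. A small sketch of a helper that reports only the net growth over a number of iterations may be easier to compare between runs. The names `peak_rss_mib` and `rss_growth` are hypothetical. The default reader uses the standard-library `resource` module, which is Unix-only and reports peak RSS, so it can show growth but never a decrease; a psutil-based reader can be passed in instead.

```python
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    import resource  # not available on Windows, so imported on demand

    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


def rss_growth(work, iterations, read_rss=peak_rss_mib):
    """Run work() the given number of times and return the change in RSS (MiB)."""
    before = read_rss()
    for _ in range(iterations):
        work()
    return read_rss() - before
```

With the script from this comment, a call might look like `rss_growth(lambda: xslt_process(xslt_processi, file, dom), 10000, read_rss=lambda: psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)`, which keeps the same psutil measurement but prints nothing per iteration.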
Updated by O'Neil Delpratt 12 months ago
To run with psutil you will need:
pip install psutil
Updated by O'Neil Delpratt 12 months ago
Update:
I think I have pinned down the cause of the memory leak. When I reuse the PyXsltExecutable (which I know you do not want to do), the memory ends up at around 67MB. This is much better and is what I would expect. As a side note, I am not 100% sure how Python manages garbage collection, but I will look into it in more detail.
However, I suspect that there is a memory leak in the PyXsltExecutable class that will need investigating.
Updated by O'Neil Delpratt 12 months ago
- % Done changed from 0 to 80
Just to confirm: the problem is that the Java object for the XsltExecutable is not being released when it is finished with, and therefore it cannot be garbage collected. I have applied a patch in the C++ code, specifically in the destructor of the XsltExecutable.
I ran the example code above (from comment #4) and we are now using 60MB at the end of the Python script.
I will do some more experiments before I mark this bug as resolved.
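As a rough Python analogy of the problem described in this comment (this is not SaxonC's C++ code, and every name in it is hypothetical): a wrapper pins an object in a table of held references, and the object can only be collected once the wrapper's destructor removes it from that table.

```python
import gc
import weakref


class JavaSideObject:
    """Stand-in for the Java object behind an XsltExecutable (hypothetical)."""


# Stand-in for the references the native layer holds on Java objects.
held_references = set()


class Executable:
    """Hypothetical wrapper that pins a Java-side object while it is alive."""

    def __init__(self, release_in_destructor):
        self._target = JavaSideObject()
        self._release_in_destructor = release_in_destructor
        held_references.add(self._target)

    def target_ref(self):
        # A weak reference lets us observe the target without keeping it alive.
        return weakref.ref(self._target)

    def __del__(self):
        # The fix in this analogy: drop the held reference on destruction.
        if self._release_in_destructor:
            held_references.discard(self._target)


def target_survives(release_in_destructor):
    """Return True if the target is still alive after its wrapper is destroyed."""
    wrapper = Executable(release_in_destructor)
    probe = wrapper.target_ref()
    del wrapper
    gc.collect()
    return probe() is not None
```

Calling `target_survives(False)` models the leak, where every discarded wrapper leaves its target behind in `held_references`; `target_survives(True)` models the behaviour after a destructor that releases the reference.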
Updated by O'Neil Delpratt 11 months ago
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
Bug fixed and available for the next maintenance release.
Updated by O'Neil Delpratt 11 months ago
- Status changed from Resolved to Closed
- Fixed in version set to 12.4.2
Fix applied in the SaxonC 12.4.2 maintenance release.