Bug #6306
closed
Saxon in python is not releasing memory
Fixed in Maintenance Release:
Description
Saxon in python is not releasing memory.
I am trying to use saxonche in Python for XSLT 2.0 transformation.
I am using the code below in a for loop in my Django project. It produces the expected output, but it consumes a lot of memory and the memory is not released, which causes the container to restart. I have searched the Saxonica documentation but have been unable to find the root cause or a fix.
from saxonche import *
with (PySaxonProcessor() as proc):
    xslt_process = proc.new_xslt30_processor()
    document = proc.parse_xml(xml_text=dom)  # Load the XML document
    stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
    # Apply the transformation
    input_xslt = stylesheet.transform_to_string(xdm_node=document)
    # cleaning string which we got from XSLT
    input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
    input_xslt = input_xslt.replace("<output1>", '').strip()
    input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
    input_xslt = [x.strip() for x in input_xslt.split(",")]
    return input_xslt
Is there any reason you need to create 10000 PySaxonProcessor objects and compile the same stylesheet that many times?
The usual recommendation with Saxon is to use a single (PySaxon)Processor object. And of course you can compile a stylesheet once into a PyXsltExecutable and that could/should be reused.
Note that I can't tell whether you might also have run into a memory leak in SaxonC. For now I am only pointing out that part of your code uses Saxon(C) very inefficiently, and you should certainly see lower memory consumption if you make the suggested changes.
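The "compile once, reuse" advice above can be sketched as a small cache. This is only an illustration, not part of saxonche: the class name CompiledStylesheetCache and its compile_fn parameter are hypothetical, and the commented wiring reuses only the saxonche calls that already appear in this issue.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class CompiledStylesheetCache(Generic[T]):
    """Compile each stylesheet path at most once and return the same object afterwards."""

    def __init__(self, compile_fn: Callable[[str], T]) -> None:
        # compile_fn is whatever turns a stylesheet path into a reusable executable.
        self._compile_fn = compile_fn
        self._cache: Dict[str, T] = {}

    def get(self, path: str) -> T:
        # Only the first request for a given path pays the compilation cost.
        if path not in self._cache:
            self._cache[path] = self._compile_fn(path)
        return self._cache[path]


# Hypothetical wiring with saxonche (assumes one long-lived processor per process):
#   proc = PySaxonProcessor()
#   xslt_proc = proc.new_xslt30_processor()
#   cache = CompiledStylesheetCache(
#       lambda path: xslt_proc.compile_stylesheet(stylesheet_file=path))
#   executable = cache.get("test.xsl")
#   result = executable.transform_to_string(xdm_node=proc.parse_xml(xml_text=dom))
```

The cache is keyed on the path only, so it will not notice a stylesheet file that changes on disk. That trade-off is an assumption of this sketch.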
- Project changed from Saxon to SaxonC
- Category deleted (XSLT 3.0 packages)
- Description updated (diff)
- Category set to Python API
- Status changed from New to In Progress
- Found in version set to 12.4.1
As Martin correctly stated in comment #1, you should reuse the PySaxonProcessor object.
- As a first experiment I ran your code as it is, and the memory usage was around 2GB.
- I then moved the proc variable holding the PySaxonProcessor object outside of the function, and the memory usage went down to 800MB.
See my modified code below:
import os, psutil
from saxonche import *

print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
proc = PySaxonProcessor()
xslt_processi = proc.new_xslt30_processor()

def xslt_process(xslt_process, xsl_file_path, dom):
    # xslt_process = proc.new_xslt30_processor()
    document = proc.parse_xml(xml_text=dom)  # Load the XML document
    stylesheet = xslt_process.compile_stylesheet(stylesheet_file=xsl_file_path)  # Compile the XSLT stylesheet
    # Apply the transformation
    input_xslt = stylesheet.transform_to_string(xdm_node=document)
    # cleaning string which we got from XSLT
    input_xslt = ''.join(input_xslt.split("\n")).replace("</output1>", '')
    input_xslt = input_xslt.replace("<output1>", '').strip()
    input_xslt = input_xslt.replace('<?xml version="1.0" encoding="UTF-8"?>', '').strip()
    input_xslt = [x.strip() for x in input_xslt.split(",")]
    print(input_xslt)

xsl_str = '''<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='3.0'> <xsl:param name='values' select='(2,3,4)' /><xsl:output method='xml'
indent='yes' /><xsl:template match='*'><output><xsl:value-of select='//person[1]'/><xsl:for-each select='$values' ><out><xsl:value-of select='. *
3'/></out></xsl:for-each></output></xsl:template></xsl:stylesheet>'''
with open('test.xsl', "w") as xsl_file:
    xsl_file.write(xsl_str.strip())

dom = "<doc><item>text1</item><item>text2</item><item>text3</item></doc>"
file = "test.xsl"
for i in range(10000):
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    xslt_process(xslt_processi, file, dom)
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
Still investigating why the memory is not going down.
To run with psutil you will need:
pip install psutil
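Printing the memory reading on every iteration produces a lot of output. A possible alternative is to sample it periodically. The helper below is a hypothetical sketch (the names sample_memory, work, read_mb and every are not from the code above) and it takes the memory reader as a parameter so that it does not depend on psutil itself.

```python
from typing import Callable, List, Tuple


def sample_memory(work: Callable[[int], None],
                  iterations: int,
                  read_mb: Callable[[], float],
                  every: int = 1000) -> List[Tuple[int, float]]:
    """Run work(i) `iterations` times and record (completed iterations, MB) periodically."""
    samples = [(0, read_mb())]  # baseline reading before any work is done
    for done in range(1, iterations + 1):
        work(done - 1)
        # Sample every `every` iterations, and always after the final one.
        if done % every == 0 or done == iterations:
            samples.append((done, read_mb()))
    return samples
```

With psutil installed, read_mb could be the same expression used in the script above, for example lambda: psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2, and work could be a function that calls xslt_process for one document.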
Update:
I think I have pinned down the cause of the memory leak. When I reuse the PyXsltExecutable (which I know you do not want to do) the memory ends around 67MB. This is much better and what I would expect. As a side note, I am not 100% sure how Python manages its garbage collection, but I will look into it in more detail.
But I have a suspicion that there is a memory leak in the PyXsltExecutable class that will need investigating.
- % Done changed from 0 to 80
Just to confirm: the problem is that the Java object for the XsltExecutable is not being released when it is finished with, and therefore it cannot be garbage collected. I have applied a patch in the C++ code, specifically in the destructor of the XsltExecutable.
I ran the example code above (from comment #4) and we are now using 60MB at the end of the Python script.
I will do some more experiments before I mark this issue as resolved.
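The mechanism described in this comment (a wrapper that never releases the object it points to, so that object can never be collected) can be illustrated with a pure-Python analogy. This is not the SaxonC patch, which lives in the C++ destructor. Every name below is hypothetical, and the dictionary merely stands in for whatever keeps the Java object reachable.

```python
import gc
import itertools
import weakref

_next_key = itertools.count()
_live_refs = {}  # stand-in for the references that keep Java-side objects alive


class JavaObjectStandIn:
    """Plays the role of the Java XsltExecutable; purely illustrative."""


class LeakyHandle:
    """Registers a target on creation but never releases it (the buggy behaviour)."""

    def __init__(self):
        self._key = next(_next_key)
        _live_refs[self._key] = JavaObjectStandIn()


class ReleasingHandle(LeakyHandle):
    """Same wrapper, but its destructor drops the reference (the spirit of the fix)."""

    def __del__(self):
        _live_refs.pop(self._key, None)


def target_survives(handle_cls) -> bool:
    """Create and discard a handle, then report whether its target is still alive."""
    handle = handle_cls()
    probe = weakref.ref(_live_refs[handle._key])
    del handle
    gc.collect()
    return probe() is not None


print(target_survives(LeakyHandle))
print(target_survives(ReleasingHandle))
```

In the leaky variant the target stays reachable after the wrapper is gone, so repeated compilations accumulate. In the releasing variant the target becomes collectable as soon as the wrapper is destroyed.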
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
Bug fixed and available for the next maintenance release.
- Status changed from Resolved to Closed
- Fixed in version set to 12.4.2
Fix applied in the SaxonC 12.4.2 maintenance release.