Bug #6391
Bug #6392 (closed): Memory leak in PySaxonProcessor (Python)
Possible memory leak in Python processor?
Description
On StackOverflow, Rainbolt reports:
Python: 3.11 Saxonche: 12.4.2
My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:
import gc
from time import sleep
from saxonche import PySaxonProcessor
xml_str = """
<root>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""
while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)
    gc.collect()
    sleep(1)
This script consumes memory at a rate of about 0.5 MB per second. The memory usage does not plateau after a while. I have logs showing that memory usage continues to grow for hours until the server runs out of memory and crashes.
Updated by John Rainbolt 8 months ago
Some folks suggested on the linked StackOverflow post that moving the PySaxonProcessor out of the loop fixes the leak. So I tested it:
with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)
        gc.collect()
        sleep(1)
It still leaks.
I also tested with different sizes of input XML, and found that the rate at which the code leaks is proportional to the size of the input XML. For a 10 MB file, the code leaks about 10 MB per iteration. For a 20 MB file, the code leaks about 20 MB per iteration.
On the other hand, instantiating the PySaxonProcessor seems to cost a static 200 KB each time. So yes, moving the instantiation out of the loop helps slightly, but when you are leaking 20 MB per iteration, saving 200 KB hardly helps.
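For anyone wanting to reproduce per-iteration figures like the ones above, here is a minimal sketch of a measurement harness. It is not the script used to produce those numbers; the helper names `peak_rss_mb` and `growth_per_iteration` are hypothetical, and it assumes a Unix-like system where the standard `resource` module is available. It tracks peak resident set size rather than Python-level allocations, since a leak in native code would not show up in `tracemalloc`.

```python
import gc
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def growth_per_iteration(work, iterations=20, warmup=3):
    """Call work() repeatedly and return average peak-RSS growth in MB per call.

    The warmup calls let one-off allocations settle before measuring.
    """
    for _ in range(warmup):
        work()
    gc.collect()
    start = peak_rss_mb()
    for _ in range(iterations):
        work()
        gc.collect()
    return (peak_rss_mb() - start) / iterations


# Hypothetical usage with the reproduction script from this issue:
#
#   from saxonche import PySaxonProcessor
#   with PySaxonProcessor(license=False) as proc:
#       rate = growth_per_iteration(lambda: proc.parse_xml(xml_text=xml_str))
#       print(f"{rate:.2f} MB per iteration")
```

Because the harness reads a peak value, it can only show growth, never shrinkage; a result that stays near zero across many iterations suggests that memory is being reused rather than leaked.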
Updated by Norm Tovey-Walsh 8 months ago
- Status changed from New to Duplicate
Updated by Matt Patterson 8 months ago
- Status changed from Duplicate to In Progress
- Assignee set to Matt Patterson
I think this is actually not a dupe of #6391. There seem to be two separate issues - retaining memory to do with a processor, and this one, probably to do with holding a copy of the input string when parsing.
Updated by Matt Patterson 6 months ago
- Status changed from In Progress to Resolved
- Fixed in version set to 12.5.0
This has been a bit tricky to track down, largely because it looks like we had fixed it as a side effect of making more general changes after 12.4.2 was released and before this was reported.
After more extensive testing using memray and a debug-enabled build, I can't see any evidence of memory leaks in the current build, whereas I was able to see them when I built a debug-enabled version of 12.4.
Updated by O'Neil Delpratt 5 months ago
- Status changed from Resolved to Closed
- Fixed in version changed from 12.5.0 to 12.5
Bug fix applied in the Saxon 12.5 Maintenance release.