Bug #6391

closed

Parent issue: Bug #6392: Memory leak in PySaxonProcessor (Python)

Possible memory leak in Python processor?

Added by Norm Tovey-Walsh 8 months ago. Updated 5 months ago.

Status: Closed
Priority: Low
Category: -
Start date: 2024-04-14
Due date:
% Done: 0%
Estimated time:
Applies to branch:
Fix Committed on Branch: 12
Fixed in Maintenance Release:
Found in version:
Fixed in version: 12.5
SaxonC Languages:
SaxonC Platforms:
SaxonC Architecture:

Description

On StackOverflow, Rainbolt reports:

Python: 3.11, saxonche: 12.4.2

My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:

import gc
from time import sleep

from saxonche import PySaxonProcessor


xml_str = """
<root>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""

while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)

    gc.collect()
    sleep(1)

This script consumes memory at a rate of about 0.5 MB per second. Memory usage never plateaus: I have logs showing that it continues to grow for hours until the server runs out of memory and crashes.
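To put a number like "0.5 MB per second" on a run, it can help to sample the process's resident set size (RSS) after every iteration. The following is a minimal sketch assuming a Unix-like system; the helper names (current_rss_bytes, measure_rss_growth, average_growth_per_iteration, saxon_parse_step) are hypothetical, and only PySaxonProcessor and parse_xml come from the report.

```python
import gc
import os
import sys
import time
from typing import Callable, List


def current_rss_bytes() -> int:
    """Return this process's resident set size (RSS) in bytes.

    On Linux, read the current RSS from /proc/self/statm. Elsewhere, fall back
    to the peak RSS reported by getrusage(): only a high-water mark, but one
    that still climbs steadily if memory is never given back.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError):
        import resource  # Unix-only module; unavailable on Windows

        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is in bytes on macOS but in kilobytes on most other Unixes.
        return peak if sys.platform == "darwin" else peak * 1024


def measure_rss_growth(step: Callable[[], None], iterations: int,
                       pause: float = 0.0) -> List[int]:
    """Run step() `iterations` times, sampling RSS after each run."""
    samples: List[int] = []
    for _ in range(iterations):
        step()
        gc.collect()  # rule out uncollected Python cycles, as the report does
        if pause:
            time.sleep(pause)
        samples.append(current_rss_bytes())
    return samples


def average_growth_per_iteration(samples: List[int]) -> float:
    """Mean RSS increase between consecutive samples, in bytes."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


def saxon_parse_step(xml_text: str) -> Callable[[], None]:
    """One iteration of the reported loop: a new processor and one parse."""
    # Imported lazily so the generic helpers above work without saxonche.
    from saxonche import PySaxonProcessor

    def step() -> None:
        with PySaxonProcessor(license=False) as proc:
            proc.parse_xml(xml_text=xml_text)

    return step
```

With xml_str from the script above, average_growth_per_iteration(measure_rss_growth(saxon_parse_step(xml_str), 60, pause=1.0)) gives the average growth per iteration, which with a one-second pause is roughly the growth per second.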

Actions #1

Updated by John Rainbolt 8 months ago

Some folks suggested on the linked StackOverflow post that moving the PySaxonProcessor out of the loop fixes the leak. So I tested it:

with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)

        gc.collect()
        sleep(1)

It still leaks.

I also tested with different sizes of input XML, and found that the rate at which the code leaks is proportional to the size of the input XML. For a 10 MB file, the code leaks about 10 MB per iteration. For a 20 MB file, the code leaks about 20 MB per iteration.
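Reproducing the size experiment needs inputs of a controlled size. A minimal sketch, assuming plain ASCII content is representative enough: the hypothetical make_xml helper repeats a shortened version of the report's <stuff> element until the document reaches a target size, and retained_per_input_byte expresses the observed growth relative to the input size.

```python
# Shortened version of the <stuff> element from the report; any ASCII text works.
STUFF = (
    "<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Vestibulum ac auctor ex. Nunc in tincidunt urna.</stuff>\n"
)


def make_xml(target_bytes: int) -> str:
    """Return a <root> document of at least target_bytes bytes (UTF-8).

    The document is built by repeating STUFF; because STUFF is pure ASCII,
    len(result) equals its UTF-8 size.
    """
    head, tail = "<root>\n", "</root>\n"
    budget = max(0, target_bytes - len(head) - len(tail))
    copies = max(1, -(-budget // len(STUFF)))  # ceiling division, at least one element
    return head + STUFF * copies + tail


def retained_per_input_byte(growth_per_iteration: float, input_bytes: int) -> float:
    """Bytes of memory growth per iteration, per byte of XML input.

    A value near 1.0 across several input sizes matches the report: about
    10 MB per iteration for a 10 MB file and 20 MB for a 20 MB file.
    """
    return growth_per_iteration / input_bytes
```

Feeding make_xml(10 * 1024 * 1024) and make_xml(20 * 1024 * 1024) into the parse loop, and comparing per-iteration growth, should show whether the ratio stays roughly constant.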

On the other hand, instantiating the PySaxonProcessor seems to cost a fixed 200 KB each time. So yes, moving the instantiation out of the loop helps slightly, but when you are leaking 20 MB per iteration, saving 200 KB hardly matters.

Actions #2

Updated by Norm Tovey-Walsh 8 months ago

  • Status changed from New to Duplicate

This is a duplicate of #6392. Let's move the discussion there. (In principle, I suppose #6392 is the duplicate, but since that one was created by an external user, I'm going to mark this one as the duplicate.)

Actions #3

Updated by Matt Patterson 8 months ago

  • Status changed from Duplicate to In Progress
  • Assignee set to Matt Patterson

I think this is actually not a duplicate of #6392. There seem to be two separate issues: one where memory associated with a processor is retained, and this one, which is probably caused by holding on to a copy of the input string when parsing.
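One way to tell the two suspected causes apart is to exercise them separately. The sketch below assumes each scenario is run in its own fresh Python process (passing saxonche's PySaxonProcessor as processor_cls) while RSS is watched externally; the function names are hypothetical, and taking the processor class as a parameter is only there so the loops can be checked without saxonche installed.

```python
from typing import Any, Callable


def churn_processors(processor_cls: Callable[..., Any], iterations: int) -> None:
    """Scenario 1: create and close a processor each time, never parse.

    Growth here points at state retained per processor.
    """
    for _ in range(iterations):
        with processor_cls(license=False):
            pass


def reparse_with_one_processor(processor_cls: Callable[..., Any],
                               xml_text: str, iterations: int) -> None:
    """Scenario 2: keep one processor and parse the same input repeatedly.

    Growth that scales with len(xml_text) points at retained input or parse results.
    """
    with processor_cls(license=False) as proc:
        for _ in range(iterations):
            proc.parse_xml(xml_text=xml_text)
```

If only the second scenario grows, and does so in proportion to the input size, that is consistent with the input-string hypothesis rather than the processor one.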

Actions #4

Updated by Matt Patterson 6 months ago

  • Status changed from In Progress to Resolved
  • Fixed in version set to 12.5.0

This has been a bit tricky to track down, largely because it looks like we had fixed it as a side effect of more general changes made after 12.4.2 was released but before this was reported.

After more extensive testing using memray and a debug-enabled build, I can't see any evidence of memory leaks in the current build, although I could see them when I built a debug-enabled version of 12.4.
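For anyone wanting to repeat this kind of check, memray can also be driven from Python. This is a minimal sketch, not the procedure used above: it assumes memray is installed (it supports Linux and macOS), and profile_parse_loop and the default output path are hypothetical. memray.Tracker may refuse to overwrite an existing capture file, so pick a fresh path per run.

```python
def profile_parse_loop(xml_text: str, iterations: int = 100,
                       output_path: str = "saxon-parse.bin") -> str:
    """Record allocations made while parsing xml_text repeatedly with one processor."""
    import memray  # third-party: pip install memray
    from saxonche import PySaxonProcessor

    # native_traces=True also records native stack frames, which is where
    # allocations made inside the compiled saxonche extension appear.
    with memray.Tracker(output_path, native_traces=True):
        with PySaxonProcessor(license=False) as proc:
            for _ in range(iterations):
                proc.parse_xml(xml_text=xml_text)
    return output_path
```

The capture can then be rendered with `memray flamegraph --leaks saxon-parse.bin`, which reports memory still allocated when tracking stopped; memray's documentation recommends running with PYTHONMALLOC=malloc when looking for leaks.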

Actions #5

Updated by Matt Patterson 6 months ago

  • Parent issue set to #6392

Actions #6

Updated by O'Neil Delpratt 5 months ago

  • Status changed from Resolved to Closed
  • Fixed in version changed from 12.5.0 to 12.5

Bug fix applied in the Saxon 12.5 Maintenance release.

Actions #7

Updated by Community Admin 5 months ago

  • Fix Committed on Branch 12 added
