Bug #6391

open

Possible memory leak in Python processor?

Added by Norm Tovey-Walsh 16 days ago. Updated 15 days ago.

Status: In Progress
Priority: Low
Category: -
Start date: 2024-04-14
Due date:
% Done: 0%
Estimated time:
Found in version:
Fixed in version:
Platforms:

Description

On Stack Overflow, Rainbolt reports:

Python: 3.11 Saxonche: 12.4.2

My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:

import gc
from time import sleep

from saxonche import PySaxonProcessor


xml_str = """
<root>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""

while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)

    gc.collect()
    sleep(1)

This script consumes memory at a rate of about 0.5 MB per second. The memory usage does not plateau after a while. I have logs showing that memory usage continues to grow for hours until the server runs out of memory and crashes.
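One way to put a number on the growth is to sample the process's resident set size (RSS) around each iteration. The following is only a sketch: it uses the standard library alone, assumes Linux (it reads /proc/self/statm, with a rough peak-RSS fallback elsewhere), and `current_rss_bytes` and `measure_growth` are hypothetical helper names, not part of saxonche. RSS is used rather than `tracemalloc` because `tracemalloc` only sees allocations made through Python's own allocators, and saxonche wraps a native library.

```python
import gc
import os
import resource
import sys


def current_rss_bytes():
    """Return the resident set size of this process in bytes."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except FileNotFoundError:
        # Fallback (assumption): peak RSS, reported in bytes on macOS
        # and in kilobytes on most other Unix systems.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak if sys.platform == "darwin" else peak * 1024


def measure_growth(func, iterations=10):
    """Call func() repeatedly; return the RSS change in bytes per call."""
    deltas = []
    for _ in range(iterations):
        before = current_rss_bytes()
        func()
        gc.collect()  # rule out garbage that Python could still free
        deltas.append(current_rss_bytes() - before)
    return deltas


# Usage with the script from the report (requires saxonche):
#   with PySaxonProcessor(license=False) as proc:
#       print(measure_growth(lambda: proc.parse_xml(xml_text=xml_str)))
```

If the per-call deltas stay positive and never settle near zero, that is consistent with the steady growth described in the report.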

#1

Updated by John Rainbolt 15 days ago

Some folks suggested on the linked Stack Overflow post that moving the PySaxonProcessor out of the loop fixes the leak. So I tested it:

with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)

        gc.collect()
        sleep(1)

It still leaks.

I also tested with different sizes of input XML, and found that the rate at which the code leaks is proportional to the size of the input XML. For a 10 MB file, the code leaks about 10 MB per iteration. For a 20 MB file, the code leaks about 20 MB per iteration.

On the other hand, instantiating the PySaxonProcessor seems to cost a static 200 KB each time. So yes, moving the instantiation out of the loop helps slightly, but when you are leaking 20 MB per iteration, saving 200 KB hardly helps.
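To make that comparison concrete, the figures in this comment suggest a simple model: a fixed cost per processor plus a cost roughly equal to the input size per parse. The sketch below only does that arithmetic. It treats 1 MB as 1000 KB, takes the 200 KB and 20 MB figures from this comment, and `leak_per_iteration_kb` is a hypothetical name used for illustration.

```python
# Reported fixed cost of instantiating a PySaxonProcessor, in KB.
PROCESSOR_COST_KB = 200


def leak_per_iteration_kb(input_size_mb, new_processor_each_time):
    """Estimated leak per iteration in KB.

    Assumes (from the measurements above) that each parse leaks about
    as much memory as the size of the input.
    """
    parse_leak_kb = input_size_mb * 1000
    processor_kb = PROCESSOR_COST_KB if new_processor_each_time else 0
    return parse_leak_kb + processor_kb


inside_loop = leak_per_iteration_kb(20, new_processor_each_time=True)
outside_loop = leak_per_iteration_kb(20, new_processor_each_time=False)

# Fraction of the per-iteration leak removed by hoisting the processor.
saving = (inside_loop - outside_loop) / inside_loop
print(f"{inside_loop} KB vs {outside_loop} KB, saving {saving:.1%}")
```

Under these assumptions, hoisting the processor out of the loop removes about 1% of the per-iteration leak for a 20 MB input.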

#2

Updated by Norm Tovey-Walsh 15 days ago

  • Status changed from New to Duplicate

This is a duplicate of #6392. Let's move the discussion there. (In principle, I suppose #6392 is the duplicate, but since that one was created by an external user, I'm going to mark this one the duplicate.)

#3

Updated by Matt Patterson 15 days ago

  • Status changed from Duplicate to In Progress
  • Assignee set to Matt Patterson

I think this is actually not a dupe of #6392. There seem to be two separate issues: memory retained by a processor, and this one, which probably comes from holding a copy of the input string when parsing.
