Is there some way to use ThreadPoolExecutor with Python API and ensure that detach_current_thread is properly called?

Added by Martin Honnen about 2 years ago

I am looking into multi-threading with Python and SaxonC (testing with 11.3 HE) again, and I am wondering whether there is a way to use a ThreadPoolExecutor and ensure that detach_current_thread is called properly.

My test code is

from saxonc import *
import glob
from concurrent.futures import ThreadPoolExecutor

xsltExecutable = None

def run_transform(input_file):
    result_file = input_file.replace('input-samples', 'python-output-samples')
    print('Transforming file', input_file, 'to file', result_file)
            
    xsltExecutable.transform_to_file(source_file = input_file, output_file = result_file)
        
    
with PySaxonProcessor(license = False) as saxon:
    print(saxon.version)
    
    input_files = glob.glob('input-samples/*')
    
    print(input_files)
    
    xslt30Processor = saxon.new_xslt30_processor()
    
    xsltExecutable = xslt30Processor.compile_stylesheet(stylesheet_file = 'transform-file.xsl')
    
    xsltExecutable.set_cwd('.')
    
    with ThreadPoolExecutor(max_workers = 4) as executor:
        executor.map(run_transform, input_files)

This seems to process and transform all files just fine but then Python/SaxonC crashes and core dumps with e.g.

JET RUNTIME HAS DETECTED UNRECOVERABLE ERROR: runtime error
Thread 3128 ["Thread-1"] is terminated without notifying the JVM. Probably, "DetachCurrentThread" function was not called

At which point would I need to inject the saxon.detach_current_thread to avoid the crash and core dump?


Replies (2)

RE: Is there some way to use ThreadPoolExecutor with Python API and ensure that detach_current_thread is properly called? - Added by O'Neil Delpratt about 2 years ago

I think you need to find a way to call detach_current_thread() at the end of run_transform(). Is it possible to pass the PySaxonProcessor as an argument?

The following works for me:

from saxonc import *
import glob
from concurrent.futures import ThreadPoolExecutor

xsltExecutable = None
saxon = None

def run_transform(input_file):
    result_file = input_file.replace('input-samples', 'python-output-samples')
    print('Transforming file', input_file, 'to file', result_file)

    xsltExecutable.transform_to_file(source_file = input_file, output_file = result_file)
    # Detach this worker thread before the task returns
    saxon.detach_current_thread()
        
    

saxon = PySaxonProcessor(license = False)
print(saxon.version)
    
input_files = glob.glob('../../samples/data/*')
    
print(input_files)
    
xslt30Processor = saxon.new_xslt30_processor()
    
xsltExecutable = xslt30Processor.compile_stylesheet(stylesheet_file = 'transform-file.xsl')
    
xsltExecutable.set_cwd('.')
    
with ThreadPoolExecutor(max_workers = 4) as executor:
    executor.map(run_transform, input_files)

RE: Is there some way to use ThreadPoolExecutor with Python API and ensure that detach_current_thread is properly called? - Added by Martin Honnen about 2 years ago

Thanks for the suggestion. Indeed, making the Saxon processor a global variable and then calling detach_current_thread at the end of every run_transform call helps. I had thought that, because a thread pool reuses its threads, this would detach threads too often, but it at least doesn't give any errors and doesn't crash JET, so it seems to be a workable approach.
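The thread reuse mentioned above can be observed with the standard library alone: when there are more tasks than workers, the same thread identifiers show up repeatedly, so a per-task detach call does run several times on one pool thread. A small sketch, independent of SaxonC; `record_thread` and `distinct_threads` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def record_thread(task_number):
    # Return the identifier of the pool thread that ran this task.
    return threading.get_ident()


def distinct_threads(task_count, max_workers):
    # Run task_count tasks and count how many different threads served them.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        idents = list(executor.map(record_thread, range(task_count)))
    return len(idents), len(set(idents))


tasks, threads = distinct_threads(task_count=20, max_workers=4)
print(tasks, 'tasks ran on', threads, 'distinct threads')
```

The number of distinct threads never exceeds max_workers, whatever the number of tasks.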

As for passing arguments to map, I will need to check whether Python allows some kind of closure to do that.
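Python does allow this: functools.partial (or a lambda) binds the extra arguments before the function is handed to executor.map, so no global variables are needed. A minimal sketch of that pattern follows. `FakeProcessor` and `FakeExecutable` are hypothetical stand-ins for PySaxonProcessor and the compiled stylesheet so that the example runs without SaxonC installed; only the method names transform_to_file and detach_current_thread are taken from this thread. The try/finally is an addition of the sketch, so that the detach call also runs when a transformation raises.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
import threading


class FakeProcessor:
    """Hypothetical stand-in for PySaxonProcessor; only counts detach calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.detach_calls = 0

    def detach_current_thread(self):
        with self._lock:
            self.detach_calls += 1


class FakeExecutable:
    """Hypothetical stand-in for the compiled stylesheet."""

    def transform_to_file(self, source_file, output_file):
        if 'bad' in source_file:
            raise ValueError('cannot transform ' + source_file)


def run_transform(saxon, executable, input_file):
    result_file = input_file.replace('input-samples', 'python-output-samples')
    try:
        executable.transform_to_file(source_file=input_file, output_file=result_file)
        return result_file
    finally:
        # Runs on success and on failure alike.
        saxon.detach_current_thread()


def transform_all(saxon, executable, input_files, max_workers=4):
    # partial() fixes the first two arguments; map() supplies input_file.
    task = partial(run_transform, saxon, executable)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the lazy iterator so worker exceptions surface here.
        return list(executor.map(task, input_files))
```

With the real API, the same call would presumably take the PySaxonProcessor instance and the compiled stylesheet in place of the two stand-ins.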
