Is PyXdmNode meant to be thread-safe?

Added by Martin Honnen almost 2 years ago

I am wondering whether I can build a single PyXdmNode once with SaxonC's parse_xml and use it as the same input to various XsltExecutables in different threads.

I tried the code sample below and don't get any errors but also no output files:

import threading

from saxonc import *

class myThread (threading.Thread):
    def __init__(self, threadID, name, counter, node, saxon_proc):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.counter = counter
        self.name = name
        self.node = node
        self.saxon_proc = saxon_proc
        self.xslt30_processor = saxon_proc.new_xslt30_processor()
        self.xslt30_processor.set_cwd('.')

    def run(self):
        print ("Starting " + self.name)
        run_transform(self.name, self.counter, self.node, self.xslt30_processor, self.saxon_proc)
        print ("Exiting " + self.name)

def run_transform(threadName, counter, node, xslt30_processor, saxon_proc):
    sheet_file = "sheet-samples/sheet{}.xsl".format(counter)
    result_file = "threading-example/result-{}.xml".format(counter)
    print('Transforming with', sheet_file, 'to', result_file)
    xslt30_processor.transform_to_file(xdm_node = node, stylesheet_file = sheet_file, output_file = result_file)
    print(xslt30_processor.error_message)
    saxon_proc.detach_current_thread


with PySaxonProcessor(license = False) as saxon_proc:

    xdm_node = saxon_proc.parse_xml(xml_file_name = 'input-samples/sample-1.xml')
    
    # Create new threads
    thread1 = myThread(1, "Thread-1", 1, xdm_node, saxon_proc)
    thread2 = myThread(2, "Thread-2", 2, xdm_node, saxon_proc)
    thread3 = myThread(3, "Thread-3", 3, xdm_node, saxon_proc)
    
    # Start new Threads
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
    print ("Exiting Main Thread")

Replies (9)

RE: Is PyXdmNode meant to be thread-safe? - Added by Martin Honnen almost 2 years ago

So my previous attempt using transform_to_file seems to have been doomed by that method ignoring xdm_node.

Therefore I have switched to apply_templates_returning_file(xdm_value = ...), e.g.

import threading

from saxonc import *

class myThread (threading.Thread):
    def __init__(self, threadID, name, counter, node, saxon_proc):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.counter = counter
        self.name = name
        self.node = node
        self.saxon_proc = saxon_proc
        self.xslt30_processor = saxon_proc.new_xslt30_processor()
        self.xslt30_processor.set_cwd('.')

    def run(self):
        print ("Starting " + self.name)
        run_transform(self.name, self.counter, self.node, self.xslt30_processor, self.saxon_proc)
        print ("Exiting " + self.name)

def run_transform(threadName, counter, node, xslt30_processor, saxon_proc):
    sheet_file = "sheet-samples/sheet{}.xsl".format(counter)
    result_file = "threading-example/result-{}.xml".format(counter)
    print('Transforming with', sheet_file, 'to', result_file)
    xslt_executable = xslt30_processor.compile_stylesheet(stylesheet_file = sheet_file)
    xslt_executable.apply_templates_returning_file(xdm_value = node, output_file = result_file)
    print(xslt_executable.error_message)
    saxon_proc.detach_current_thread


with PySaxonProcessor(license = False) as saxon_proc:

    xdm_node = saxon_proc.parse_xml(xml_file_name = 'input-samples/sample-1.xml')
    
    # Create new threads
    thread1 = myThread(1, "Thread-1", 1, xdm_node, saxon_proc)
    thread2 = myThread(2, "Thread-2", 2, xdm_node, saxon_proc)
    thread3 = myThread(3, "Thread-3", 3, xdm_node, saxon_proc)
    
    # Start new Threads
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
    print ("Exiting Main Thread")

This seems to run the first transformation fine but then dies on the second with a core dump:

Starting Thread-1
Transforming with sheet-samples/sheet1.xsl to threading-example/result-1.xml
Starting Thread-2
Transforming with sheet-samples/sheet2.xsl to threading-example/result-2.xml

JET RUNTIME HAS DETECTED UNRECOVERABLE ERROR: system exception at 0x0000000000a6730e

JET RUNTIME HAS DETECTED UNRECOVERABLE ERROR: system exception at 0x0000000000a6730e
Please, contact the vendor of the application.
Crash dump will be written to "C:\SomePath\SomeDir\jet_dump_31972.dmp"

Exception 0xC0000005 (EXCEPTION_ACCESS_VIOLATION) at 0x0000000000a6730e (C:\Program Files\Saxonica\SaxonC HE 11.3\libsaxonhec.dll+0x66730e)
Failed to read memory at 0x000000440fbf0000

Is that due to using the same PyXdmNode in different threads (Mike says that on the Java side, with the default tiny tree, XdmNode is thread-safe) or due to other reasons?

RE: Is PyXdmNode meant to be thread-safe? - Added by O'Neil Delpratt almost 2 years ago

Hi Martin,

Please can you send me the data files you are using. Thanks

RE: Is PyXdmNode meant to be thread-safe? - Added by Martin Honnen almost 2 years ago

I simply fed an arbitrary input to an identity transformation that adds a comment about the stylesheet and the time; see the attached zip.

RE: Is PyXdmNode meant to be thread-safe? - Added by O'Neil Delpratt almost 2 years ago

Thanks for sending the zip file. For some strange reason I am getting the following error:

AttributeError: 'NoneType' object has no attribute 'apply_templates_returning_file'

This means the stylesheet compilation is failing, but I don't know why.

RE: Is PyXdmNode meant to be thread-safe? - Added by Martin Honnen almost 2 years ago

Perhaps the unzipping went wrong and the XSLT to be compiled is not there or corrupted?

But I don't know; perhaps it is also a Windows versus Linux problem. The further simplified code in the newly attached zip does run and produce output on Windows before crashing JET, while under Linux I get an error I don't understand either, similar to yours:

Starting Thread-1
Transforming with sheet1.xsl to threading-example/result-1.xml
Starting Thread-2
Transforming with sheet2.xsl to threading-example/result-2.xml
None
Exception in thread Thread-1:
Starting Thread-3
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
Transforming with sheet3.xsl to threading-example/result-3.xml
  File "./test1.py", line 18, in run
None
    run_transform(self.name, self.counter, self.node, self.xslt30_processor, self.saxon_proc)
  File "./test1.py", line 28, in run_transform
None
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-3:
Traceback (most recent call last):
    xslt_executable.apply_templates_returning_file(xdm_value = node, output_file = result_file)
    self.run()
  File "./test1.py", line 18, in run
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
AttributeError: 'NoneType' object has no attribute 'apply_templates_returning_file'
    self.run()
    run_transform(self.name, self.counter, self.node, self.xslt30_processor, self.saxon_proc)
  File "./test1.py", line 18, in run
  File "./test1.py", line 28, in run_transform
    run_transform(self.name, self.counter, self.node, self.xslt30_processor, self.saxon_proc)
  File "./test1.py", line 28, in run_transform
    xslt_executable.apply_templates_returning_file(xdm_value = node, output_file = result_file)
    xslt_executable.apply_templates_returning_file(xdm_value = node, output_file = result_file)
AttributeError: 'NoneType' object has no attribute 'apply_templates_returning_file'
AttributeError: 'NoneType' object has no attribute 'apply_templates_returning_file'
Exiting Main Thread

JET RUNTIME HAS DETECTED UNRECOVERABLE ERROR: runtime error
Thread 2786 ["Thread-1"] is terminated without notifying the JVM. Probably, "DetachCurrentThread" function was not called

RE: Is PyXdmNode meant to be thread-safe? - Added by Martin Honnen almost 2 years ago

As far as I can see, even on Linux one transformation runs through and creates a result; the other two raise that exception, as if the execution of one thread had corrupted the data of the others. Why that happens on Linux and not in the same way on Windows is something I can't explain.
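
One detail in the Linux output may be worth noting: the AttributeError is raised before the final saxon_proc.detach_current_thread line of run_transform is reached, which would fit the JET message about a thread terminating without notifying the JVM. Below is a minimal sketch of a guard for that, assuming nothing about SaxonC itself. The names run_with_cleanup and require_compiled are hypothetical helpers, `work` and `cleanup` stand in for the transformation and the detach step, and the demo only simulates a compile step that returns None.

```python
import threading


def run_with_cleanup(work, cleanup):
    """Run work() and always run cleanup(), even if work() raises.

    Returns (result, error): error is None on success, else the exception.
    """
    try:
        return work(), None
    except Exception as exc:  # report the failure instead of losing it in the thread
        return None, exc
    finally:
        cleanup()


def require_compiled(executable, error_message=None):
    """Fail early with a readable message when compilation returned None."""
    if executable is None:
        raise RuntimeError("stylesheet compilation failed: {}".format(error_message))
    return executable


# Stand-in demo, no SaxonC involved: a "compile" step that returned None.
events = []    # records which threads ran their cleanup step
outcomes = {}  # thread name -> (result, error)


def failing_work():
    return require_compiled(None, "simulated compile error")


def worker(name):
    outcomes[name] = run_with_cleanup(failing_work, lambda: events.append(name))


threads = [threading.Thread(target=worker, args=("Thread-{}".format(i),)) for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In the scripts from this thread, `work` would wrap the compile_stylesheet and apply_templates_returning_file calls and `cleanup` would perform the detach. Whether that alone avoids the crash is untested here.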

RE: Is PyXdmNode meant to be thread-safe? - Added by Martin Honnen almost 2 years ago

As a sanity check that the code does the right thing without threading, I ran

#import threading

from saxonc import *

class myThread ():
    def __init__(self, threadID, name, counter, node, saxon_proc):
        #threading.Thread.__init__(self)
        self.threadID = threadID
        self.counter = counter
        self.name = name
        self.node = node
        self.saxon_proc = saxon_proc
        self.xslt30_processor = saxon_proc.new_xslt30_processor()
        self.xslt30_processor.set_cwd('.')

    def start(self):
        self.run()
        
    def run(self):
        print ("Starting " + self.name)
        run_transform(self.name, self.counter, self.node, self.xslt30_processor, self.saxon_proc)
        print ("Exiting " + self.name)
        
    def join(self):
        return
        

def run_transform(threadName, counter, node, xslt30_processor, saxon_proc):
    sheet_file = "sheet{}.xsl".format(counter)
    result_file = "threading-example/result-{}.xml".format(counter)
    print('Transforming with', sheet_file, 'to', result_file)
    xslt30_processor.set_cwd('.')
    xslt_executable = xslt30_processor.compile_stylesheet(stylesheet_file = sheet_file)
    print('Error after compiling:', xslt30_processor.error_message)
    xslt_executable.apply_templates_returning_file(xdm_value = node, output_file = result_file)
    print('Error after applying templates', xslt_executable.error_message)
    saxon_proc.detach_current_thread


with PySaxonProcessor(license = False) as saxon_proc:

    xdm_node = saxon_proc.parse_xml(xml_file_name = 'sample-1.xml')
    
    # Create new threads
    thread1 = myThread(1, "Thread-1", 1, xdm_node, saxon_proc)
    thread2 = myThread(2, "Thread-2", 2, xdm_node, saxon_proc)
    thread3 = myThread(3, "Thread-3", 3, xdm_node, saxon_proc)
    
    # Start new Threads
    thread1.start()
    thread2.start()
    thread3.start()
    thread1.join()
    thread2.join()
    thread3.join()
    print ("Exiting Main Thread")

and then indeed all transformations run through fine.

So it looks like using threading messes up the Saxon state of the supposedly thread-separated variables.
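
One way to test that suspicion, sketched under the assumption that the problem is concurrent access rather than thread affinity: keep the threads but let only one at a time touch the shared objects, by guarding the call with a threading.Lock. The `serialized` helper and the `transform` callable are hypothetical stand-ins for the SaxonC calls; the demo uses a plain Python function.

```python
import threading
import time

shared_lock = threading.Lock()  # one lock for everything that touches shared state


def serialized(transform, *args, **kwargs):
    """Call transform(...) while holding the shared lock."""
    with shared_lock:
        return transform(*args, **kwargs)


# Stand-in demo: record how many callers are inside the guarded call at once.
active = 0
max_active = 0
results = []


def fake_transform(counter):
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    time.sleep(0.01)  # pretend to do some work
    active -= 1
    return "result-{}.xml".format(counter)


def worker(counter):
    results.append(serialized(fake_transform, counter))


threads = [threading.Thread(target=worker, args=(i,)) for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the crash disappears with the lock in place, that would point to concurrent access to the shared objects; if it persists, the objects may be tied to the thread that created them. This gives up parallelism, so it is a diagnostic rather than a fix.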

RE: Is PyXdmNode meant to be thread-safe? - Added by Michael Lisitsa over 1 year ago

As a workaround for the apparently non-thread-safe PyXdmNode, I was able to write the XML string to a temporary file and pass its path via source_file. This GitHub issue describes the approach using Python's tempfile module: https://github.com/PyFilesystem/pyfilesystem2/issues/402

Has there been any progress on using a PyXdmNode as an argument to a single PyXsltExecutable from multiple threads?

import os
import tempfile
import saxonc

# Stylesheet compiled once at startup of a FastAPI server:
proc = saxonc.PySaxonProcessor(license = False)
xsltproc = proc.new_xslt30_processor()
executable = xsltproc.compile_stylesheet(...)

def request_handler_runs_in_multiple_threads(xml_string):
    # delete=False ensures the file persists after .close() is called, which is necessary
    # because Python may still be internally buffering the data (per the issue linked above)
    tmp_file = tempfile.NamedTemporaryFile(mode="w", suffix=".xml", prefix="myname", delete=False)
    # Write the XML content itself, not the literal text "xml_string"
    tmp_file.write(xml_string)
    tmp_file.close()
    output = executable.apply_templates_returning_string(source_file=tmp_file.name)
    os.unlink(tmp_file.name)
    return output
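
A variant of the same workaround, as a sketch: wrapping the temporary file in a context manager so the file is removed even if the transformation raises. temp_xml_file is a hypothetical helper name, the UTF-8 encoding is an assumption, and the demo just reads the file back instead of calling SaxonC.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_xml_file(xml_string, prefix="myname"):
    """Write xml_string to a closed temporary .xml file and yield its path.

    The file is deleted on exit, whether or not the body raised.
    """
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=".xml", prefix=prefix, delete=False, encoding="utf-8"
    )
    try:
        with tmp:
            tmp.write(xml_string)  # file is flushed and closed on leaving this block
        yield tmp.name
    finally:
        os.unlink(tmp.name)


# Stand-in demo: read the file back where the transformation would run.
with temp_xml_file("<doc>hello</doc>") as path:
    with open(path, encoding="utf-8") as f:
        content = f.read()
    existed_inside = os.path.exists(path)
existed_after = os.path.exists(path)
```

In the handler above, the body of the `with` block would be the executable.apply_templates_returning_string(source_file=path) call.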

RE: Is PyXdmNode meant to be thread-safe? - Added by O'Neil Delpratt over 1 year ago

Apologies, I have dropped the ball on this forum post. I will create a bug issue and investigate this further.