Support #4428

Multi-threading support of Python bindings Saxon-C

Added by Andreas Jung 7 months ago. Updated 6 months ago.

Status: In Progress
Priority: Normal
Category: -
Start date: 2020-01-14
Due date:
% Done: 0%
Estimated time:
Found in version:

Description

Question regarding the Python bindings: are the Python bindings thread-safe?

History

#1 Updated by Michael Kay 7 months ago

  • Project changed from Saxon to Saxon/C
  • Assignee set to O'Neil Delpratt

#2 Updated by O'Neil Delpratt 7 months ago

  • Status changed from New to AwaitingInfo

Saxon/C has been cross compiled using Excelsior JET. The runtime maps Java threads directly onto native operating system threads.

Saxon/C is designed to support common scenarios like compiling a stylesheet once and then using it repeatedly, in multiple threads, to perform transformations.

Would you be able to give more detail on what you mean by "thread-safe"? For instance, in Saxon the xsl:result-document instruction can be executed in multiple threads. See the configuration feature ALLOW_MULTITHREADING and its use in the XSLT instruction xsl:result-document.

#3 Updated by Andreas Jung 7 months ago

The question is about thread safety within Python. It is common for Python web applications and web frameworks to use threads as the worker model for processing requests. So the question is whether it is safe to use Saxon-C through its Python bindings in a multi-threaded Python application. With concurrent web requests, two threads might process different XML data at the same time. This boils down to whether there is global state somewhere in Saxon-C or its bindings that would require locking.

#4 Updated by O'Neil Delpratt 7 months ago

  • Status changed from AwaitingInfo to In Progress
  • Priority changed from Low to Normal

There is no global state in Saxon/C or its bindings.

Reading documents is thread safe.

However, the processors for XSLT, XQuery and XPath are not thread-safe, as they currently hold internal state. We are treating this as a bug and will be looking to change their design to make them thread-safe.
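Until such a change lands, one common workaround for an object that holds internal state is to serialise access to it with a lock. The following is only a minimal sketch of that idea: `StatefulProcessor`, `LockedProcessor` and `runConcurrently` are hypothetical names invented for illustration and are not part of the Saxon/C API.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a processor that keeps per-call state in a
// member field. This is NOT the Saxon/C API.
class StatefulProcessor {
public:
    std::string transform(const std::string& input) {
        current_ = input;  // internal state that concurrent calls would clobber
        return "<out>" + current_ + "</out>";
    }
private:
    std::string current_;
};

// Wrapper that lets only one thread at a time use the wrapped processor.
class LockedProcessor {
public:
    std::string transform(const std::string& input) {
        std::lock_guard<std::mutex> lock(mutex_);  // released on return
        return inner_.transform(input);
    }
private:
    std::mutex mutex_;
    StatefulProcessor inner_;
};

// Runs n threads against one shared LockedProcessor and collects the results.
std::vector<std::string> runConcurrently(int n) {
    LockedProcessor processor;
    std::vector<std::string> results(n);
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        // Each thread writes only to its own slot in results.
        workers.emplace_back([&processor, &results, i] {
            results[i] = processor.transform("doc" + std::to_string(i));
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

The lock removes the data race at the cost of running transformations one at a time; creating one processor per thread would be the alternative if that cost matters.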

Would it be possible to send us a sample Python web application that uses threads, which we could use for testing?

#5 Updated by Andreas Jung 7 months ago

Thanks for the information.

My own XML CMS platform xml-director.info is based on Python 3 and the Plone 5.2 CMS, and the typical out-of-the-box configuration of the web server stack is based on Python threads. Nowadays we also have other options, like a fork-worker model.

My interest in Saxon-C comes from the decent XML processing capabilities it now makes available, which we have been missing in Python 2 and 3 for many, many years due to the limitations of libxml2. So I am happy to see state-of-the-art XML processing capabilities in upcoming projects.

Case closed...

#6 Updated by O'Neil Delpratt 6 months ago

Update:

I have made changes to the Java code to make the XSLT and XQuery processors thread-safe. Specifically, I have made them stateless. In the C++ code we have added the methods 'attachThread' and 'detachThread' to the class SaxonProcessor for JNI purposes. At the start of a new thread in C++ we have to call attachThread, and at the end detachThread. This is a work in progress, so this API design might change.
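Since the attach and detach calls have to be paired in every thread, one way to avoid forgetting the second call is a small scope guard. The following is only a sketch, assuming the object exposes `attachThread()` and `detachThread()` as described above; `ThreadAttachment`, `FakeProcessor` and `demoAttachment` are hypothetical names used for illustration, and `FakeProcessor` merely records calls rather than touching JNI.

```cpp
#include <string>
#include <vector>

// Scope guard: calls attachThread() on construction and detachThread() on
// destruction, so the pair stays balanced even on early return.
// Works with any type that has these two methods (an assumption here).
template <typename Processor>
class ThreadAttachment {
public:
    explicit ThreadAttachment(Processor& p) : processor_(p) {
        processor_.attachThread();
    }
    ~ThreadAttachment() { processor_.detachThread(); }

    // Copying would cause a double detach, so it is disabled.
    ThreadAttachment(const ThreadAttachment&) = delete;
    ThreadAttachment& operator=(const ThreadAttachment&) = delete;

private:
    Processor& processor_;
};

// Hypothetical stand-in that only records the order of calls.
struct FakeProcessor {
    std::vector<std::string> calls;
    void attachThread() { calls.push_back("attach"); }
    void detachThread() { calls.push_back("detach"); }
};

// Shows the call order produced by the guard around a unit of work.
std::vector<std::string> demoAttachment() {
    FakeProcessor p;
    {
        ThreadAttachment<FakeProcessor> guard(p);
        p.calls.push_back("work");  // the thread's real work would go here
    }  // guard goes out of scope: detachThread() runs
    return p.calls;
}
```

In a thread function the guard would be the first local variable, so that detaching happens last, after everything else in the function has been cleaned up.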

C++ test code below. Here we compile the stylesheet once and reuse it in a number of threads to execute it against a source document concurrently:

void *RunThread(void *args) {

    struct arg_struct *argsi = (struct arg_struct *)args;
    int threadid = argsi->id;
    Xslt30Processor * trans = argsi->trans;
    long tid;
    tid = (long)threadid;

    trans->attachThread();

    trans->setInitialMatchSelectionAsFile("../xml/foo.xml");

    const char *result = trans->applyTemplatesReturningString();
    cout<<" Result from THREAD ID: "<< tid << ", " << result<<endl;
    delete result;
    trans->detachThread();
    return NULL;  // a pthread start routine must return a void *
}

void testThreads (SaxonProcessor * processor) {
    pthread_t threads[NUM_THREADS];
    int rc;
    int i;
    
    Xslt30Processor *  trans = processor->newXslt30Processor();
    
    trans->compileFromFile("../xsl/foo.xsl");
    struct arg_struct args;
    args.trans = trans;
    
    for( i = 0; i < NUM_THREADS; i++ ) {
        cout << "main() : creating thread, " << i << endl;
        args.id = i;
        rc = pthread_create(&threads[i], NULL, RunThread, (void *)&args);
        
        if (rc) {
            cout << "Error:unable to create thread," << rc << endl;
            exit(-1);
        }
    }
}

The C++ code is crashing with segmentation faults in different ways between runs, so I am currently investigating these errors.

For Python, I have set up a Django web framework in Anaconda, which I will use next to do some multi-threading testing.

#7 Updated by O'Neil Delpratt 6 months ago

Added a join to prevent the main thread from ending before the other threads, using:

(void) pthread_join(threads[i], NULL);

The C++ multithreading test case is now working without any errors. I still need to do some more experiments and then move on to some Python testing.
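For reference, the create-then-join pattern can be sketched on its own, without Saxon/C. In the test code above every thread receives a pointer to the same args struct, so the id a thread reads may already have been overwritten by the loop. The sketch below gives each thread its own argument slot and joins all threads before returning. `WorkerArgs`, `worker` and `runAndJoin` are hypothetical names, and the arithmetic is only a placeholder for the real transformation.

```cpp
#include <pthread.h>
#include <vector>

// One argument slot per thread, so no thread sees another thread's id.
struct WorkerArgs {
    int id;
    int result;
};

// Thread start routine: reads its own slot and stores a result there.
void* worker(void* raw) {
    WorkerArgs* args = static_cast<WorkerArgs*>(raw);
    args->result = args->id * 2;  // placeholder for the real work
    return nullptr;
}

// Starts numThreads workers and joins every one that was started.
// Returns the number of threads run, or -1 if a thread could not be created.
int runAndJoin(int numThreads, std::vector<WorkerArgs>& args) {
    std::vector<pthread_t> threads(numThreads);

    // Fill all slots first: the vector must not be resized once threads
    // hold pointers into it.
    args.resize(numThreads);
    for (int i = 0; i < numThreads; ++i) {
        args[i].id = i;
        args[i].result = 0;
    }

    int started = 0;
    for (int i = 0; i < numThreads; ++i) {
        if (pthread_create(&threads[i], nullptr, worker, &args[i]) != 0) {
            break;
        }
        ++started;
    }

    // Joining keeps args and the caller's data alive until all threads finish.
    for (int i = 0; i < started; ++i) {
        (void) pthread_join(threads[i], nullptr);
    }
    return started == numThreads ? started : -1;
}
```

Joining inside the same function that owns the argument storage means no thread can outlive the data it points to.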
