Feature #6316
Open: Make PyXslt30Processor and PySaxonProcessor serializable/pickleable
Description
Hello, I am working with saxonche in a Databricks/PySpark environment. When it comes to applying transformations to XMLs using saxonche, it does not take advantage of the parallel processing capabilities, because PyXslt30Processor objects are not "serializable" and it is therefore impossible to use a UDF etc. to mass-transform XMLs (I have multiple XMLs and one single XSLT to apply to all of them). The error I get when doing so is:
TypeError: no default __reduce__ due to non-trivial __cinit__
Would it be possible to make those objects serializable?
Thanks a lot for your time
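
For reference, a minimal sketch of the failure outside Spark, assuming only a standard saxonche install (the plain use of pickle here is illustrative; Spark's own serializer raises the same error when the processor is captured by a UDF):

# Illustrative only -- assumes saxonche is installed; not the reporter's exact code.
# Spark pickles any Python object referenced by a UDF, so the same error can be
# reproduced locally with the pickle module.
import pickle
from saxonche import PySaxonProcessor

proc = PySaxonProcessor(license=False)
xslt_proc = proc.new_xslt30_processor()

try:
    pickle.dumps(xslt_proc)
except TypeError as err:
    print(err)  # TypeError: no default __reduce__ due to non-trivial __cinit__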
Updated by O'Neil Delpratt 10 months ago
- Category set to Python API
Hi,
It should be possible to make the SaxonC classes, which are defined in Cython, pickleable. From my reading, given that we have redefined the __cinit__ method for each class, we would need to provide implementations of the __reduce__, __getstate__ and __setstate__ methods to make them work. We will discuss this feature request with the team.
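
As an illustration of the protocol being referred to (a user-level sketch only, not the SaxonC implementation, which would have to rebuild the underlying native Saxon objects), a wrapper can already be made picklable by keeping just the stylesheet text as its state and recompiling on demand:

# Illustrative sketch: a picklable wrapper around the saxonche XSLT API.
from saxonche import PySaxonProcessor

class PicklableXslt:
    def __init__(self, stylesheet_text):
        self._stylesheet_text = stylesheet_text
        self._executable = None  # native object, created lazily

    def _compile(self):
        if self._executable is None:
            proc = PySaxonProcessor(license=False)
            xslt = proc.new_xslt30_processor()
            self._executable = xslt.compile_stylesheet(
                stylesheet_text=self._stylesheet_text)
            self._proc = proc  # keep the processor alive
        return self._executable

    def transform_file(self, xml_path):
        return self._compile().transform_to_string(source_file=xml_path)

    def __getstate__(self):
        # Only the stylesheet text crosses the pickle boundary.
        return {"stylesheet_text": self._stylesheet_text}

    def __setstate__(self, state):
        self.__init__(state["stylesheet_text"])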
I have never used a Databricks/PySpark environment, but I am wondering if you have a simple repo which we could test?
Updated by Youssef Bettayeb 10 months ago
I am not sure what you mean by a repo for a Databricks/PySpark environment, as those are cloud solutions for which you either need to set up the environment on a cloud provider or perhaps use PySpark in a Jupyter environment on a local machine. This link is a nice starting point: https://towardsdatascience.com/how-to-use-pyspark-on-your-computer-9c7180075617. As for a code sample of how the SaxonC classes would be used once they are pickleable, I published this snippet: https://github.com/ybettayeb/sample-udf-saxonche/blob/main/sample.py
If you need anything else, feel free to ask.
Sincerely,
Youssef.
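
For context, the pattern such a snippet would aim at looks roughly like the following (an assumption, not the actual contents of the linked sample.py; file names are placeholders). Today the processor has to be constructed inside each task precisely because it cannot be pickled and shipped to the executors:

# Hypothetical sketch of the PySpark usage: one XSLT applied to many XML files.
from pyspark.sql import SparkSession
from saxonche import PySaxonProcessor

spark = SparkSession.builder.getOrCreate()
xml_paths = spark.sparkContext.parallelize(["a.xml", "b.xml", "c.xml"])

def transform_partition(paths):
    # The processor is created inside the task to avoid pickling it; a
    # picklable processor would allow the more natural broadcast/UDF style.
    with PySaxonProcessor(license=False) as proc:
        executable = proc.new_xslt30_processor().compile_stylesheet(
            stylesheet_file="transform.xslt")
        for path in paths:
            yield executable.transform_to_string(source_file=path)

results = xml_paths.mapPartitions(transform_partition).collect()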