Extremely parallel implementation of SAXON 10?
Added by Markus Karg over 11 years ago
With the advent of (massive) multi-core CPUs and its further improved support in Java 8 SE, I see the chance to build a highly-autoscalable implementation of XSL processors. Using CompletableFuture (hence: Map-Reduce), Lambda expressions and Fork-Join-Executors (hence: work stealing), it will be rather simple possible to pass an XSL tree into the JRE, and the JRE would be able to split it up into a mesh of fork-joinable micro-actions, in turn distributing it on auto-balanced work queues. That would effectively use all existing free CPU cores to the max, resulting in the fastest ever possible XSL translation! I wonder whether this is actually planned for SAXON 10?
Replies (4)
Please register to reply
RE: Extremely parallel implementation of SAXON 10? - Added by Michael Kay over 11 years ago
As you may be aware, we have some limited support for parallel execution in Saxon 9.5. The xsl:for-each instruction has a saxon:threads attribute that allows multi-threaded execution; we would welcome feedback on how well this performs. Also, Saxon-EE 9.5 automatically has multi-threaded execution of xsl:result-document and the collection() function.
The problem with multi-threaded execution is of course that coordination between threads may cost more than the saving obtained by distributing the work. For example with xsl:for-each it is necessary to assemble the results of the multiple threads in the right order, which may involve buffering the results in memory in a case where single-threaded execution would emit results directly to the serializer. Also of course, the context information needs to be made thread-safe so that each thread has the correct values of things such as position() and local variables; there is also a need to synchronize the lazy evaluation of global variables.
Please experiment with the facilities already available in Saxon-EE 9.5 and provide feedback to help us increase the use of parallelism in future releases.
RE: Extremely parallel implementation of SAXON 10? - Added by Markus Karg over 11 years ago
Michael,
I think you misunderstood my question. The combination technologies mentioned in my question target automated optimization of workload among a rather small set of threads. These actually are the answer to the problems you correctly mention and will effectively lead to a huge performance boost compared to any kind of "manual" multi-threading as performed by saxon:threads! The trick is that the JVM itself decides (with a little help of a future SAXON version) whether to run an operation in a single block, split it up into multiple sub-operations, whether to run all of them on one core, or dynamically use several -- and this decision is done dynamically each time a sub-operation is done, so this is highly dynamic and adaptive. Also, sub-operations are non-blocking, so the algorithm will know when operations can be done on a single core or a second thread is of any use. It is completely different from simply using multiple threads in a static way for an iteration, and it will reduce context switches to the least effectively needed number. With this knowledge it makes no sense to Experiment with multi-threaded for-each, as this is a completely different technology and produces not nearly the throughput of the mentioned algorithms.
Regards -Markus
RE: Extremely parallel implementation of SAXON 10? - Added by Michael Kay over 11 years ago
Sounds like an interesting research project. I'm skeptical. But there's an open source version of Saxon available, and people are welcome to try things out and report the results.
RE: Extremely parallel implementation of SAXON 10? - Added by Markus Karg over 11 years ago
Yeah, if I'd be young again I'd really love to run such a research programme. ;-)
Maybe one day there'll be a fresh post-graduate volunteering at Saxonica keen in finding out the benefits of work-stealing, map-reduce and functional programming on the actual performance of SAXON. :-D
Good Night! -Markus
Please register to reply