XSLT 3.0 non XML result documents: should the Python API allow to capture them serialized?

Added by Martin Honnen almost 2 years ago

I am continuing to explore the SaxonC 12 Python API. For XSLT, I wonder whether there is a way to capture non-XML result documents containing XDM maps/arrays (i.e. JSON) in serialized form, e.g. as a string.

While the primary result of e.g.

from saxonche import *

with PySaxonProcessor(license=True) as proc:
    print(proc.version)

    xslt = '''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0" expand-text="yes">
    <xsl:output method="json" indent="yes"/>
    <xsl:template name="xsl:initial-template">
      <xsl:sequence select="map { 'name' : 'Example 1', 'data' : array { 1 to 5 } }"/>
    </xsl:template>
</xsl:stylesheet>'''

    xslt_proc = proc.new_xslt30_processor()

    xslt_exe = xslt_proc.compile_stylesheet(stylesheet_text=xslt)

    result = xslt_exe.call_template_returning_string()

    print(result)

is returned serialized as a JSON string fine, i.e. the example code outputs

SaxonC-HE 12.0 from Saxonica
{
  "data": [ 1, 2, 3, 4, 5 ],
  "name": "Example 1"
}

when I try to do the same for secondary result documents created with xsl:result-document, I run into an error. The only change I have found that avoids it is to capture the raw result, but then the Python API gives me no way to serialize it as e.g. JSON.

So the following code

from saxonche import *

with PySaxonProcessor(license=True) as proc:
    print(proc.version)

    xslt = '''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0" expand-text="yes">
        <xsl:output method="json" indent="yes"/>
        <xsl:template name="xsl:initial-template">
          <xsl:sequence select="map { 'name' : 'Example 1', 'data' : array { 1 to 5 } }"/>
          <xsl:for-each select="1 to 5">
          <xsl:result-document href="json-result-{.}.json">
              <xsl:sequence
                select="array { (1 to .) ! map { 'name' : 'item ' || ., 'data' : array { 1 to . } } }"/>
            </xsl:result-document>
          </xsl:for-each>
        </xsl:template>
    </xsl:stylesheet>'''

    xslt_proc = proc.new_xslt30_processor()

    xslt_exe = xslt_proc.compile_stylesheet(stylesheet_text=xslt)

    xslt_exe.set_base_output_uri("urn:to-string")

    xslt_exe.set_capture_result_documents(True, False)

    result = xslt_exe.call_template_returning_string()

    print(result)

    result_docs = xslt_exe.get_result_documents()

    for key in result_docs:
        print(key, result_docs[key])

neither returns the primary result in result nor gives me any serialized secondary result documents; instead it outputs an error:

SaxonC-HE 12.0 from Saxonica
None
json-result-1.json 
Error in xsl:result-document/@href on line 6 column 60 
  SENR0001  Cannot serialize a map using this output method
at template xsl:initial-template on line 3 column 51

Now when I switch to "raw" results by changing xslt_exe.set_capture_result_documents(True, False) to xslt_exe.set_capture_result_documents(True, True), I get output where the primary result is serialized as JSON but the secondary result documents are returned as arrays, and I lack a way in the Python API to serialize them as JSON (their toString() representation of course uses the adaptive output method):

SaxonC-HE 12.0 from Saxonica
{
  "data": [ 1, 2, 3, 4, 5 ],
  "name": "Example 1"
}
json-result-1.json [map{"data":[1],"name":"item 1"}]
json-result-2.json [map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"}]
json-result-3.json [map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"},map{"data":[1,2,3],"name":"item 3"}]
json-result-4.json [map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"},map{"data":[1,2,3],"name":"item 3"},map{"data":[1,2,3,4],"name":"item 4"}]
json-result-5.json [map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"},map{"data":[1,2,3],"name":"item 3"},map{"data":[1,2,3,4],"name":"item 4"},map{"data":[1,2,3,4,5],"name":"item 5"}]
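To make the difference between the adaptive representation and JSON concrete, here is a minimal sketch using only the Python standard library. The first string is copied from the output above; the second is a hand-written JSON equivalent and is an assumption about what a JSON serialization of the same array would look like, not SaxonC output.

```python
import json

# Adaptive representation, copied from the json-result-1.json line above.
adaptive = '[map{"data":[1],"name":"item 1"}]'

# Hand-written JSON equivalent (an assumption, not produced by SaxonC).
json_equivalent = '[{"data":[1],"name":"item 1"}]'


def is_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_json(adaptive))         # False: "map{" is not JSON syntax
print(is_json(json_equivalent))  # True
```

So the raw results cannot simply be passed through str() and handed to a JSON consumer; they need a real JSON serialization step.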

I am not sure whether the error I get ("SENR0001 Cannot serialize a map using this output method") is a shortcoming of SaxonC resulting from the interaction between Java, C++ and Python, or whether it is a quirk/bug in the current implementation.


Replies (6)


RE: XSLT 3.0 non XML result documents: should the Python API allow to capture them serialized? - Added by O'Neil Delpratt almost 2 years ago

Thank you Martin for your experiments on this issue. Maybe a new method, something like serializeAsJson(XdmMap), would do. Ideally we should be in line with the Java API, so I will investigate this further.

RE: XSLT 3.0 non XML result documents: should the Python API allow to capture them serialized? - Added by O'Neil Delpratt almost 2 years ago

Hi, taking a step back here. See below the code ported to Java:

        Processor processor = new Processor(false);

        StringReader reader = new StringReader("<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"3.0\" expand-text=\"yes\">\n" +
                "        <xsl:output method=\"json\" indent=\"yes\"/>\n" +
                "        <xsl:template name=\"xsl:initial-template\">\n" +
                "          <xsl:sequence select=\"map { 'name' : 'Example 1', 'data' : array { 1 to 5 } }\"/>\n" +
                "          <xsl:for-each select=\"1 to 5\">\n" +
                "          <xsl:result-document href=\"json-result-{.}.json\">\n" +
                "              <xsl:sequence\n" +
                "                select=\"array { (1 to .) ! map { 'name' : 'item ' || ., 'data' : array { 1 to . } } }\"/>\n" +
                "            </xsl:result-document>\n" +
                "          </xsl:for-each>\n" +
                "        </xsl:template>\n" +
                "    </xsl:stylesheet>");

        XsltCompiler compiler = processor.newXsltCompiler();

        Xslt30Transformer trans = compiler.compile(new StreamSource(reader)).load30();
        ResultHandler resultHandler = new ResultHandler(true);
        trans.setResultDocumentHandler(resultHandler);

        XdmValue value = trans.callTemplate(null);

        System.err.println("Primary doc: " + value.toString());


        XdmValue [] rawResults = resultHandler.getRawResults();

        int i=0;
        for(XdmValue valuei : rawResults) {
            System.err.println("Secondary results["+i+"] :" + valuei.toString());
            i++;
        }

We get the following output:

Primary doc: map{"data":[1,2,3,4,5],"name":"Example 1"}
Secondary results[0] :[map{"data":[1],"name":"item 1"}]
Secondary results[1] :[map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"},map{"data":[1,2,3],"name":"item 3"},map{"data":[1,2,3,4],"name":"item 4"}]
Secondary results[2] :[map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"},map{"data":[1,2,3],"name":"item 3"}]
Secondary results[3] :[map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"},map{"data":[1,2,3],"name":"item 3"},map{"data":[1,2,3,4],"name":"item 4"},map{"data":[1,2,3,4,5],"name":"item 5"}]
Secondary results[4] :[map{"data":[1],"name":"item 1"},map{"data":[1,2],"name":"item 2"}]

RE: XSLT 3.0 non XML result documents: should the Python API allow to capture them serialized? - Added by O'Neil Delpratt almost 2 years ago

So Java is doing the same thing as the Python code. In Java, I guess I could create another map for each secondary result and serialize it to a string, which is the JSON you want. I think you can do the same in Python too.

RE: XSLT 3.0 non XML result documents: should the Python API allow to capture them serialized? - Added by O'Neil Delpratt almost 2 years ago

Or if you want the individual XdmArray item I can just traverse them in a loop and call toString on the XdmMap objects, which would be JSON.

RE: XSLT 3.0 non XML result documents: should the Python API allow to capture them serialized? - Added by Martin Honnen almost 2 years ago

The question I pondered was how to get the secondary results as a serialized string. On the Java side I can easily use a Serializer for the secondary results/the result handler. On the Python side I don't have that option, and I don't have much of an API to serialize the raw XDM results with various options other than, as I have now figured out, calling the XPath 3.1 serialize function. For the time being, instead of using the Python API for XSLT, I have delegated the task to XPath and the fn:transform function with 'delivery-format': 'serialized'. That gives me a map with all result documents as strings if I need/want that, or I could use 'raw' to get an XDM value.

I can't currently tell whether it is feasible to give the C++/Python API something like a Serializer as the result handler; I guess that might not be possible.
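The serialize-function route mentioned above could look roughly like the following sketch. It assumes the stylesheet has already been run with set_capture_result_documents(True, True), that each captured value exposes its first item through a head property, and that items expose string_value. These are my reading of the saxonche API rather than something confirmed in this thread, and the helper names are hypothetical. The SaxonC part is only defined here, not called.

```python
def build_serialize_call(method="json", indent=True):
    """Build an XPath 3.1 fn:serialize call on the context item (hypothetical helper)."""
    indent_value = "true()" if indent else "false()"
    return ("serialize(., map { 'method': '" + method + "', "
            "'indent': " + indent_value + " })")


def serialize_result_documents(proc, xslt_exe):
    """Serialize captured raw result documents to strings.

    proc is a PySaxonProcessor; xslt_exe is an executable that has already
    been run with set_capture_result_documents(True, True).
    """
    xpath_proc = proc.new_xpath_processor()
    expr = build_serialize_call()
    serialized = {}
    result_docs = xslt_exe.get_result_documents()
    for key in result_docs:
        # Assumption: the captured value has a head property giving its first item.
        xpath_proc.set_context(xdm_item=result_docs[key].head)
        # Assumption: the xs:string result exposes its text as string_value.
        serialized[key] = xpath_proc.evaluate_single(expr).string_value
    return serialized


print(build_serialize_call())
```

In the second example from the original post, calling serialize_result_documents(proc, xslt_exe) after the loop over result_docs would be the place to try this.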

RE: XSLT 3.0 non XML result documents: should the Python API allow to capture them serialized? - Added by Martin Honnen almost 2 years ago

I can live with the double True for capture result documents and store raw results, which gives me an XDM value. The question is whether the API can be improved, extended or perhaps changed for the case of capturing the results without wanting a raw value, but rather a result serialized according to the attributes on the xsl:result-document. In that case, as I stated in my post, if a secondary result is an XDM map or array, the current API (I think the code (for 11) is in https://saxonica.plan.io/projects/saxonmirrorhe/repository/he/revisions/he_mirror_saxon_11_4/entry/src/main/java/net/sf/saxon/option/cpp/SaxonCResultDocumentHandler.java#L53) uses an XdmDestination that throws an error for XDM map or array results. On the Java side that can be changed by supplying a Serializer, but with SaxonC/Python I am kind of stuck with the error, because under the hood the API uses either RawDestination (which can handle any result but ignores serialization attributes/properties) or XdmDestination (which also ignores serialization attributes/properties but can't handle results that are arrays or maps and throws an error on them).

So at that point I wonder whether, for all the returning_string methods in the Python API, it would make sense to have a third argument to set_capture_result_documents that allows me to say serialize or serialized, so that under the hood the C++/Java code would use not an XdmDestination but a Serializer over a StringWriter and return serialized result documents.

As I said, I can't judge how feasible that is and I am kind of just exploring what can currently be done and what can't and commenting on what can't be done.

And as I said, I have found a workaround: simply rely on fn:transform and its delivery-format: serialized. If some day the system function call gives me a nice error/exception when something goes wrong there (is there a bug for https://saxonica.plan.io/boards/4/topics/9235?), I perhaps won't need a serialized option for the capture result documents API of the XSLT API and can live with fn:transform.
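A minimal sketch of that fn:transform workaround follows. The option names (stylesheet-text, initial-template, delivery-format) come from the XPath 3.1 fn:transform specification; the helper names are hypothetical, embedding the stylesheet as an XPath string literal is just one way to pass it, and a base-output-uri option may also be needed depending on the processor. The function that calls SaxonC is only defined, not run, and assumes the saxonche package is installed.

```python
XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def xpath_string_literal(text):
    """Quote text as an XPath string literal by doubling embedded apostrophes."""
    return "'" + text.replace("'", "''") + "'"


def build_transform_call(stylesheet_text, delivery_format="serialized"):
    """Build an fn:transform call that invokes xsl:initial-template."""
    return (
        "transform(map { "
        "'stylesheet-text': " + xpath_string_literal(stylesheet_text) + ", "
        "'initial-template': QName('" + XSL_NS + "', 'initial-template'), "
        "'delivery-format': '" + delivery_format + "' })"
    )


def run_transform(stylesheet_text):
    """Evaluate the call with SaxonC and return the result map as a string.

    Requires saxonche; fn:transform returns a map holding the principal
    and secondary results.
    """
    from saxonche import PySaxonProcessor

    with PySaxonProcessor(license=False) as proc:
        xpath_proc = proc.new_xpath_processor()
        result_map = xpath_proc.evaluate_single(build_transform_call(stylesheet_text))
        # Convert while the processor is still open.
        return str(result_map)


print(build_transform_call("<a b='c'/>"))
```

Passing the stylesheet from the second example in the original post to run_transform would be the way to try this out; individual entries could then be picked out of the returned map with further XPath.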
