Saxon-JS and merging multiple documents from http URIs

Added by Graydon Saunders about 1 year ago

Hello --

This is using Saxon-JS 2.5 inside node, rather than from the browser.

It looks like the pattern for ixsl:schedule-action to retrieve a file from an http URI -- per https://www.saxonica.com/saxon-js/documentation2/index.html#!ixsl-extension/instructions/schedule-action -- is always to write the retrieved file back out via xsl:result-document.

Is there a way to process the contents of multiple retrieved files into a single result document? https://www.balisage.net/Proceedings/vol25/html/Kay01/BalisageVol25-Kay01.html gives me hopeful thoughts about async="yes" on variables, but (so far as I can tell) the day is not yet.

Is there a way I can get a bunch of document nodes via ixsl:schedule-action (or some other means), process all of them, and use the result of the processing in a single result document?

Thanks! Graydon


Replies (3)


RE: Saxon-JS and merging multiple documents from http URIs - Added by Martin Honnen about 1 year ago

Note that SaxonJS has its own forums at https://saxonica.plan.io/projects/saxon-js/boards.

I thought ixsl:schedule-action would make a single call-template call covering all returned documents when the document attribute lists more than one URI, but I get rather odd results that make it hard to tell what is supposed to happen or how to handle the responses. Here is a sample that tries to pull two documents:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
  exclude-result-prefixes="#all"
  expand-text="yes">
  
  <xsl:param name="doc-uris" as="xs:string*"
    select="'https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2023/collection-file1.xml',
            'https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2023/collection-file2.xml'"/>

  <xsl:variable name="request" as="xs:string" select="$doc-uris => string-join(' ')"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output indent="yes"/>

  <xsl:template match="/" name="xsl:initial-template">
    <root>
       <xsl:message>Requesting {$request}</xsl:message>
       <xsl:variable name="requests" as="item()*">
          <ixsl:schedule-action document="{$request}">
            <xsl:call-template name="doc-processing-action">
              <xsl:with-param name="docs" select="$doc-uris"/>
            </xsl:call-template>
          </ixsl:schedule-action>
       </xsl:variable>
       <xsl:if test="exists($requests)">
         <requested-documents>{$doc-uris}</requested-documents>
       </xsl:if>
    </root>
    <xsl:comment>Run with {system-property('xsl:product-name')} {system-property('xsl:product-version')} {system-property('Q{http://saxon.sf.net/}platform')}</xsl:comment>
  </xsl:template>
  
  <xsl:template name="doc-processing-action">
    <xsl:param name="docs"/>
    <xsl:message>called template doc-processing-action</xsl:message>
    <xsl:for-each select="$docs[doc-available(.)]!doc(.)">
      <xsl:message>Processing {base-uri()}</xsl:message>
    </xsl:for-each>
  </xsl:template>
  
</xsl:stylesheet>

But when I run this (Windows 11, Node 16) with e.g. xslt3 -it -xsl:sheet.xsl, the debugging output shows that the template is called twice, and one of the two documents shows up in both calls:

Requesting https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2023/collection-file1.xml https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2023/collection-file2.xml
<?xml version="1.0" encoding="UTF-8"?>
<root/>
<!--Run with SaxonJS 2.5 Node.js-->called template doc-processing-action
Processing https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2023/collection-file1.xml
called template doc-processing-action
Processing https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2023/collection-file1.xml
Processing https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2023/collection-file2.xml

I don't know whether that is a quirk/bug or the expected result.

RE: Saxon-JS and merging multiple documents from http URIs - Added by Debbie Lockett about 1 year ago

I've been experimenting with this, and unfortunately I have come across a number of things which don't work in SaxonJS 2. Some things that can be done with SaxonJS 2 in the browser are not available on Node.js (e.g. because of the different ways we do requests with XMLHttpRequest in the browser, and with axios on Node.js); and sometimes the SaxonJS 2 ixsl:schedule-action specification is clearly lacking (but note that we are working on a redesign for SaxonJS 3)...

But I think you should be able to do what you are after with something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  version="3.0" expand-text="yes" exclude-result-prefixes="#all">
  
  <xsl:variable name="doc1" select="'http://localhost:3123/test/data/test.xml'"/>
  <xsl:variable name="doc2" select="'http://localhost:3123/test/data/books.xml'"/>
  <xsl:variable name="doc-uris" select="($doc1, $doc2)" as="xs:string*"/>
  <xsl:variable name="docSSS" select="string-join($doc-uris, ' ')" as="xs:string"/>
  
  <xsl:template name="xsl:initial-template">
    <root>
      <p>Fetching documents from <xsl:value-of select="$docSSS"/></p>
      <ixsl:schedule-action document="{$docSSS}">
        <xsl:call-template name="action">
          <xsl:with-param name="docs" select="$doc-uris"/>
        </xsl:call-template>
      </ixsl:schedule-action>
    </root>
  </xsl:template>
  
  <xsl:template name="action">
    <!-- This template is called once for each document fetch. 
      But we want to know when ALL documents have been fetched. 
      So check for this, and only do subsequent processing when all documents are available. -->
    <xsl:param name="docs"/>
    <xsl:variable name="docsAvailable" select="$docs ! doc-available(.)" as="xs:boolean*"/>
    <xsl:variable name="docsAllAvailable" select="not($docsAvailable = false())" as="xs:boolean"/>
    <xsl:if test="$docsAllAvailable">
      <xsl:result-document href="out.xml">
        <out>
          <xsl:for-each select="$docs">
            <p>Document fetched from <xsl:value-of select="."/></p>
            <div id="{.}">
              <xsl:sequence select="doc(.)"/>
            </div>
          </xsl:for-each>
        </out>
      </xsl:result-document>
    </xsl:if>
  </xsl:template>
  
</xsl:stylesheet>

This is not too different from what Martin attempted, but the key is that (as Martin notes) when using ixsl:schedule-action/@document to fetch multiple documents, the "action" call-template is actually called once for each document fetch (though frustratingly, there's no way to know which document fetch has triggered the template). So if you only want to do the subsequent processing once all documents are available, then you need to check for that and add a suitable conditional wrapper.
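
The "check that everything has arrived" guard described above can also be written as a single quantified expression. The following is only a minimal sketch under the same assumption as the stylesheet above, namely that doc-available() returns true for a URI only once its fetch has completed. The template name action, the docs parameter, and the localhost URIs are taken from that stylesheet; the output file name merged.xml and the merged and source elements are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  version="3.0" exclude-result-prefixes="#all">

  <!-- Placeholder URIs, reused from the example above -->
  <xsl:variable name="doc-uris" as="xs:string*"
    select="'http://localhost:3123/test/data/test.xml',
            'http://localhost:3123/test/data/books.xml'"/>

  <xsl:template name="xsl:initial-template">
    <!-- The document attribute takes a space-separated list of URIs -->
    <ixsl:schedule-action document="{string-join($doc-uris, ' ')}">
      <xsl:call-template name="action">
        <xsl:with-param name="docs" select="$doc-uris"/>
      </xsl:call-template>
    </ixsl:schedule-action>
  </xsl:template>

  <xsl:template name="action">
    <xsl:param name="docs" as="xs:string*"/>
    <!-- Called once per fetch; proceed only when every URI is available -->
    <xsl:if test="every $uri in $docs satisfies doc-available($uri)">
      <!-- merged.xml, merged and source are hypothetical names -->
      <xsl:result-document href="merged.xml">
        <merged>
          <xsl:for-each select="$docs">
            <source uri="{.}">
              <!-- Copy the document element of each fetched document -->
              <xsl:copy-of select="doc(.)/*"/>
            </source>
          </xsl:for-each>
        </merged>
      </xsl:result-document>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Like the stylesheet above, this could be run with xslt3 -it -xsl:sheet.xsl. The quantified test is equivalent to not($docsAvailable = false()); it just avoids the intermediate variables. The thread does not say whether the check can succeed on more than one call, so this sketch does not guard against that.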

RE: Saxon-JS and merging multiple documents from http URIs - Added by Graydon Saunders about 1 year ago

Thank you, Debbie! I am impressed.

(Also, thank you, Martin; I will be careful to put the Saxon-JS questions in the right place in future.)
