GET requests in doc() against an API

Added by Graydon Saunders about 1 year ago

Hello --

I'm not sure if this is general; I'm fairly sure it isn't a bug.

Using Saxon-JS 2.5, and using a stylesheet which contains only (with $someURI replaced by a literal value)

  <xsl:template name="xsl:initial-template">
    <xsl:sequence select="doc($someURI)"/>
  </xsl:template>

returns

Cannot read file {value in $someURI} - Synchronous access to non-file resources is not allowed

I've tested that the URI is correctly structured, that it represents a document actually in the DB the API is sitting in front of, and that a curl call using that URI, from the same machine and as the same user that Saxon-JS runs under, gets the expected document. So the supposition is that doc() does not cope well with being asked to retrieve a document from a fundamentally asynchronous source.

But I'm not that confident that's what the error message really means, and I'm hopeful there's a way to do this.

If I'm trying to retrieve an XML document from an API via a specific-to-the-document URI, how should I do that?

thanks! Graydon


Replies (11)

RE: GET requests in doc() against an API - Added by Martin Honnen about 1 year ago

What is the environment, Node.js or the browser?

Do you run SaxonJS from your own JavaScript code so that you could use SaxonJS.getResource before running the XSLT transformation?

See https://www.saxonica.com/saxon-js/documentation2/index.html#!api/getResource
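For anyone who does control the calling JavaScript, the preloading idea might look roughly like the sketch below. It is an illustration under assumptions, not tested against a real deployment: the function name, the 'extra-doc' stylesheet parameter, and the XPath that locates the second URI are all hypothetical, and the SaxonJS object is passed in as an argument rather than required directly. The stylesheet would then declare a matching xsl:param and read the second document from it instead of calling doc().

```javascript
// Sketch only. `saxon` is whatever require('saxon-js') returns; it is passed
// in so the function has no hard dependency on the package. The parameter
// name 'extra-doc' and the uriXPath expression are hypothetical.
async function transformWithPreloadedDoc(saxon, opts) {
  const { stylesheetLocation, sourceLocation, uriXPath } = opts;

  // Load the primary source document asynchronously.
  const source = await saxon.getResource({ location: sourceLocation, type: 'xml' });

  // Work out the second document's URI from data in the source document.
  const extraUri = saxon.XPath.evaluate(uriXPath, source);

  // Fetch the second document before the transformation starts, so the
  // stylesheet never has to call doc() on an HTTP URI.
  const extra = await saxon.getResource({ location: extraUri, type: 'xml' });

  // Hand the preloaded document to the stylesheet as a parameter.
  const result = await saxon.transform({
    stylesheetLocation,
    sourceNode: source,
    stylesheetParams: { 'extra-doc': extra },
    destination: 'serialized',
  }, 'async');
  return result.principalResult;
}
```

A call might pass require('saxon-js') as the first argument, plus an options object such as { stylesheetLocation: 'main.sef.json', sourceLocation: 'source.xml', uriXPath: 'string(/*/@extra-uri)' }, where every one of those values is made up for the example.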

RE: GET requests in doc() against an API - Added by Graydon Saunders about 1 year ago

The environment is complex; an Electron app invokes a distinct module (= it has its own Docker container) to do specific processing, which includes invoking the XSLT stylesheet. I think by the point it's invoking SaxonJS it's Node.js doing the invoking.

The document passed as the source document to the transformation contains metadata from which it is possible to construct the URI of the document I want to fetch. That second document has information the source document lacks, and that information is also needed to produce a correct result.

It is not my JavaScript code; I don't have control over anything except the stylesheet. I would guess that it's possible to run SaxonJS.getResource, but that would be a question for the maintainer of the module code. It's not obvious there's any way to identify the second input document's URI in those processes except by parsing the source document.

RE: GET requests in doc() against an API - Added by Martin Honnen about 1 year ago

Wait for Norm or Debbie from Saxonica to tell you whether they have some idea or whether it is possible at all.

RE: GET requests in doc() against an API - Added by Graydon Saunders about 1 year ago

Waiting does seem prudent.

Thank you!

GET requests in doc() against an API - Added by Norm Tovey-Walsh about 1 year ago

> I'm not sure if this is general; I'm fairly sure it isn't a bug.

I’m afraid not.

> Using Saxon-JS 2.5, and using a stylesheet which contains only (with
> $someURI replaced by a literal value)
>
> <xsl:template name="xsl:initial-template">
>   <xsl:sequence select="doc($someURI)"></xsl:sequence>
> </xsl:template>
>
> returns
>
> Cannot read file {value in $someURI} - Synchronous access to non-file
> resources is not allowed

This is a consequence of how threads work in Node.js. Basically, a
Node.js process runs its JavaScript on a single thread. Consequently,
a blocking (synchronous) network read would stall everything else that
process is doing. To avoid this, Node.js doesn’t provide any APIs for
doing a blocking read of a non-file resource.
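A small Node.js sketch may make the point about blocking reads concrete. It uses only Node's built-in http module; the loopback server and the function name are made up for illustration. http.get hands back control at once and the response body only turns up later in a callback, which is why a function like doc(), which must return its result on the spot, has nothing synchronous to call for an HTTP URI.

```javascript
const http = require('http');

// Hypothetical helper: serves one tiny XML document on a loopback port,
// requests it, and records the order in which things happen.
function demoAsyncOrder() {
  return new Promise((resolve, reject) => {
    const events = [];
    const server = http.createServer((req, res) => {
      res.setHeader('Content-Type', 'application/xml');
      res.end('<doc/>');
    });
    server.on('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      // agent: false avoids a kept-alive socket, so the server can close promptly.
      const req = http.get({ host: '127.0.0.1', port, path: '/', agent: false }, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => {
          events.push('response arrived: ' + body);
          server.close(() => resolve(events));
        });
      });
      req.on('error', reject);
      // This line runs before any response data exists.
      events.push('http.get returned');
    });
  });
}
```

Calling demoAsyncOrder().then(console.log) should list 'http.get returned' first and the response second: the request call finishes before the document is available.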

> If I'm trying to retrieve an XML document from an API via a
> specific-to-the-document URI, how should I do that?

You can preload the document before you call SaxonJS, but I think that
was discussed later in this thread and isn’t convenient. The other
option is to use ixsl:schedule-action. That will allow you to do a
non-blocking (asynchronous) read operation. Execution will immediately
continue with other work and will resume in your stylesheet, in the
template called from within ixsl:schedule-action, when the read has
completed.

Hope that helps!

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

RE: GET requests in doc() against an API - Added by Graydon Saunders about 1 year ago

This is tremendously helpful, in the sense that it does retrieve the document. Thank you!

I am tangled up in how I might get the source document in such a way that it is available for further processing. (What I want to do is to import this document to provide ancillary information for creating the result document in a transform with a different source document.)

Note that ixsl:schedule-action does not write the value returned by the contained xsl:call-template instruction to the current result tree. In practice therefore, the only useful thing that the called template can do is to issue an xsl:result-document instruction (or similar instructions with side-effects, such as ixsl:set-attribute).

To prevent the returned object from being written to the current result tree, it may be necessary to throw it away, for instance:

<xsl:variable name="request" as="item()*">
   <ixsl:schedule-action document="{$uri}">
      <xsl:call-template name="action"/>
   </ixsl:schedule-action>
</xsl:variable>
<xsl:sequence select="$request[current-date() lt xs:date('2000-01-01')]"/>

seems to imply there's a way, but I'm not at all certain how to do it. I would really like to know what's in the action template.

Much appreciated!

RE: GET requests in doc() against an API - Added by Graydon Saunders about 1 year ago

And of course the action template is right there if I relate it to an example.

Looks like this is working; hopefully having grown a brain cell will last.

Thank you!

RE: GET requests in doc() against an API - Added by Graydon Saunders about 1 year ago

So if I use:

<xsl:template name="action">
    <xsl:param name="peURI"/>
    <xsl:choose>
      <xsl:when test="doc-available($peURI)">
        <xsl:result-document href="#target" method="ixsl:append-content">
          <xsl:sequence select="doc($peURI)"/>
        </xsl:result-document>
      </xsl:when>
      <xsl:otherwise>
        <FAIL>
          <xsl:result-document href="#target" method="ixsl:replace-content">
            <p>Document not available: <xsl:value-of select="$peURI"/>.</p>
          </xsl:result-document>
          <xsl:try select="doc($peURI)">
            <xsl:catch>
              <xsl:result-document href="#target" method="ixsl:append-content">
                <xsl:choose>
                  <xsl:when test="$err:code eq QName('http://www.w3.org/2005/xqt-errors', 'SXJS0008')">
                    <p>Document fetch timeout.</p>
                  </xsl:when>
                  <xsl:otherwise>
                    <p>Document fetch error: <xsl:value-of select="$err:code"/></p>
                  </xsl:otherwise>
                </xsl:choose>
                <p>
                  <xsl:value-of select="$err:description"/>
                </p>
              </xsl:result-document>
            </xsl:catch>
          </xsl:try>
        </FAIL>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

and

  <xsl:variable as="item()?" name="extracted" use-when="not($usingJava)">
    <xsl:variable name="temp">
      <ixsl:schedule-action document="{$callManagedFolders}">
        <xsl:call-template name="action">
          <xsl:with-param name="peURI" select="$callManagedFolders"/>
        </xsl:call-template>
      </ixsl:schedule-action>
    </xsl:variable>
    <!-- this is supposed to prevent obliterating the result tree with the retrieved document -->
    <xsl:sequence select="$temp[current-date() lt xs:date('2000-01-01')]"/>
  </xsl:variable>

I have two problems.

The first is that when a fetched document is found by the action template, the result tree consists of the fetched document. It doesn't seem to matter if I use ixsl:append-content or ixsl:replace-content. Am I applying the "throw it away" step incorrectly?

The second is that my goal is to be able to extract information from the fetched document and stuff it into a variable for reference elsewhere, and I'm getting the impression that this is not an achievable goal. The fetched document can be processed inside the result-document but it's just flat impossible to selectively extract information from the fetched document into the result document produced by processing the source document. Is that correct?

RE: GET requests in doc() against an API - Added by Norm Tovey-Walsh about 1 year ago

Saxonica Developer Community writes:

> The first is that when a fetched document is found by the action
> template, the result tree consists of the fetched document. It doesn't
> seem to matter if I use ixsl:append-content or ixsl:replace-content.
> Am I applying the "throw it away" step incorrectly?

It’s a little hard to see what might be wrong from just those parts.
Here’s a simple example. I have this JS file, msg9501.js:

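const SaxonJS = require('saxon-js');
const stylesheet = require('./main.sef.json');

// Assumed markup: a doc root (the stylesheet matches "doc") holding some
// text and an element with id="target" (the stylesheet writes to #target).
const srcxml = "<doc><p>Some text.</p><div id='target'/></doc>";

// Parse the string into a document node.
const master = SaxonJS.XPath.evaluate(
  'parse-xml($srcxml)', [],
  { 'params' : { 'srcxml' : srcxml } });

const options = {
  "stylesheetInternal": stylesheet,
  "destination": "serialized",
  "sourceNode": master,
  "masterDocument": master
};

// Run asynchronously, then print the updated master document.
SaxonJS.transform(options, "async")
  .then(output => {
    console.log(SaxonJS.serialize(output.masterDocument));
  }).catch(error => {
    console.log(error);
  });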

And this stylesheet, main.xsl:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
                exclude-result-prefixes="#all"
                expand-text="yes"
                version="3.0">

  <xsl:template match="doc">
    <ixsl:schedule-action document="input.xml">
      <xsl:call-template name="action">
        <xsl:with-param name="peURI" select="'input.xml'"/>
      </xsl:call-template>
    </ixsl:schedule-action>
  </xsl:template>

  <xsl:template name="action">
    <xsl:param name="peURI"/>
    <xsl:choose>
      <xsl:when test="doc-available($peURI)">
        <xsl:result-document href="#target" method="ixsl:append-content">
          <xsl:sequence select="doc($peURI)/doc/*"/>
        </xsl:result-document>
      </xsl:when>
      <xsl:otherwise>
        <xsl:result-document href="#target" method="ixsl:replace-content">
          <p>Document not available: <xsl:value-of select="$peURI"/>.</p>
        </xsl:result-document>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

My input, input.xml, is a doc element whose child element contains:

This is some test input.

If I compile main.xsl and run “node msg9501.js”, I get:

Some text.

This is some test input.

Showing that the input XML has been appended to the target.

> The second is that my goal is to be able to extract information from
> the fetched document and stuff it into a variable for reference
> elsewhere, and I'm getting the impression that this is not an
> achievable goal.

Not directly. You can’t set a global variable from inside a called
template.

> The fetched document can be processed inside the
> result-document but it's just flat impossible to selectively extract
> information from the fetched document into the result document
> produced by processing the source document. Is that correct?

It’s unlikely to be impossible, but the details will depend precisely on
what you’re trying to do. The first two thoughts I have are to run two
transformations where the first updates the masterDocument and the
second processes the masterDocument output of the first in some way.

If you need or want all of the processing to be done in a single
transformation, I think you can achieve much the same thing by chaining
together the scheduled actions (if you need to retrieve more than one
document) and then updating the master document as you wish in the last
link in the chain.
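The two-transformation idea could be driven from JavaScript roughly as follows. This is a sketch under assumptions rather than a tested recipe: the function name and both stylesheet arguments are hypothetical, the SaxonJS object is passed in as an argument, and it relies on the behaviour seen in the msg9501.js example, where the promise from an "async" transform delivers the updated document as output.masterDocument.

```javascript
// Sketch only. `saxon` is whatever require('saxon-js') returns.
// firstStylesheet and secondStylesheet are compiled (SEF) stylesheets;
// both names are hypothetical.
async function runTwoPasses(saxon, firstStylesheet, secondStylesheet, master) {
  // Pass 1: the scheduled action fetches the extra document and
  // writes what is needed into the master document.
  const first = await saxon.transform({
    stylesheetInternal: firstStylesheet,
    sourceNode: master,
    masterDocument: master,
    destination: 'serialized',
  }, 'async');

  // Pass 2: an ordinary transformation whose source is the updated
  // master document, so the fetched data is available without doc().
  const second = await saxon.transform({
    stylesheetInternal: secondStylesheet,
    sourceNode: first.masterDocument,
    destination: 'serialized',
  }, 'async');

  return second.principalResult;
}
```

The second stylesheet can then bind whatever it needs from the merged document to ordinary variables, which is the part that cannot be done from inside the scheduled action itself.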

Hope that helps!

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

RE: GET requests in doc() against an API - Added by Graydon Saunders about 1 year ago

Thank you for the example!

It now appears likely that there will be some process changes upstream of me, which will remove this requirement.

So I think this one can be considered closed.

Much appreciated!

-- Graydon
