Feature #4464

Calling 3rd-party API using ixsl:schedule-action/@http-request

Added by Mark Dunn 6 months ago. Updated 4 months ago.

Status: AwaitingInfo
Priority: Low
Assignee: -
Category: -
Sprint/Milestone: -
Start date: 2020-02-25
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:

Description

The task I have is to update bibliographic references in an XML document with identifiers sourced from third party databases (e.g. CrossRef DOIs).

I thought Saxon-JS could be used to achieve this using just XSLT, by

  • building an HTTP request map from a bib reference,
  • sending it with ixsl:schedule-action/@http-request and parsing the response,
  • incorporating the result back into the bib reference.
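
For illustration, the three steps above might be sketched roughly as follows. This is not the attached prototype: the `ref`/`citation` input markup, the template name `handle-crossref-response`, and the choice of the CrossRef REST endpoint `https://api.crossref.org/works` (which returns JSON) are assumptions made for the example. Whether the browser lets the call through still depends on the cross-origin rules discussed in this issue.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
    extension-element-prefixes="ixsl"
    exclude-result-prefixes="xs">

  <!-- Assumed input: <ref id="r1"><citation>free-text reference</citation></ref> -->
  <xsl:template match="ref">
    <!-- Step 1: build the HTTP request map from the reference text -->
    <xsl:variable name="request" as="map(*)" select="map{
        'method': 'GET',
        'href': 'https://api.crossref.org/works?rows=1&amp;query.bibliographic='
                || encode-for-uri(normalize-space(citation))
      }"/>
    <!-- Step 2: schedule the request; the named template receives the response map -->
    <ixsl:schedule-action http-request="$request">
      <xsl:call-template name="handle-crossref-response">
        <xsl:with-param name="ref-id" select="string(@id)"/>
      </xsl:call-template>
    </ixsl:schedule-action>
  </xsl:template>

  <xsl:template name="handle-crossref-response">
    <xsl:context-item as="map(*)" use="required"/>
    <xsl:param name="ref-id" as="xs:string"/>
    <!-- The body may arrive as a JSON string or already parsed; handle both cases -->
    <xsl:variable name="json"
        select="if (?body instance of xs:string) then parse-json(?body) else ?body"/>
    <!-- Step 3: take the DOI of the first match, if any -->
    <xsl:variable name="doi" select="$json?message?items?*[1]?DOI"/>
    <xsl:message select="'Reference ' || $ref-id || ' DOI: ' || $doi"/>
  </xsl:template>

</xsl:stylesheet>
```

In a real stylesheet the last step would write the DOI back into the page or document rather than just logging it.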

But when I attempt this, the browser reports the error "Cross-Origin Request Blocked". Because the HTTP request feature is only available in Saxon-JS, I have to run the (attached) code in a browser, which detects that a page served from one server is trying to fetch data from another server, and blocks the request.

So my questions are:

  • Have I missed a parameter of some kind that would make this work?
  • Is it feasible to include ixsl:schedule-action and HTTP requests in Saxon-EE? (so I'm not tied to using a browser for this)
  • Or is the cross-origin request security issue a fundamental obstacle to making this work?

reference-poller.zip (18.4 KB): Prototype using Saxon-JS to call a 3rd-party API (Mark Dunn, 2020-02-25 09:57)

History

#1 Updated by O'Neil Delpratt 6 months ago

  • Status changed from New to AwaitingInfo

Hi,

Because of the browser's cross-origin restrictions (the same-origin policy), the ixsl:schedule-action/@http-request feature will not allow you to access third-party sites to retrieve XML.

Implementing ixsl:schedule-action and HTTP requests in Saxon-EE is not something we need as the doc function can be used for simple URLs.
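
As a minimal sketch of the doc() approach on the server side (for example with Saxon-EE), assuming a lookup service that answers a plain GET URL with XML: the `lookup-base` URL, the `ref` input element, and the `doi` element in the response are all hypothetical placeholders, not a real API.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical lookup service; replace with the real query URL -->
  <xsl:param name="lookup-base" as="xs:string"
      select="'https://example.org/lookup?q='"/>

  <!-- Copy everything else through unchanged -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="ref">
    <xsl:variable name="url"
        select="$lookup-base || encode-for-uri(normalize-space(.))"/>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <!-- doc() issues a simple GET; no custom headers can be supplied -->
      <xsl:if test="doc-available($url)">
        <doi>
          <xsl:value-of select="(doc($url)//*:doi)[1]"/>
        </doi>
      </xsl:if>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This only covers APIs that need nothing more than a URL, which is the limitation raised later in this thread.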

Possible workarounds in the browser:

  1. Configure your server to allow cross-origin access to third-party sites, although this is not advised because of the security issues.

  2. Proxy access to the third-party sites via your server. You could in fact run Saxon on the server side, with a server such as Servlex, to process the HTTP request and use the doc function with simple URLs to retrieve the XML document, which can then be sent back to the client using the Saxon-JS ixsl:schedule-action/@http-request feature.

  3. If the third-party site can return the data as JSON instead of XML, you could call the json-doc function from the client-side stylesheet in Saxon-JS. JSON is exempt from these cross-origin rules. You can convert the JSON to a map in the XSLT.
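
The third workaround could look something like the sketch below. It assumes the CrossRef REST endpoint `https://api.crossref.org/works`, which returns JSON; the function name `f:crossref-doi`, its namespace, and the `ref` input element are made up for the example.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Returns the DOI of the best match for a free-text citation, if any -->
  <xsl:function name="f:crossref-doi" as="xs:string?">
    <xsl:param name="citation" as="xs:string"/>
    <xsl:variable name="url"
        select="'https://api.crossref.org/works?rows=1&amp;query.bibliographic='
                || encode-for-uri($citation)"/>
    <!-- json-doc() fetches the resource and returns it as maps and arrays -->
    <xsl:variable name="result" select="json-doc($url)"/>
    <xsl:sequence select="$result?message?items?*[1]?DOI"/>
  </xsl:function>

  <!-- Assumed input element: <ref>free-text reference</ref> -->
  <xsl:template match="ref">
    <xsl:copy>
      <xsl:copy-of select="@* | node()"/>
      <doi><xsl:value-of select="f:crossref-doi(normalize-space(.))"/></doi>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because json-doc already yields maps and arrays, the lookup operator `?` is enough to navigate the result.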

#2 Updated by Mark Dunn 6 months ago

Thanks O'Neil, that's helpful.

I'm not going to try and get around the security issues!

The doc() function will work with the CrossRef API, but other APIs work differently.

For example, Web of Science requires authentication via an HTTP header 'Authorization', which returns a session ID.

Then, a WoS API request requires an HTTP header 'Cookie' with the session ID.

http://wokinfo.com/media/pdf/WebServicesLiteguide.pdf

So I think we need more than the doc() function to access the WoS API.
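
To make the requirement concrete, here is a rough sketch of the two request maps such a flow would need, written with the Saxon-JS ixsl:schedule-action syntax. Everything specific is an assumption: the `example.org` URLs, the placeholder credentials, the `SID` cookie name, and the idea that the session ID comes back as the response body. Note also that browsers treat `Cookie` as a forbidden request header, so this only shows what a non-browser implementation would have to support.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
    extension-element-prefixes="ixsl"
    exclude-result-prefixes="xs">

  <!-- Hypothetical endpoints and credentials (placeholders only) -->
  <xsl:param name="auth-url" as="xs:string"
      select="'https://example.org/wos/authenticate'"/>
  <xsl:param name="search-url" as="xs:string"
      select="'https://example.org/wos/search?q=example'"/>
  <xsl:param name="basic-credentials" as="xs:string"
      select="'REPLACE-WITH-BASE64-CREDENTIALS'"/>

  <!-- Step 1: authenticate with an Authorization header -->
  <xsl:template name="wos-login">
    <xsl:variable name="auth-request" as="map(*)" select="map{
        'method': 'GET',
        'href': $auth-url,
        'headers': map{ 'Authorization': 'Basic ' || $basic-credentials }
      }"/>
    <ixsl:schedule-action http-request="$auth-request">
      <xsl:call-template name="wos-query"/>
    </ixsl:schedule-action>
  </xsl:template>

  <!-- Step 2: reuse the session ID in a Cookie header -->
  <xsl:template name="wos-query">
    <xsl:context-item as="map(*)" use="required"/>
    <!-- Assumption: the session ID is the text of the response body -->
    <xsl:variable name="session-id" select="normalize-space(string(?body))"/>
    <xsl:variable name="search-request" as="map(*)" select="map{
        'method': 'GET',
        'href': $search-url,
        'headers': map{ 'Cookie': 'SID=' || $session-id }
      }"/>
    <ixsl:schedule-action http-request="$search-request">
      <xsl:call-template name="wos-result"/>
    </ixsl:schedule-action>
  </xsl:template>

  <xsl:template name="wos-result">
    <xsl:context-item as="map(*)" use="required"/>
    <xsl:message select="'WoS response status: ' || ?status"/>
  </xsl:template>

</xsl:stylesheet>
```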

EXPath appears to have a module for this (http://expath.org/modules/http-client/), but it does not seem to be implemented in Saxon-EE.

#3 Updated by Michael Kay 6 months ago

Also relevant is CORS, see https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS which ought to work with Saxon-JS but I don't think we've actually tried it.

#4 Updated by Michael Kay 6 months ago

As regards the question,

Is it feasible to include ixsl:schedule-action and HTTP requests in Saxon-EE? (so I'm not tied to using a browser for this)

The answer is yes, it's feasible, but it's significant effort (mainly testing effort). I think we would initially do synchronous HTTP requests only.

#5 Updated by Mark Dunn 6 months ago

Thanks Michael - if it helps, it's the HTTP request (with headers) that I'm interested in, not the asynchronous-ness (-nosity?)

#6 Updated by Mark Dunn 6 months ago

Had a quick go at submitting a "preflight" request:

<xsl:variable name="http-preflight-headers" as="map(*)">
  <xsl:map>
    <xsl:map-entry key="'Access-Control-Request-Method'" select="'GET'"/>
    <xsl:map-entry key="'Access-Control-Request-Headers'" select="'Content-Type'"/>
    <xsl:map-entry key="'Origin'" select="'http://127.0.0.1'"/>
  </xsl:map>
</xsl:variable>

<xsl:variable name="http-preflight" as="map(*)">
  <xsl:map>
    <xsl:map-entry key="'method'" select="'OPTIONS'"/>
    <xsl:map-entry key="'href'" select="'https://doi.crossref.org'"/>
    <xsl:map-entry key="'headers'" select="$http-preflight-headers"/>
  </xsl:map>
</xsl:variable>
                
<ixsl:schedule-action http-request="$http-preflight">
  <xsl:call-template name="handle-preflight-response"/>
</ixsl:schedule-action>

<xsl:template name="handle-preflight-response">
  <xsl:context-item as="map(*)" use="required"/>
  <xsl:message>HTTP response status: <xsl:sequence select="serialize(?status)"/></xsl:message>
  <xsl:message>HTTP response headers: <xsl:sequence select="serialize(?headers)"/></xsl:message>
</xsl:template>

Got some new error messages, reported by Saxon-JS:

Attempt to set a forbidden header was denied: Origin
Attempt to set a forbidden header was denied: Access-Control-Request-Headers
Attempt to set a forbidden header was denied: Access-Control-Request-Method

#7 Updated by David Priest 4 months ago

From the research I've been doing re: the same issue, there's no need to do the preflight: your browser will do it automatically when you submit a "complex" request. See: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
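
As a sketch of that point: the request map below carries only an application-level header and no `Origin` or `Access-Control-Request-*` entries. A header outside the CORS safelist (here a hypothetical `X-Example-Client`) is what makes the request "complex", so the browser sends the OPTIONS preflight on its own. The URL, header name, and template names are placeholders.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
    extension-element-prefixes="ixsl">

  <xsl:template name="send-request">
    <!-- No Origin or Access-Control-Request-* headers: the browser adds those itself -->
    <xsl:variable name="request" as="map(*)" select="map{
        'method': 'GET',
        'href': 'https://example.org/api/lookup',
        'headers': map{ 'X-Example-Client': 'reference-poller' }
      }"/>
    <ixsl:schedule-action http-request="$request">
      <xsl:call-template name="handle-response"/>
    </ixsl:schedule-action>
  </xsl:template>

  <xsl:template name="handle-response">
    <xsl:context-item as="map(*)" use="required"/>
    <xsl:message select="'HTTP response status: ' || ?status"/>
  </xsl:template>

</xsl:stylesheet>
```

The real request only goes ahead if the third-party server answers the preflight with suitable Access-Control-Allow-* headers, which is outside the stylesheet's control.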
