Feature #5021

closed

Option to supply base URI of the source text in SaxonJS.getResource() for Node.js

Added by Yury Palyanitsa over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Low
Category: -
Sprint/Milestone: -
Start date: 2021-06-11
Due date:
% Done: 100%
Estimated time:
Applies to JS Branch: 2
Fix Committed on JS Branch: 2
Fixed in JS Release: Saxon-JS 2.3
SEF Generated with:
Platforms:
Company: -
Contact person: -
Additional contact persons: -

Description

I have a case where I would like to use the base-uri() function on a resource I loaded with SaxonJS.getResource({text: <source_text>, type: "xml"}). I need this to resolve links that are relative to that document (which I loaded from its text) while embedding it into the document I use in SaxonJS.transform().

I tried the following approaches to make it work:

  1. Use text and location options together (like it is done in SaxonJS.transform()), then use a stylesheet parameter to access the loaded document: the text option is ignored, and the document is loaded using the location option.
  2. Use only the text option, supply {"<source_uri>": "<resource_object>"} in documentPool, then use doc($<source_uri>) to access the document: the document is still fetched by its URI instead of the provided resource being used.

I noticed that when I use the "location" option, the returned object has _saxonBaseUri and _saxonDocUri properties containing the URI of the loaded document. I tried setting _saxonBaseUri and _saxonDocUri directly on the resource I got, and it actually worked!
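A minimal sketch of the workaround just described, assuming Saxon-JS 2.2 on Node.js. The helper name withBaseUri is hypothetical, and _saxonBaseUri/_saxonDocUri are internal properties observed on resources loaded via location, not documented API, so they may change between releases.

```javascript
// Hypothetical helper for the Saxon-JS 2.2 workaround: stamp a document
// returned by SaxonJS.getResource({text, type: "xml"}) with the internal
// _saxonBaseUri and _saxonDocUri properties (undocumented, may change).
function withBaseUri(resource, uri) {
  // new URL() throws a TypeError for relative input, so only an
  // absolute URI is accepted as the base.
  const absolute = new URL(uri).href;
  resource._saxonBaseUri = absolute;
  resource._saxonDocUri = absolute;
  return resource;
}

// Usage (inside an async function, with SaxonJS loaded):
//   const doc = withBaseUri(
//     await SaxonJS.getResource({ text: sourceText, type: "xml" }),
//     "file:///project/Documents/Document.htm" // hypothetical path
//   );
```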

So, is there a proper way to do what I described, in case I am doing it wrong? And if not, would it be possible to provide such an option, to align the behavior with SaxonJS.transform()?

#1

Updated by Yury Palyanitsa over 3 years ago

Please change the issue type from "Bug" to "Feature"; I chose "Bug" by accident.

#2

Updated by Debbie Lockett over 3 years ago

  • Tracker changed from Bug to Feature

#3

Updated by Debbie Lockett over 3 years ago

Thanks for the suggestion. I'm actually looking at SaxonJS.getResource at the moment (related to #5017 and #4748). I think you are right that there are currently some gaps in the API.

Could you share a sample repro, i.e. the XML source(s) and XSLT stylesheet? That would help me reproduce the issues you describe with using the SaxonJS API.

In point 1, you say you tried to "Use text and location options together (like it is done in SaxonJS.transform())". Could you expand on what you mean by "like it is done in SaxonJS.transform()"? Do you mean using the sourceText and sourceLocation options together? It's a surprise to me if that works!

#4

Updated by Yury Palyanitsa over 3 years ago

Hello Debbie,

Well, then it's surprising for me too if that is not actually supposed to work :) But it does indeed work this way, and I have demonstrated it in the sample repro.

Here's the GitHub repo with samples that reproduce the environment described in the issue: https://github.com/deiteris/saxonica-issue-5021

The structure:

  • The MasterPages/ folder contains MasterPage.htm, which is used as the primary source in SaxonJS.transform(). The same folder also contains Stylesheet.css, to demonstrate how the link resolves for it.
  • The Documents/ folder contains Document.htm, which is loaded using SaxonJS.getResource(). It also references a stylesheet whose relative link points to ../Styles/DocumentStylesheet.css.
  • main.js starts the transformation and outputs the base-uri() of the link//@href selector, followed by the resulting output. Note the two TODO comments in it, which outline what I described.
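To illustrate why the base URI matters in this repro, here is a sketch that uses WHATWG URL resolution in Node.js as a stand-in for XPath's resolve-uri(). The resolveLink helper and the file:///project/ paths are hypothetical; the point is only that the same relative href lands in different places depending on the base, and cannot be resolved at all without one.

```javascript
// Stand-in for XPath resolve-uri(): WHATWG URL resolution.
// The helper name and the file:///project/ paths are hypothetical.
function resolveLink(href, baseUri) {
  // A document parsed from text alone has no base URI, so a relative
  // reference inside it cannot be resolved; this helper returns null.
  if (!baseUri) return null;
  return new URL(href, baseUri).href;
}

const documentUri = "file:///project/Documents/Document.htm";
const masterPageUri = "file:///project/MasterPages/MasterPage.htm";

console.log(resolveLink("../Styles/DocumentStylesheet.css", documentUri));
// file:///project/Styles/DocumentStylesheet.css
console.log(resolveLink("Stylesheet.css", masterPageUri));
// file:///project/MasterPages/Stylesheet.css
console.log(resolveLink("Stylesheet.css", documentUri));
// file:///project/Documents/Stylesheet.css
console.log(resolveLink("../Styles/DocumentStylesheet.css", ""));
// null
```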

Instructions:

  • Clone the project, go to the project folder and run npm install
  • Run node main.js

#5

Updated by Yury Palyanitsa over 3 years ago

Update: I was not correct about using sourceText and sourceLocation in SaxonJS.transform(). When both are supplied, sourceLocation is actually used.

#6

Updated by Yury Palyanitsa over 3 years ago

Sorry for the spam, as I see no way to edit my messages, but I have now triple-checked the behavior of SaxonJS.transform(). And yes, it actually does work with both options supplied!

How I checked this:

I am developing a plugin for live previewing of transformed XML documents. When the editor's content changes, it is not saved to disk immediately, but I can get the contents of the editor directly. When I supply only sourceLocation, Saxon fetches the document contents from disk, and my preview doesn't update as I type because the file on disk has not been updated yet. BUT if I additionally provide the editor contents in sourceText, the preview updates as I type, and base-uri() works at the same time.

#7

Updated by Debbie Lockett over 3 years ago

  • Status changed from New to In Progress

Thanks for the repro, etc.

It looks like the SaxonJS.transform() behaviour you have discovered of using sourceText with sourceLocation to provide the base URI is possibly an unintentional quirk in the code. It looks like it only works this way for an asynchronous transform (and not for a synchronous transform), and I don't think it was really by design. I think the intention is that sourceLocation, sourceFileName, sourceNode, and sourceText are mutually exclusive (though we no longer check that only one is used). Perhaps we should be adding a sourceBaseUri option which can be used with sourceText (and possibly sourceNode) to allow the base URI for this source to be supplied; rather like we have the stylesheetBaseURI option which can be used with stylesheetText or stylesheetInternal to supply the static base URI of the stylesheet.

#8

Updated by Debbie Lockett over 3 years ago

Thanks again for your repro. I have been using it to run a number of tests trying out different combinations of the options for supplying the stylesheet and the primary and secondary sources, to check how the base URI of each is handled. As well as showing the need to add options for supplying the base URI of sources loaded from text, for both SaxonJS.getResource and SaxonJS.transform, the testing has shown that there is some other tidying up to do in this area (e.g. we could better align the initial processing when loading sources and stylesheets for asynchronous and synchronous transforms).

By the way, in your initial report, in point 2 you said that you had not managed to use the documentPool for the preloaded secondary source; but this should work. I wonder if there was an issue in the way that you created the documentPool? Having loaded the resource with SaxonJS.getResource({text: <source_text>, type: "xml"}), and manually set its base URI with _saxonBaseUri and _saxonDocUri, the following should work:

    const documentPool = {};
    documentPool[<source_uri>] = <resource_object>;

This documentPool can then be supplied in the SaxonJS.transform, and you can use doc($<source_uri>) to access the preloaded resource from the stylesheet. Obviously you have found a good alternative with supplying the resource in a stylesheet parameter, but you might like to try again with the documentPool, as this is what it was designed for. Let me know if you still have issues using this.
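A slightly fuller sketch of the same idea, assuming each resource already carries its base URI (set manually as described above for Saxon-JS 2.2). buildDocumentPool is a hypothetical helper; normalising keys with new URL() assumes the stylesheet passes doc() the same absolute spelling of each URI.

```javascript
// Hypothetical helper: build the plain object passed as the documentPool
// option of SaxonJS.transform, keyed by absolute URI, so that doc($uri)
// in the stylesheet can find the preloaded resource.
function buildDocumentPool(entries) {
  const pool = {};
  for (const [uri, resource] of entries) {
    // new URL() rejects relative URIs; .href gives a consistent spelling.
    const key = new URL(uri).href;
    if (key in pool) {
      throw new Error(`Duplicate documentPool entry for ${key}`);
    }
    pool[key] = resource;
  }
  return pool;
}

// Usage: SaxonJS.transform({ ..., documentPool: buildDocumentPool([[uri, doc]]) }, "async")
```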

#9

Updated by Yury Palyanitsa over 3 years ago

> By the way, in your initial report, in point 2 you said that you had not managed to use the documentPool for the preloaded secondary source; but this should work. I wonder if there was an issue in the way that you created the documentPool?

The document pool works too: I can access the preloaded resource using doc($<source_uri>) when I use documentPool, so my point 2 is actually misleading here. What I meant is that documentPool doesn't help with the empty base-uri() issue for the preloaded text resource, even though I provide a document URI with the resource.

> Obviously you have found a good alternative with supplying the resource in a stylesheet parameter, but you might like to try again with the documentPool, as this is what it was designed for. Let me know if you still have issues using this.

In this specific case I work with only 2 files, so I don't see much benefit in using documentPool, but in a mass-transform scenario I'll definitely come back to it. It's demonstrated the same way on the documentation page https://www.saxonica.com/saxon-js/documentation/index.html#!api/getResource

#10

Updated by Debbie Lockett over 3 years ago

A number of closely related but separate issues have been raised and discovered while investigating this feature request, which all need to be addressed:

  1. Add option to set base URI for source loaded with SaxonJS.getResource from text.
  2. Add option to set base URI for SaxonJS.transform source supplied by sourceText.
  3. Source base URI is not set correctly from sourceFileName for async SaxonJS.transform.
  4. SaxonJS.transform options sourceLocation and stylesheetLocation do not always handle relative URIs correctly.
  5. SaxonJS.getResource option location does not handle relative URIs.

Further notes on the current status (for the latest release Saxon-JS 2.2):

Issue 1: The workaround is to set the _saxonBaseUri property manually, but an option should be added to the API.

Issue 2: Currently, for an asynchronous transform, there is a workaround to supply the source base URI using sourceLocation; but this is not really how the API is designed to work. The transform options for supplying the source (and stylesheet) should be mutually exclusive, and we should add a check for this in the code, to avoid confusion about which options are being used.
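A minimal sketch of such a check, written as a hypothetical pre-flight helper in user code (it is not the check Saxon-JS itself performs). The four option names are the ones listed earlier in this thread as intended to be mutually exclusive.

```javascript
// Source options of SaxonJS.transform that are meant to be mutually exclusive.
const SOURCE_OPTIONS = ["sourceLocation", "sourceFileName", "sourceNode", "sourceText"];

// Hypothetical helper: returns the name of the single source option in use
// (or null if none), and throws if more than one is supplied.
function checkSourceOptions(options) {
  // Treat both undefined and null as "not supplied".
  const supplied = SOURCE_OPTIONS.filter((name) => options[name] != null);
  if (supplied.length > 1) {
    throw new Error(
      `Only one of ${SOURCE_OPTIONS.join(", ")} may be supplied; got ${supplied.join(", ")}`
    );
  }
  return supplied.length === 1 ? supplied[0] : null;
}
```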

Issue 3: There is a bug in the code for asynchronous transforms; sourceFileName (absolute or relative) is not resolved correctly before being used to set the base URI.
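For reference, a sketch of resolving a sourceFileName to an absolute file: URI on Node.js, with relative names taken against the current working directory. fileNameToBaseUri is a hypothetical helper, not the Saxon-JS fix.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical helper: turn a sourceFileName (absolute or relative path)
// into an absolute file: URI that can serve as the document's base URI.
// Relative paths are resolved against process.cwd() by path.resolve().
function fileNameToBaseUri(fileName) {
  return pathToFileURL(path.resolve(fileName)).href;
}
```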

Issue 4: There are 2 bugs in the code. For asynchronous transforms, at the point that they are used to obtain the resources, locations (sourceLocation and stylesheetLocation) are assumed to be absolute. These values have earlier been normalised to absolute URIs if working in the browser, but not on Node.js. For synchronous transforms, on Node.js there is a bug in resolving against the current working directory (caused by a missing trailing slash).
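The missing-trailing-slash problem is easy to reproduce with plain URL resolution. This sketch (hypothetical paths and helper name; it illustrates the pitfall, not the Saxon-JS code) shows the effect and one way to build a safe directory base on Node.js.

```javascript
const { pathToFileURL } = require("url");

// Without a trailing slash, the last segment of the base ("project") is
// treated as a file name and dropped during resolution.
const withoutSlash = new URL("Documents/Document.htm", "file:///home/user/project").href;
const withSlash = new URL("Documents/Document.htm", "file:///home/user/project/").href;
console.log(withoutSlash); // file:///home/user/Documents/Document.htm
console.log(withSlash);    // file:///home/user/project/Documents/Document.htm

// Hypothetical helper: a directory URI for the current working directory
// that is safe to resolve relative locations against.
function cwdDirectoryUri() {
  const href = pathToFileURL(process.cwd()).href;
  return href.endsWith("/") ? href : href + "/";
}
```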

Issue 5: Currently the SaxonJS.getResource option location is assumed to be absolute. Meanwhile the SaxonJS.transform location options handle relative URIs - resolved against the current working directory on Node.js, and against the location of the HTML page in the browser. It seems reasonable to align the SaxonJS.getResource location option to similarly handle relative URIs.
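A sketch of the relative-URI handling described in Issue 5, as a hypothetical user-side helper rather than the Saxon-JS implementation. Detecting the browser through window.location is an assumption about how the page URL would be obtained.

```javascript
// Hypothetical helper: the base against which relative locations resolve.
function defaultBaseUri() {
  if (typeof window !== "undefined" && window.location) {
    return window.location.href; // browser: the URL of the HTML page
  }
  // Node.js only: modules are required lazily so the browser branch never needs them.
  const path = require("path");
  const { pathToFileURL } = require("url");
  // A trailing separator makes pathToFileURL keep the trailing slash,
  // so the directory itself (not its parent) acts as the base.
  return pathToFileURL(process.cwd() + path.sep).href;
}

function resolveLocation(location, base = defaultBaseUri()) {
  // Absolute URIs come back unchanged; relative ones are resolved.
  return new URL(location, base).href;
}
```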

A set of Node.js tests covering these issues has been added (see src/test/nodejs/iss5021_test.js).

#11

Updated by Debbie Lockett over 3 years ago

  • Status changed from In Progress to Resolved
  • Assignee set to Debbie Lockett
  • Applies to JS Branch 2 added
  • Fix Committed on JS Branch 2 added

Code fixes and documentation updates committed.

New options added to set base URIs for primary and secondary sources:

  1. baseURI for SaxonJS.getResource for use with text;
  2. sourceBaseURI for SaxonJS.transform for use with sourceText or sourceNode.

The other bugs noted above have also been fixed.
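A sketch of how the two new options might be supplied for the live-preview case described earlier, assuming Saxon-JS 2.3 or later. Only the option names (text, type, baseURI, sourceText, sourceBaseURI, stylesheetFileName) come from this issue or the existing API; the helper functions and the stylesheet file name in the usage comment are hypothetical.

```javascript
// Hypothetical helper: options for SaxonJS.getResource, parsing XML from
// text while giving it an explicit base URI.
function getResourceOptions(text, baseURI) {
  return { text, type: "xml", baseURI };
}

// Hypothetical helper: source options for SaxonJS.transform. sourceLocation
// is deliberately not set, since the source options are meant to be
// mutually exclusive.
function transformSourceOptions(sourceText, sourceBaseURI) {
  return { sourceText, sourceBaseURI };
}

// Usage (inside an async function, with SaxonJS loaded):
//   const doc = await SaxonJS.getResource(getResourceOptions(editorText, docUri));
//   await SaxonJS.transform({
//     stylesheetFileName: "MasterPage.sef.json", // hypothetical file name
//     ...transformSourceOptions(editorText, docUri)
//   }, "async");
```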

#12

Updated by Debbie Lockett over 3 years ago

  • % Done changed from 0 to 100
  • Fixed in JS Release set to Saxon-JS 2.3

Bug fix applied in the Saxon-JS 2.3 maintenance release.

#13

Updated by Debbie Lockett over 3 years ago

  • Status changed from Resolved to Closed
