Support #3230

closed

Blessed way to disable json escaping?

Added by Nick Nunes almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Low
Assignee:
Category: -
Sprint/Milestone: -
Start date: 2017-05-19
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Is there any way to disable JSON escaping when using xml-to-json() or the json output method, similar to what the serialize-json() function offers with 'escape':true()? From what I can tell, the spec does not offer this option. Although I could use serialize-json(), this line from the Saxon documentation gives me pause:

"This function remains available in the Saxon implementation for the time being."

I've attached two stylesheets demonstrating the difference.


Files

json-unescaped.xsl (466 Bytes), Nick Nunes, 2017-05-19 23:33
json-escaped.xsl (428 Bytes), Nick Nunes, 2017-05-19 23:33
serialize-json.serialize.xsl (982 Bytes), Nick Nunes, 2017-05-23 21:32
Actions #1

Updated by Michael Kay almost 7 years ago

What are you actually trying to achieve?

You're right that the JSON output method doesn't give you any control over which characters get escaped. But why do you want such control?

Actions #2

Updated by Nick Nunes almost 7 years ago

We have implemented a service which, given a simple xml configuration containing some XPaths, can extract data from an XML file and return the data as JSON. This service is currently implemented using the Saxon 9.5 API directly. For a variety of reasons we're reimplementing this service using a meta-XSL approach. Rather than use the Saxon API to do the extraction, we'll "transpile" the configurations to stylesheets and run those (our intent was with Saxon 9.7).

The current output of this service is unescaped. To match that, we'd like control over the escaping. Our mechanical consumers of the service could cope with the additional escaping, but a number of developers inspect the JSON output and use it for debugging purposes. URLs are among the most common things we extract, and the escaping breaks copy/paste, which is a quality-of-life downgrade for those developers.

Actions #3

Updated by Michael Kay almost 7 years ago

I guess there are two scenarios one might consider:

(a) avoid escaping in cases where the JSON would be perfectly valid without escaping. The spec says:

JSON escaping replaces the characters quotation mark, backspace, form-feed, newline, carriage return, tab, reverse solidus, or solidus by the corresponding JSON escape sequences \", \b, \f, \n, \r, \t, \\, or \/ respectively, and any other codepoint in the range 1-31 or 127-159 by an escape in the form \uHHHH where HHHH is the hexadecimal representation of the codepoint value.

In all these cases except "/", the JSON would be invalid if we didn't escape the special character. So scenario (a) would be to avoid escaping except where it is necessary to produce valid JSON, which effectively means not escaping "/". For URIs, I would think that "/" is the main cause of problems.

(b) rely on the application to perform all necessary escaping, so if the input contains "\" then we output "\" rather than "\\". Obviously this then raises questions about what happens if the application fails to perform all necessary escaping.

I don't feel I understand your use case well enough to understand whether either of these options would fit the bill.

The other question is whether the requirements can be met using character maps. For example, if there is a character map that maps "/" to "/", this will effectively prevent "/" being escaped as "\/". As far as I can see, this could be used to prevent all escaping (including cases like the quotation mark, where the resulting output would be invalid).

Actions #4

Updated by Michael Kay almost 7 years ago

With the fixes to bugs 3229 and 3223, you will be able to disable escaping of the "/" character in JSON serialization by defining a character map with

   <xsl:output method="json" use-character-maps="no-escape-slash" build-tree="no"/>
   
   <xsl:character-map name="no-escape-slash">
      <xsl:output-character character="/" string="/"/>
   </xsl:character-map>

Actions #5

Updated by Nick Nunes almost 7 years ago

There's probably more going on in my case than I'm aware of. I don't know the exact techniques that are being employed to serialize the JSON in the current implementation of our service. There is the possibility that it is producing invalid JSON in some cases which we would not want to replicate. This technique will probably cover my needs. When might a point release with these fixes be available?

Also, although I did not expect it to work, I tested the above output serialization parameters with serialize(). It did not raise an error, but the character map did not unescape the slashes. I've attached an example. Will the above fixes apply to serialize() as well?

As an aside, I'm currently using the XML representation of JSON to construct the JSON payload before serialization. In order to utilize this technique, before output I will need to do parse-json(xml-to-json($payload)) correct? Is there a way to serialize to JSON directly from the xml representation that would respect the character map?

Actions #6

Updated by Michael Kay almost 7 years ago

Sorry to drop this.

Yes, character maps with JSON aren't working properly at the moment: I've cited the bug fixes that make them work once we produce the next maintenance release. I haven't specifically tested the changes with fn:serialize, but it's using the same code underneath so it should be OK.
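
For reference, a minimal sketch of the fn:serialize case, using the map form of the serialization parameters defined in XPath 3.1. It assumes a release containing the fixes above and a processor that accepts `use-character-maps` in the options map; the `$data` variable and the example URL are hypothetical, and this has not been tested against any particular Saxon version.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The JSON text is produced by serialize(), so plain text output is enough -->
  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- Hypothetical data: a map holding one URL -->
    <xsl:variable name="data" select="map { 'url': 'http://example.com/a/b' }"/>
    <!-- 'use-character-maps' maps "/" to itself, which is intended to suppress the "\/" escape -->
    <xsl:value-of select="serialize($data, map {
        'method': 'json',
        'use-character-maps': map { '/': '/' }
      })"/>
  </xsl:template>
</xsl:stylesheet>
```

Whether the slashes come out unescaped depends on the character map being honoured by the JSON serializer, which is exactly what the cited bug fixes address.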

Your supplementary question:

_As an aside, I'm currently using the XML representation of JSON to construct the JSON payload before serialization. In order to utilize this technique, before output I will need to do parse-json(xml-to-json($payload)) correct? Is there a way to serialize to JSON directly from the xml representation that would respect the character map?_

Yes. The xml-to-json() function produces a string in JSON format with no control over the way the JSON is formatted other than the indent option. If you want to control the formatting of the JSON you'll have to convert it to the maps-and-arrays data structure using parse-json(), and then serialize the result.
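
A minimal sketch of that round trip, assuming a release that includes the fixes for bugs 3229 and 3223. The output declaration and character map are the ones from note #4; the payload contents and the example URL are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">

  <!-- Same output declaration and character map as in note #4 -->
  <xsl:output method="json" use-character-maps="no-escape-slash" build-tree="no"/>

  <xsl:character-map name="no-escape-slash">
    <xsl:output-character character="/" string="/"/>
  </xsl:character-map>

  <xsl:template name="xsl:initial-template">
    <!-- Hypothetical payload built in the XML representation of JSON -->
    <xsl:variable name="payload">
      <fn:map>
        <fn:string key="url">http://example.com/a/b</fn:string>
      </fn:map>
    </xsl:variable>
    <!-- xml-to-json() returns a JSON string; parse-json() converts it to maps and arrays,
         which the json output method then serializes, applying the character map -->
    <xsl:sequence select="parse-json(xml-to-json($payload))"/>
  </xsl:template>
</xsl:stylesheet>
```

The extra parse-json() step is what hands the serializer a map rather than a ready-made string, so the output method's own escaping rules (and the character map) are the ones that apply.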

Actions #7

Updated by Michael Kay almost 7 years ago

  • Status changed from New to Closed
  • Assignee set to Michael Kay

I think this support request can now be closed.
