Support #3065
closed
Character escaping with serialize()
Description
For a project that I'm working on I'm using serialize to render some html as a string for inclusion in the meta/@content attribute value of an html document. A singularly large search company has imposed a very specific character escaping requirement for single quotes (as &#39;) and double quotes (as &quot;). I've tried playing around with various values of output:use-character-maps and saxon:character-representation but none have resulted in the dictated output. I always end up with unescaped single quotes and &#34; for double quotes. I've attached a toy stylesheet to demonstrate what I'm attempting.
Is this at all possible with the capabilities available in Saxon?
Updated by Nick Nunes almost 8 years ago
For a project that I'm working on I'm using serialize to render some html as a string for inclusion in the meta/@content value of an html document. A singularly large search company has imposed a very specific character escaping requirement for single quotes (as &#39;) and double quotes (as &quot;). I've tried playing around with various values of output:use-character-maps and saxon:character-representation but none have resulted in the dictated output. I always end up with unescaped single quotes and &#34; for double quotes. I've attached a toy stylesheet to demonstrate what I'm attempting.
Is this at all possible with the capabilities available in Saxon?
Updated by Nick Nunes almost 8 years ago
Sigh, apologies for the duplication. The "Edit" button is apparently "Add Comment". In the original description the escaping was unescaped. The comment reflects the escaping as required.
Updated by Michael Kay almost 8 years ago
- Status changed from New to In Progress
Let's see if I can get the escaping right in this response.
You have written map-string="&quot;"
I think you need to write map-string="&amp;quot;"
That's because you want the string used to represent a quotation mark to be the six-character string "&quot;", which is represented in an XSLT stylesheet using the 10-character string "&amp;quot;". The way you have written it, the replacement string is a single quotation-mark character.
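To make the distinction concrete, here is a small fragment of the kind of parameter document passed to serialize(). It is only a sketch: the surrounding stylesheet is omitted, and the apostrophe entry is an assumption added to match the requirement in the description.

```xml
<output:serialization-parameters
    xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
  <output:use-character-maps>
    <!-- Not what is wanted: map-string="&quot;" is read by the XML parser
         as one quotation-mark character, so the quote maps to itself. -->
    <!-- Wanted: the parser reads &amp;quot; as the six characters
         & q u o t ; and the serializer writes them without escaping. -->
    <output:character-map character="&quot;" map-string="&amp;quot;"/>
    <!-- Assumed companion entry for the apostrophe requirement. -->
    <output:character-map character="'" map-string="&amp;#39;"/>
  </output:use-character-maps>
</output:serialization-parameters>
```

This covers a single serialization only. If the resulting string is serialized again without a character map, the ampersand it contains will itself be escaped.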
Updated by Nick Nunes almost 8 years ago
I tried "&amp;quot;" but that just ended up as "&amp;quot;" in the output.
Updated by Michael Kay almost 8 years ago
Oh right, I failed to spot that you were actually serializing this twice, first with character maps and then without. If the first serialization phase puts an "&" into the output, then the second serialization phase will output this as "&amp;".
I don't think there is anything you can input to the second serialization phase that will force it to produce "&quot;" in the output, without further use of character maps. Presumably you are doing it this way, rather than using xsl:character-map, because you don't want the character map to apply globally. But I think you will need a character map for the final serialization, whether or not you use one initially.
Perhaps your output:serialization-parameters should map the quotation mark (x22, decimal 34) to something like §§§§ that is not going to appear anywhere else (you can use Private Use characters if you want), and then the final xsl:output should use a character map that maps §§§§ to "&amp;quot;".
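A minimal, untested sketch of this two-stage approach follows. It assumes XSLT 3.0, uses the Private Use characters U+E000 and U+E001 as placeholders (an arbitrary choice), and invents the names restore-quotes, inner-params and fragment, plus the sample content, purely for illustration; none of this comes from the attached stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
    exclude-result-prefixes="output">

  <!-- Final serialization: replace the placeholders with the required
       entity text. Character-map output is written without escaping. -->
  <xsl:output method="html" use-character-maps="restore-quotes"/>

  <xsl:character-map name="restore-quotes">
    <xsl:output-character character="&#xE000;" string="&amp;quot;"/>
    <xsl:output-character character="&#xE001;" string="&amp;#39;"/>
  </xsl:character-map>

  <!-- First serialization, via serialize(): swap quotes for placeholders
       that are assumed never to occur in the real content. -->
  <xsl:variable name="inner-params">
    <output:serialization-parameters>
      <output:method value="html"/>
      <output:use-character-maps>
        <output:character-map character="&quot;" map-string="&#xE000;"/>
        <output:character-map character="'" map-string="&#xE001;"/>
      </output:use-character-maps>
    </output:serialization-parameters>
  </xsl:variable>

  <xsl:template name="xsl:initial-template">
    <!-- Hypothetical content to embed in the attribute. -->
    <xsl:variable name="fragment">
      <p>She said "it's fine"</p>
    </xsl:variable>
    <html>
      <head>
        <meta name="description"
              content="{serialize($fragment, $inner-params/*)}"/>
      </head>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Character maps apply only to characters in text nodes and attribute values, so quotation marks that the first serialization emits as markup (for example attribute delimiters inside the fragment) are not replaced by the placeholders and would need separate handling.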
Updated by Nick Nunes almost 8 years ago
Thanks, I failed to realize I was serializing this twice myself. This solution worked. Thank you for your help.
Updated by Michael Kay almost 8 years ago
- Status changed from In Progress to Resolved