Bug #4852
closed
SaxonJS incorrectly URI encodes 'value' attributes on 'input' elements
100%
Description
I have a SaxonJS stylesheet running in Node that includes the following HTML output:
<input placeholder="search" name="q" value=""
size="40"/> <input value="🔍" type="submit"/></p>
That appears in the principalResult output as:
<input placeholder="search" name="q" value="" size="40"> <input value="%F0%9F%94%8D" type="submit"></p>
Updated by Michael Kay almost 4 years ago
input/@value is a URI attribute according to https://www.w3.org/TR/xslt-xquery-serialization-31/#list-of-uri-attributes, and this appears to be the correct %HH encoding of this character.
Saxon/J doesn't include input/@value in the list of URI attributes and doesn't %HH-encode it, but if you put <input src="🔍"/> through the HTML serializer then it comes out as <input src="%F0%9F%94%8D"/>
So I guess the question is, why do Saxon/J and Saxon-JS differ in the way that the HTML serializer handles URI-escaping of this attribute? Given the decision to do URI escaping, it seems to be doing it correctly as far as I can see.
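As a quick check of the %HH encoding discussed above, the following Node sketch percent-encodes the UTF-8 bytes of the character from the report. The helper name `percentEncodeUtf8` is made up for illustration and is not part of SaxonJS; it only uses Node's built-in `Buffer` and the standard `encodeURIComponent`.

```javascript
// Percent-encode every UTF-8 byte of a string.
// Illustrative helper only; this is not a SaxonJS API.
function percentEncodeUtf8(text) {
  return [...Buffer.from(text, "utf8")]
    .map((byte) => "%" + byte.toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

// U+1F50D, the magnifying-glass character used in the report.
const glass = "\u{1F50D}";

// The character lies above the BMP, so JavaScript stores it as a surrogate pair.
console.log(glass.length); // 2

// Its four UTF-8 bytes give the string seen in the serialized output.
console.log(percentEncodeUtf8(glass)); // %F0%9F%94%8D

// The built-in encoder agrees for this character.
console.log(encodeURIComponent(glass) === percentEncodeUtf8(glass)); // true
```

This supports the observation that the escaping itself is correct; the open question is only whether the attribute should be escaped at all.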
Updated by Michael Kay almost 4 years ago
According to the DTD at https://www.w3.org/TR/html401/sgml/dtd.html input/@value is a CDATA attribute. So perhaps its inclusion in the list at https://www.w3.org/TR/xslt-xquery-serialization-31/#list-of-uri-attributes is an error that we quietly fixed in the Java product?
Updated by Michael Kay almost 4 years ago
input/@value is not included in the list of URI attributes in the 1.0/2.0 Serialization spec at https://www.w3.org/TR/2007/REC-xslt-xquery-serialization-20070123/#list-of-uri-attributes
Very strange. Henry wouldn't have added it to the list without a good reason.
The only other differences between the two versions of the spec are that 3.0 has added input/@formaction, button/@formaction, and video/@poster. These aren't present in the list used by Saxon/J (see HTMLURIEscaper.java), so it looks as if Saxon/J never implemented any of these changes in the 3.0 spec.
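The difference in behaviour described in this thread comes down to which element/attribute pairs appear in the serializer's lookup table. The sketch below is a hypothetical illustration, not the code of Saxon/J or SaxonJS: the table names, the function names, and the `useSpec30List` flag are invented, and the tables contain only the attributes named in this thread rather than the full lists. The escaping rule shown (percent-encode everything outside printable ASCII) is an assumption modelled on fn:escape-html-uri, and other attribute escaping such as `&` and `"` is omitted.

```javascript
// Partial, hypothetical tables: only attributes mentioned in this thread.
const BASE_URI_ATTRIBUTES = new Set(["input/@src"]);
const SPEC_30_ADDITIONS = new Set([
  "input/@value",
  "input/@formaction",
  "button/@formaction",
  "video/@poster",
]);

// Decide whether an attribute is treated as URI-valued.
function isUriAttribute(element, attribute, useSpec30List) {
  const key = `${element}/@${attribute}`;
  return BASE_URI_ATTRIBUTES.has(key) ||
    (useSpec30List && SPEC_30_ADDITIONS.has(key));
}

// Percent-encode characters outside printable ASCII (U+0020 to U+007E).
// The "u" flag makes the regex match whole code points, so a surrogate
// pair is passed to encodeURIComponent as one character.
function escapeNonAscii(value) {
  return value.replace(/[^\x20-\x7E]/gu, (ch) => encodeURIComponent(ch));
}

// Escape the value only when the attribute is in the active table.
function serializeAttributeValue(element, attribute, value, useSpec30List) {
  return isUriAttribute(element, attribute, useSpec30List)
    ? escapeNonAscii(value)
    : value;
}

// With the 3.0 additions, input/@value is escaped (the behaviour reported here).
console.log(serializeAttributeValue("input", "value", "\u{1F50D}", true));
// Without them, the value passes through unchanged.
console.log(serializeAttributeValue("input", "value", "\u{1F50D}", false));
// input/@src is escaped either way.
console.log(serializeAttributeValue("input", "src", "\u{1F50D}", false));
```

Under these assumptions, removing a single entry from the table is enough to change whether `value="🔍"` survives serialization intact.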
Updated by Michael Kay almost 4 years ago
The change to the list of URI-valued attributes first appeared in the draft of 2013-01-08 (https://www.w3.org/TR/2013/WD-xslt-xquery-serialization-30-20130108/) and the change log attributes it to bug 6129, but this bug was a catch-all to extend the spec to enable HTML5 support, and there is nothing specific in the bug about changing the list of URI attributes. Perhaps (pure conjecture here) Henry was looking at a draft HTML5 spec that subsequently changed?
Updated by Norm Tovey-Walsh almost 4 years ago
- Subject changed from Unicode characters above the BMP don't pass through SaxonJS/Node correctly? to SaxonJS incorrectly URI encodes 'value' attributes on 'input' elements
Updated by Community Admin almost 4 years ago
- Applies to JS Branch 2 added
- Applies to JS Branch deleted (2.0)
Updated by Debbie Lockett over 3 years ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in JS Release set to Saxon-JS 2.1
Bug fix applied in the Saxon-JS 2.1 maintenance release.