Note that the error message (Character decimal 13 is not available in the chosen encoding) is produced by the Serialize.js encode() method when charRefsAllowed=false. This happens when outputting names, comments, CDATA sections, etc., and when disable-output-escaping is set; that is, when converting CR to "&#xD;" is not an option. I don't know why this is happening, but the error message appears wrong: in such situations codepoint 13 should probably be output as itself (the statement that the character is not available in the chosen encoding is factually incorrect).
Looking at the books.xsl stylesheet, it does
<xsl:comment><xsl:copy-of select="unparsed-text('books.txt')"/></xsl:comment>
and I suspect this is where the error is coming from. If GitHub has changed books.txt to contain CRLF line endings, this will not be subject to line ending normalization because there is no XML parsing, so we're outputting a comment containing a CR character, and I think this should be output as an unescaped CR.
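To illustrate why the CR survives here, this is a minimal sketch in JavaScript. The helper name and the sample string are made up for illustration; it only mimics the end-of-line handling that XML 1.0 (section 2.11) requires of a parser, and assumes, as described above, that text read via unparsed-text() bypasses that step.

```javascript
// Hypothetical helper: XML 1.0 end-of-line handling (section 2.11),
// which an XML parser applies to its input before anything else.
function normalizeXmlLineEndings(text) {
  // CRLF pairs and lone CRs both become a single LF.
  return text.replace(/\r\n?/g, "\n");
}

// Assumed sample data standing in for books.txt after a CRLF checkout.
const rawText = "Title A\r\nTitle B\r\n";

// No XML parsing is involved with unparsed-text(), so the CRs are kept.
const viaUnparsedText = rawText;
// The same content read through an XML parser would lose them.
const viaXmlParser = normalizeXmlLineEndings(rawText);

console.log(viaUnparsedText.includes("\r")); // true
console.log(viaXmlParser.includes("\r"));    // false
```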
Looking at the W3C spec (Serialization 3.1) the relevant rule in the lede of §5 is
A consequence of this rule[§] is that certain characters MUST be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. In addition, the non-whitespace control characters #x1 through #x1F and #x7F through #x9F in text nodes and attribute nodes MUST be output as character references.
§ "This rule" means the round-tripping rule, i.e. the rule that serialization followed by parsing must leave the document unchanged.
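As a rough illustration of the quoted paragraph, the rule for text and attribute nodes could be sketched as follows. This is JavaScript with hypothetical names, not the Serialize.js code, and it deliberately ignores the usual escaping of &, < and quotes; it covers only the characters the paragraph lists.

```javascript
// Characters that must become character references in text nodes.
const TEXT_NODE_REFS = new Map([
  ["\r", "&#xD;"],         // CR
  ["\u0085", "&#x85;"],    // NEL
  ["\u2028", "&#x2028;"],  // LINE SEPARATOR
]);

// Attribute nodes additionally escape NL and TAB.
const ATTRIBUTE_NODE_REFS = new Map([
  ...TEXT_NODE_REFS,
  ["\n", "&#xA;"],
  ["\t", "&#x9;"],
]);

// Control characters #x1-#x1F and #x7F-#x9F, excluding TAB, NL and CR.
function isNonWhitespaceControl(cp) {
  const isWhitespace = cp === 0x9 || cp === 0xA || cp === 0xD;
  return !isWhitespace && ((cp >= 0x1 && cp <= 0x1F) || (cp >= 0x7F && cp <= 0x9F));
}

// Hypothetical helper: apply the round-tripping escapes to one string value.
function escapeForRoundTrip(value, isAttribute) {
  const refs = isAttribute ? ATTRIBUTE_NODE_REFS : TEXT_NODE_REFS;
  let out = "";
  for (const ch of value) {
    const cp = ch.codePointAt(0);
    if (refs.has(ch)) {
      out += refs.get(ch);
    } else if (isNonWhitespaceControl(cp)) {
      out += `&#x${cp.toString(16).toUpperCase()};`;
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(escapeForRoundTrip("a\r\nb", false) === "a&#xD;\nb");              // true
console.log(escapeForRoundTrip("a\r\nb\tc", true) === "a&#xD;&#xA;b&#x9;c");  // true
```

Nothing in this table applies to comments, names or CDATA sections, where a character reference is not available as an escape.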
So it doesn't seem to say explicitly how a CR character in a comment should be handled. Arguably §5.1.3 applies:
When outputting any other character that is defined in the selected encoding, the character MUST be output using the correct representation of that character in the selected encoding.
But this conflicts with the round-tripping rule.
So I think the spec leaves us a choice between two actions, both of which violate the spec: either output the CR "as is", or raise an error. But the current error message is misleading.
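The two options could be sketched like this. It is JavaScript with a hypothetical function name and a suggested message wording of my own; it is not how Serialize.js is actually structured.

```javascript
// Hypothetical sketch: handle a CR in a context where character references
// cannot be used (comment, name, CDATA section, disable-output-escaping).
function serializeUnescapedText(text, context, onCarriageReturn) {
  if (!text.includes("\r")) {
    return text;
  }
  if (onCarriageReturn === "emit") {
    // Option 1: write the CR as itself. A parser reading the result will
    // normalize it to LF, so the round trip is not exact.
    return text;
  }
  // Option 2: raise an error, but one that states the real reason
  // instead of blaming the encoding.
  throw new Error(
    `Cannot serialize ${context}: it contains a carriage return (codepoint 13), ` +
    "which would not survive re-parsing and cannot be written as a character reference here"
  );
}

console.log(JSON.stringify(serializeUnescapedText("a\r\nb", "comment", "emit"))); // "a\r\nb"

try {
  serializeUnescapedText("a\r\nb", "comment", "error");
} catch (e) {
  console.log(e.message);
}
```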