generate-id() on attribute and namespace nodes may produce a non-ASCII string
The rules for generate-id() require the returned ID to consist entirely of ASCII alphanumerics. Saxon-JS does not conform to this when the node in question is an attribute or namespace node, because it copies the (local) name of the node into the generated ID.
The spec states:
The returned identifier must consist of ASCII alphanumeric characters and must start with an alphabetic character. Thus, the string is syntactically an XML name.
But of course, not every valid XML name consists entirely of ASCII alphanumeric characters.
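To make the gap concrete, here is a minimal sketch of a check against the rule quoted above. The helper name isConformantId and the sample IDs are hypothetical; they are not the format Saxon-JS actually generates.

```typescript
// Hypothetical helper: true if the string satisfies the generate-id() rule
// quoted above (ASCII alphanumerics only, starting with a letter).
function isConformantId(id: string): boolean {
    return /^[A-Za-z][A-Za-z0-9]*$/.test(id);
}

// Made-up IDs for illustration; they are not real Saxon-JS output.
console.log(isConformantId("d1e5"));                // true
// "größe" is a valid XML name, but "ö" and "ß" are not ASCII.
console.log(isConformantId("d1e5gr\u00f6\u00dfe")); // false
console.log(isConformantId("1abc"));                // false: starts with a digit
```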
Updated by Michael Kay almost 6 years ago
Note also that the form used for attributes doesn't necessarily generate a unique ID, because it only uses the local name of the attribute and not the namespace URI.
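The collision can be sketched as follows. This is an illustration only: AttrSketch and idFromLocalName are hypothetical names, and the ID layout is invented rather than taken from the Saxon-JS source. The only property it shares with the behaviour described above is that the namespace URI is ignored.

```typescript
// Minimal stand-in for an attribute node; not the Saxon-JS node model.
interface AttrSketch {
    namespaceUri: string;
    localName: string;
}

// Hypothetical scheme that builds the ID from the owner element's ID
// and the attribute's local name only.
function idFromLocalName(ownerId: string, attr: AttrSketch): string {
    return ownerId + "a" + attr.localName;
}

// Two distinct attributes on the same element, e.g. a:ref and b:ref.
const firstAttr: AttrSketch = { namespaceUri: "http://example.com/a", localName: "ref" };
const secondAttr: AttrSketch = { namespaceUri: "http://example.com/b", localName: "ref" };

// Both attributes receive the same ID, so the ID is not unique.
console.log(idFromLocalName("d1e5", firstAttr) === idFromLocalName("d1e5", secondAttr)); // true
```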
In addition, I think the algorithm for generate-id() fails for nodes that are not part of a tree rooted at a document node.
I would suggest: for documents, elements, comments, PIs, and text nodes that have a document node as an ancestor, use the current algorithm.
For attributes, namespaces, and "non-document" nodes of other kinds, allocate a key by incrementing some global sequence number (held perhaps in the context), and store this key as a property (_saxon_generated_id) of the node. Note that when a node is copied, this property should be dropped.
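A rough sketch of that suggestion, under stated assumptions: NodeSketch, ContextSketch, generatedKey and copyNode are hypothetical names, and only the property name _saxon_generated_id comes from the proposal itself. It shows the key being allocated once per node and not surviving a copy.

```typescript
// Minimal stand-in for a node; not the Saxon-JS node model.
interface NodeSketch {
    name: string;
    _saxon_generated_id?: number;
}

// Stand-in for the context that holds the global sequence number.
interface ContextSketch {
    nextId: number;
}

// Allocate a key on first request and cache it on the node.
function generatedKey(node: NodeSketch, ctx: ContextSketch): number {
    let key = node._saxon_generated_id;
    if (key === undefined) {
        key = ctx.nextId++;
        node._saxon_generated_id = key;
    }
    return key;
}

// Copy a node without carrying the cached key over to the copy.
function copyNode(node: NodeSketch): NodeSketch {
    return { name: node.name };
}

const idContext: ContextSketch = { nextId: 1 };
const refAttr: NodeSketch = { name: "ref" };

console.log(generatedKey(refAttr, idContext)); // 1
console.log(generatedKey(refAttr, idContext)); // 1 (same node, same key)

const refCopy = copyNode(refAttr);
console.log(generatedKey(refCopy, idContext)); // 2 (the copy gets its own key)
```

A real ID would still need an alphabetic prefix in front of the number to satisfy the rule that the identifier starts with a letter.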
Updated by Michael Kay about 4 years ago
- Description updated (diff)
- Status changed from New to Resolved
- Applies to JS Branch 0.9, 1.0, Trunk added
- Fix Committed on JS Branch Trunk added
Non-ASCII characters in IDs for attributes and namespaces: I have fixed this by "asciifying" the node names (the EQName in the case of attributes). Every character not in [A-Za-z] is replaced with its numeric character code, preceded by a leading zero. New test case expression-2102 demonstrates the bug and tests the fix. The fix has been applied to 2.0 only, but it could be retrofitted.
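For readers who want a feel for the transformation, here is a minimal sketch. It is not the committed Saxon-JS code: the function name asciify is hypothetical, and it assumes the character code is written in decimal and taken per UTF-16 code unit, neither of which the note above specifies.

```typescript
// Sketch of the "asciifying" step: every character outside [A-Za-z] is
// replaced by "0" followed by its character code.
// Assumptions: decimal notation, one code per UTF-16 code unit.
function asciify(name: string): string {
    let result = "";
    for (let i = 0; i < name.length; i++) {
        const ch = name.charAt(i);
        result += /[A-Za-z]/.test(ch) ? ch : "0" + name.charCodeAt(i);
    }
    return result;
}

console.log(asciify("ref"));                // "ref"
console.log(asciify("my-attr"));            // "my045attr"
console.log(asciify("a1"));                 // "a049" (digits are replaced too)
console.log(asciify("gr\u00f6\u00dfe"));    // "gr02460223e"
```

Under these assumptions the output contains only ASCII letters and digits, which is what the generate-id() rule requires of the characters in the ID.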