Bug #3561

generate-id() on attribute and namespace nodes may produce a non-ASCII string

Added by Michael Kay about 7 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: XPath Conformance
Sprint/Milestone: -
Start date: 2017-12-07
Due date:
% Done: 100%
Estimated time:
Applies to JS Branch: 0.9, 1.0, Trunk
Fix Committed on JS Branch: Trunk
Fixed in JS Release:
SEF Generated with:
Platforms:
Company: -
Contact person: -
Additional contact persons: -

Description

The rules for generate-id() require that the returned ID consist entirely of ASCII alphanumerics. Saxon-JS does not conform to this when the node in question is an attribute or namespace node: it copies the (local) name of the node into the generated ID.

The spec states:

The returned identifier must consist of ASCII alphanumeric characters and must start with an alphabetic character. Thus, the string is syntactically an XML name.

But of course, not every valid XML name consists entirely of ASCII alphanumeric characters.
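To make the constraint concrete, here is a minimal sketch of a check for the quoted rule. The helper name isConformantId and the sample strings are made up for illustration; they are not Saxon-JS APIs or real Saxon-JS output.

```javascript
// The quoted rule: an ASCII letter followed by ASCII letters or digits.
const ASCII_ID = /^[A-Za-z][A-Za-z0-9]*$/;

// Hypothetical helper, not part of Saxon-JS.
function isConformantId(id) {
  return ASCII_ID.test(id);
}

console.log(isConformantId("d1e5"));          // true
console.log(isConformantId("d1unit\u00e9"));  // false: "é" is a valid XML name character but not ASCII
console.log(isConformantId("1abc"));          // false: does not start with a letter
```

Any ID that embeds a node name verbatim fails this check as soon as the name contains a character outside the ASCII alphanumerics.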

Actions #1

Updated by Michael Kay about 7 years ago

  • Description updated (diff)

Actions #2

Updated by Michael Kay about 7 years ago

Note also that the form used for attributes doesn't necessarily generate a unique ID, because it only uses the local name of the attribute and not the namespace URI.

In addition, I think the algorithm for generate-id() fails for nodes that are not part of a tree rooted at a document node.

I would suggest: for documents, and for elements, comments, PIs, and text nodes that have a document node as an ancestor, use the current algorithm.

For attributes, namespaces, and "non-document" nodes of other kinds, allocate a key by incrementing some global sequence number (held perhaps in the context), and store this key as a property (_saxon_generated_id) of the node. Note that when a node is copied, this property should be dropped.
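A rough sketch of that suggestion, under stated assumptions: nodes are plain objects, the sequence number lives on a hypothetical context object, and the "g" prefix is an arbitrary choice to make the ID start with a letter. Only the property name _saxon_generated_id comes from the comment above; everything else is illustrative, and a real implementation would need a prefix that cannot collide with IDs produced by the existing algorithm.

```javascript
// Hypothetical context holding the global sequence number.
const context = { nextGeneratedId: 0 };

// Allocate a key on first use and cache it on the node.
function allocatedId(node, ctx) {
  if (node._saxon_generated_id === undefined) {
    ctx.nextGeneratedId += 1;
    node._saxon_generated_id = ctx.nextGeneratedId;
  }
  // Assumed prefix so the result starts with an alphabetic character.
  return "g" + node._saxon_generated_id;
}

// Hypothetical shallow copy that drops the cached key,
// so the copy is given its own ID when one is requested.
function copyNode(node) {
  const copy = { ...node };
  delete copy._saxon_generated_id;
  return copy;
}

const attr = { kind: "attribute", localName: "unit\u00e9", namespaceURI: "" };
const first = allocatedId(attr, context);
const again = allocatedId(attr, context);
const ofCopy = allocatedId(copyNode(attr), context);

console.log(first === again);  // true: same node, same ID
console.log(first !== ofCopy); // true: the copy gets a fresh ID
```

Because the ID no longer depends on the node name, both the non-ASCII problem and the local-name uniqueness problem go away for these node kinds.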

Actions #3

Updated by Michael Kay about 5 years ago

  • Description updated (diff)
  • Status changed from New to Resolved
  • Applies to JS Branch 0.9, 1.0, Trunk added
  • Fix Committed on JS Branch Trunk added

Non-ASCII characters in IDs for attributes and namespaces: I have fixed this by "asciifying" the node names (the EQName in the case of attributes). Every character not in [A-Za-z] is replaced with its numeric character code, preceded by a leading zero. New test case expression-2102 demonstrates the bug and tests the fix. The fix has been applied to 2.0 only, but it could be retrofitted.
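For illustration only, one possible reading of the "asciifying" step is sketched below. This is not the Saxon-JS source: the function name asciify is made up, and the choice of a decimal code point with a single "0" prefix is an assumption, since the comment does not say whether the code is decimal or hexadecimal.

```javascript
// Hypothetical sketch: keep ASCII letters, replace every other
// character with "0" followed by its code point (assumed decimal).
function asciify(name) {
  let out = "";
  for (const ch of name) {            // for...of iterates by code point
    if (/^[A-Za-z]$/.test(ch)) {
      out += ch;
    } else {
      out += "0" + ch.codePointAt(0);
    }
  }
  return out;
}

console.log(asciify("unit\u00e9"));                 // "unit0233" (U+00E9 is 233)
console.log(asciify("Q{http://example.com/ns}id")); // ASCII letters and digits only
```

The result contains only ASCII letters and digits, but it can begin with a digit when the name does not start with an ASCII letter, so it would still need an alphabetic prefix to form a complete ID. Under this decimal reading, distinct names could in principle map to the same string, so the sketch illustrates the character set only, not uniqueness.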

Actions #4

Updated by Michael Kay about 5 years ago

Nodes in trees with no document root turn out not to be a problem: test case expression-2103 added to demonstrate this.

Actions #5

Updated by Debbie Lockett over 4 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in JS Release set to Saxon-JS 2.0

Actions #6

Updated by Debbie Lockett over 4 years ago

  • Category set to XPath Conformance
