Bug #4498

closed

Attribute order in Saxon 10.0

Added by Johan Gheys about 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Internals
Sprint/Milestone:
Start date: 2020-03-24
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10
Fix Committed on Branch: 10
Fixed in Maintenance Release:
Platforms:

Description

Is it deliberate that the order of attributes in version 10.0 is completely different from what it has been until now? What is the easiest way to keep the original order when applying a simple transformation such as filtering old records from a large file?

#1

Updated by Michael Kay about 4 years ago

Order of attributes has always been undefined / implementation-defined, though many XML parsers retain the original attribute order and many operations in Saxon do so. In 10.0 we made changes to make finding an attribute in a large collection of attributes more efficient; this uses a HashMap, which does not retain order.
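The HashMap point can be seen with plain JDK collections. The sketch below uses java.util.HashMap and java.util.LinkedHashMap, not Saxon's own attribute classes, and the attribute names are made up; it only shows that a hash-based map gives no guarantee about iteration order, while a linked variant iterates in insertion order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttributeOrderDemo {
    // Hypothetical attribute names, listed in the order they appear in a source element.
    static final List<String> NAMES = List.of("id", "type", "date", "status", "owner", "version");

    // Puts every name into the given map, in source order, and returns the map.
    static Map<String, String> fill(Map<String, String> map) {
        for (String name : NAMES) {
            map.put(name, "value-of-" + name);
        }
        return map;
    }

    // HashMap iteration order depends on hash codes and table capacity, not on insertion order.
    static List<String> hashMapOrder() {
        return List.copyOf(fill(new HashMap<>()).keySet());
    }

    // LinkedHashMap iterates in insertion order.
    static List<String> linkedHashMapOrder() {
        return List.copyOf(fill(new LinkedHashMap<>()).keySet());
    }

    public static void main(String[] args) {
        System.out.println("HashMap:       " + hashMapOrder());
        System.out.println("LinkedHashMap: " + linkedHashMapOrder()); // [id, type, date, status, owner, version]
    }
}
```

Both maps hold the same keys; only the LinkedHashMap line is guaranteed to match the source order.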

There's no guaranteed way of keeping the original attribute order, but you can control the attribute order in the serialized result using the saxon:attribute-order output parameter. (This needs Saxon-PE or higher.)

You could try tweaking the static variable net.sf.saxon.om.SmallAttributeMap.LIMIT, currently set to 5, to a higher number; this is the threshold for using the large attribute set implementation. You would have to change the source code or use reflection because the variable isn't public.
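A minimal sketch of the reflection route, assuming the field is a non-public, non-final static int named LIMIT, as the reply suggests. To keep it self-contained it targets a hypothetical stand-in class rather than net.sf.saxon.om.SmallAttributeMap itself; with Saxon on the classpath you would pass Class.forName("net.sf.saxon.om.SmallAttributeMap") instead, and you would presumably need to do it before any trees are built.

```java
import java.lang.reflect.Field;

public class LimitTweakSketch {

    // Hypothetical stand-in for net.sf.saxon.om.SmallAttributeMap. The thread only says
    // LIMIT is a non-public static variable set to 5; that it is not final is an assumption.
    static class SmallAttributeMapStandIn {
        private static int LIMIT = 5;

        static int limit() {
            return LIMIT;
        }
    }

    // Overwrites a non-public static int field named LIMIT on the given class.
    static void setLimit(Class<?> cls, int newLimit) throws ReflectiveOperationException {
        Field field = cls.getDeclaredField("LIMIT");
        field.setAccessible(true);    // bypass the private modifier
        field.setInt(null, newLimit); // null target because the field is static
    }

    public static void main(String[] args) {
        System.out.println("before: " + SmallAttributeMapStandIn.limit()); // before: 5
        try {
            setLimit(SmallAttributeMapStandIn.class, 20);
        } catch (ReflectiveOperationException e) {
            // Raised, for example, if the field does not exist or is declared static final.
            throw new IllegalStateException("could not change LIMIT", e);
        }
        System.out.println("after:  " + SmallAttributeMapStandIn.limit()); // after:  20
    }
}
```

If the real field turns out to be declared static final, Field.setInt throws IllegalAccessException even after setAccessible(true), and changing the source code is the only option.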

#2

Updated by Johan Gheys about 4 years ago

Thank you for your answer. But somehow the order has to be present because @*/. gives the attributes in "document order", right?

#3

Updated by Michael Kay about 4 years ago

There's a stable order of attributes, but it's not predictable, and it's not necessarily related to the lexical order of attributes in the serialized XML.

#4

Updated by Johan Gheys about 4 years ago

It was a bit strange to see that the order of the attributes in the result tree no longer determines the order in the serialized XML, but perhaps that just takes a change of mindset. And of course we prefer better performance to a more human-readable result. We will probably use saxon:attribute-order="*" as a compromise from now on. Our customers will see a large number of differences once (based on a digest calculation), but after that everything will remain stable. Thank you for the explanation.
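The one-off wave of differences follows from the fact that a digest is computed over the serialized bytes. Two serializations of the same element that differ only in attribute order therefore produce different digests. A minimal illustration with java.security.MessageDigest; the record element and its attributes are hypothetical, and the customers' actual digest procedure is not described in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestDemo {
    // Hex-encoded SHA-256 digest of a string's UTF-8 bytes.
    static String sha256Hex(String text) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Same element, same attributes and values; only the lexical attribute order differs.
        String oldOutput = "<record id=\"42\" date=\"2019-06-01\" status=\"old\"/>";
        String newOutput = "<record date=\"2019-06-01\" id=\"42\" status=\"old\"/>";
        System.out.println(sha256Hex(oldOutput));
        System.out.println(sha256Hex(newOutput));
        System.out.println("same digest? " + sha256Hex(oldOutput).equals(sha256Hex(newOutput))); // same digest? false
    }
}
```

Once the serialized attribute order is fixed, repeated runs emit the same bytes and therefore the same digests, which is why the differences appear only once.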

#5

Updated by Michael Kay about 4 years ago

  • Tracker changed from Support to Bug
  • Category set to Internals
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Applies to branch 10 added

Although this isn't a bug, it's a change that does appear to have caused usability problems for a number of people, e.g. because diff checking of test results is affected. So I'm inclined to revert to a data structure for attribute sets that retains insertion order.

It only affects the LargeAttributeMap which is used when there are more than 5 attributes. This currently uses an ImmutableHashTrieMap internally, so that incremental addition of new attributes is efficient. (For a SmallAttributeMap, the entire structure is copied when a new entry is added.) I'll look into whether we can find another map implementation that retains insertion order.
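The copy-on-add approach described for the small case can be sketched in plain Java. The class below is a hypothetical illustration, not Saxon's SmallAttributeMap: the map is immutable, lookup is a linear scan, and every put copies the arrays. That is cheap for a handful of attributes but grows linearly with the attribute count, which is why a structure that supports cheaper incremental addition is used beyond the threshold.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical copy-on-write attribute map; illustrates the idea only, not Saxon's code. */
public final class CopyOnWriteAttributes {
    public static final CopyOnWriteAttributes EMPTY =
            new CopyOnWriteAttributes(new String[0], new String[0]);

    private final String[] names;   // attribute names, in insertion order
    private final String[] values;  // values, parallel to names

    private CopyOnWriteAttributes(String[] names, String[] values) {
        this.names = names;
        this.values = values;
    }

    public int size() {
        return names.length;
    }

    /** Linear scan, acceptable for a few attributes. Returns null if the name is absent. */
    public String get(String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(name)) {
                return values[i];
            }
        }
        return null;
    }

    /** Returns a new map; this one is never modified. Each call copies the arrays: O(n). */
    public CopyOnWriteAttributes put(String name, String value) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(name)) {
                String[] newValues = values.clone();
                newValues[i] = value;
                return new CopyOnWriteAttributes(names, newValues); // names array is shared, never mutated
            }
        }
        String[] newNames = Arrays.copyOf(names, names.length + 1);
        String[] newValues = Arrays.copyOf(values, values.length + 1);
        newNames[names.length] = name;
        newValues[values.length] = value;
        return new CopyOnWriteAttributes(newNames, newValues);
    }

    /** Immutable snapshot of the attribute names in order. */
    public List<String> names() {
        return List.copyOf(Arrays.asList(names));
    }

    public static void main(String[] args) {
        CopyOnWriteAttributes one = EMPTY.put("id", "42");
        CopyOnWriteAttributes two = one.put("date", "2019-06-01");
        System.out.println(one.names() + " " + two.names()); // [id] [id, date]
    }
}
```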

#6

Updated by Michael Kay about 4 years ago

  • Status changed from In Progress to Resolved
  • Priority changed from Low to Normal
  • Fix Committed on Branch 10 added

I have changed the LargeAttributeMap implementation so it now maintains the order of attributes. (Specifically, if an attribute is added, it goes at the end, unless it is replacing an existing attribute with the same name, in which case it occupies the same position as that attribute).
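The ordering rule can be sketched as an immutable map that pairs hash-based lookup with a positional list. This is a hypothetical illustration of the stated semantics, not Saxon's LargeAttributeMap. For brevity each put copies the lists and the index (O(n)); the reply describes Saxon's own immutable structure only as reasonably efficient and does not give its details.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical immutable attribute map: hash lookup by name, iteration in insertion order. */
public final class OrderedAttributeMap {
    public static final OrderedAttributeMap EMPTY =
            new OrderedAttributeMap(List.of(), List.of(), Map.of());

    private final List<String> names;          // attribute names in order
    private final List<String> values;         // values, parallel to names
    private final Map<String, Integer> index;  // name -> position in names/values

    private OrderedAttributeMap(List<String> names, List<String> values, Map<String, Integer> index) {
        this.names = names;
        this.values = values;
        this.index = index;
    }

    /** Hash-based lookup; returns null if the name is absent. */
    public String get(String name) {
        Integer pos = index.get(name);
        return pos == null ? null : values.get(pos);
    }

    /** Names in order; the list is immutable. */
    public List<String> names() {
        return names;
    }

    /** Returns a new map: a new name goes at the end, an existing name keeps its position. */
    public OrderedAttributeMap put(String name, String value) {
        Integer pos = index.get(name);
        List<String> newValues = new ArrayList<>(values);
        if (pos != null) {
            newValues.set(pos, value); // replace in place; names and index are unchanged and shared
            return new OrderedAttributeMap(names, List.copyOf(newValues), index);
        }
        List<String> newNames = new ArrayList<>(names);
        newNames.add(name);
        newValues.add(value);
        Map<String, Integer> newIndex = new HashMap<>(index);
        newIndex.put(name, names.size());
        return new OrderedAttributeMap(List.copyOf(newNames), List.copyOf(newValues), Map.copyOf(newIndex));
    }

    public static void main(String[] args) {
        OrderedAttributeMap m = EMPTY.put("id", "42").put("type", "a").put("date", "2019-06-01").put("type", "b");
        System.out.println(m.names());     // [id, type, date]
        System.out.println(m.get("type")); // b
    }
}
```

Replacing type keeps it in second position, while a new attribute such as owner would be appended after date.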

This proved quite tricky to implement, given the requirement to use immutable data structures internally, but I found a way that seems reasonably efficient. However, because of the extra complexity, I've raised the threshold at which we start using the LargeAttributeMap from 5 attributes to 8.

#7

Updated by O'Neil Delpratt almost 4 years ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 10.1 added

Bug fix committed in the Saxon 10.1 maintenance release.

#8

Updated by O'Neil Delpratt almost 4 years ago

  • Status changed from Resolved to Closed
