Confused by saxon:sort() on text{} nodes

Added by Anonymous almost 17 years ago

Legacy ID: #4755248 Legacy Poster: Chapman Flack (jcflack)

For doing semi-interactive queries, the saxon:sort function is a nice shorthand compared to typing out a FLWOR expression. But while saxon:sort($is) does what I expect when the $is are strings, it doesn't do what I expect with constructed text nodes, as in saxon:sort(for $i in $is return text { $i }). I get them in an unpredictable order (it can vary from run to run).

I'm probably misunderstanding something about how and when the order of the sort() result reverts to 'document order' when the node sequence is later used. (I don't know what the document order is for a sequence of nodes made by computed constructors, so maybe that's why it isn't predictable.) The documentation for saxon:sort explains that it works on nodes as well as on atomic values, but do I need an ordered or unordered clause somewhere to preserve the sorted order of the sequence long enough to do anything with it? I'm using Saxon-B 9.0.0.2J.

Naturally, my tiniest stripped-down example doesn't illustrate the issue. This gives nicely ordered output:

    let $ts := for $s in ('foo', 'bar', 'quux', 'murbly', 'weckett', 'baz', 'grimble')
               return text { $s }
    return saxon:sort($ts)

So I need to (sorry!) include a bigger excerpt from real life to show what I'm seeing. This one gives me an unexpected order. (Note that removing the final /local:line(.) does not fix it. I thought it might be the axis step that undoes the sort, but no such luck.)
    declare namespace saxon = 'http://saxon.sf.net/';

    declare function local:uniq($is as item()*) as text()* {
      for $i in distinct-values($is) return text { $i }
    };

    declare function local:line($t as text()) as text() {
      text { concat($t, '&#10;') }
    };

    declare option saxon:output 'method=text';

    declare variable $RESPONSE :=
      <searchResponse requestID="b">
        <searchResultEntry dn="redacted"/>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pstaff</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>office</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>office</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>office</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>vfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>office</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pstaff</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>taoutsid</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>vfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>contlec</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>gstudent</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>contlec</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>contlec</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>gstudent</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>ltlectur</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>ltlectur</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>office</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>Visiting Scholar</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>vfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>pfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>vfaculty</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>gstudent</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>gstudent</value></attr></searchResultEntry>
        <searchResultEntry dn="redacted"><attr name="employeeType"><value>gstudent</value></attr></searchResultEntry>
      </searchResponse>;

    $RESPONSE/saxon:sort(local:uniq(.//attr/value))/local:line(.)


Replies (4)

RE: Confused by saxon:sort() on text{} nodes - Added by Anonymous almost 17 years ago

Legacy ID: #4755257 Legacy Poster: Michael Kay (mhkay)

You're almost there: the "/" operator sorts nodes into document order. So given $RESPONSE/saxon:sort(local:uniq(.//attr/value)), the saxon:sort should sort things for you, and then the "/" will unsort them.
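For comparison, here is a minimal sketch of the longhand FLWOR form that the question mentions. It assumes a cut-down stand-in for $RESPONSE (same element names as the question, invented data). Because no "/" step is applied after the order by clause, the constructed text nodes are never put back into document order, so the output should follow the sort.

```xquery
declare namespace saxon = 'http://saxon.sf.net/';
declare option saxon:output 'method=text';

(: Hypothetical cut-down stand-in for the $RESPONSE document in the question. :)
declare variable $RESPONSE :=
  <searchResponse requestID="b">
    <searchResultEntry dn="redacted"><attr name="employeeType"><value>pstaff</value></attr></searchResultEntry>
    <searchResultEntry dn="redacted"><attr name="employeeType"><value>office</value></attr></searchResultEntry>
    <searchResultEntry dn="redacted"><attr name="employeeType"><value>pstaff</value></attr></searchResultEntry>
    <searchResultEntry dn="redacted"><attr name="employeeType"><value>gstudent</value></attr></searchResultEntry>
  </searchResponse>;

(: The sort is the last thing applied to the sequence: no "/" step follows it,
   so nothing re-sorts the constructed text nodes into document order. :)
for $v in distinct-values($RESPONSE//attr/value)
order by $v
return text { concat($v, '&#10;') }
```

With the text output method this should print gstudent, office and pstaff, one per line.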

RE: Confused by saxon:sort() on text{} nodes - Added by Anonymous almost 17 years ago

Legacy ID: #4755469 Legacy Poster: Chapman Flack (jcflack)

Oh (light dawns), you mean the / to the left of saxon:sort combines and de-duplicates all of the nodes produced by its right-hand side, and leaves them in document order. OK, I've found that now in the spec. Thanks!

And looking at the spec more closely now, am I right that E1/E2 does not first coerce E1 back into document order, so that saxon:sort($RESPONSE/local:uniq(.//attr/value))/local:line(.) might work? Oh no, it won't, because line() returns a text node too. I was trying for a nice XPath-y idiom to produce a series of text lines (method=text) in a known order, but I'm starting to suspect the language is stacked against me.

If my expression produces plain strings rather than text nodes, the serializer joins them with spaces (which looks like a spurious space at the start of every line but the first). I could join all lines into a single string explicitly, without spaces, but for large output I'd rather produce a text node for each line so the serializer can dispose of it right away. I could produce per-line strings and wrap the entire expression in an outermost function call that makes them text nodes, but that's not much prettier than an outermost FLWOR. I wonder how difficult it would be to add a user-defined serialization attribute for method=text that suppresses the space between strings...

Does the spec say anything about the 'document order' of a sequence of nodes constructed from scratch? I suppose not. It would be so handy if it were 'order of construction'.

-Chap
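The spacing described above comes from sequence normalization in the XSLT 2.0 and XQuery 1.0 Serialization spec: adjacent atomic values in the result are joined with a single space, while adjacent text nodes are merged with no separator. A minimal sketch with made-up data:

```xquery
declare namespace saxon = 'http://saxon.sf.net/';
declare option saxon:output 'method=text';

(: Returning strings instead, e.g.
     for $s in ('foo', 'bar', 'baz') order by $s return concat($s, '&#10;')
   would put a space between each adjacent pair of strings, so the
   second and third lines would start with a space.
   Returning text nodes, as below, inserts no separator at all. :)
for $s in ('foo', 'bar', 'baz')
order by $s
return text { concat($s, '&#10;') }
```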

RE: Confused by saxon:sort() on text{} nodes - Added by Anonymous almost 17 years ago

Legacy ID: #4755773 Legacy Poster: Michael Kay (mhkay)

Unfortunately you are right in assuming that if you generate text nodes like this:

    for $i in 1 to 20000 return text { $i }

then they will be streamed through the serializer, whereas if you use string-join():

    string-join(for $i in 1 to 20000 return string($i), ",")

then they won't. There's in fact no intrinsic reason why that should be the case; it's just that string-join(), unlike FLWOR, doesn't currently have a push-mode implementation. Rather than implement non-standard serialization options, the easy answer is to provide a push-mode implementation of string-join. However, I don't think that's necessary. You should be able to replace

    $R/sort(x)

with

    for $r in $R return sort($r/x)

or something similar to avoid the re-sorting.

> Does the spec say anything about the 'document order' of a sequence of nodes constructed from scratch?

No, it doesn't. With Saxon it will tend to be order of construction in the case of document or element nodes, but random order for other kinds of node.
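Applied to the query in the question, that rewrite might look like the sketch below. It assumes a cut-down $RESPONSE with invented data, and redeclares the question's local:uniq and local:line with item()* and text()* so they accept and return sequences. saxon:sort is the Saxon 9.0 extension function discussed in this thread; in XQuery 3.1 the standard fn:sort covers similar ground. A FLWOR returns its results in iteration order, so unlike "/" it leaves the sorted order alone.

```xquery
declare namespace saxon = 'http://saxon.sf.net/';
declare option saxon:output 'method=text';

declare function local:uniq($is as item()*) as text()* {
  for $i in distinct-values($is) return text { $i }
};

declare function local:line($t as text()) as text() {
  text { concat($t, '&#10;') }
};

(: Hypothetical cut-down stand-in for the question's $RESPONSE. :)
declare variable $RESPONSE :=
  <searchResponse requestID="b">
    <searchResultEntry dn="redacted"><attr name="employeeType"><value>vfaculty</value></attr></searchResultEntry>
    <searchResultEntry dn="redacted"><attr name="employeeType"><value>contlec</value></attr></searchResultEntry>
    <searchResultEntry dn="redacted"><attr name="employeeType"><value>vfaculty</value></attr></searchResultEntry>
  </searchResponse>;

(: Instead of $RESPONSE/saxon:sort(...)/local:line(.), iterate with "for",
   which keeps the order produced by saxon:sort. :)
for $r in $RESPONSE
for $t in saxon:sort(local:uniq($r//attr/value))
return local:line($t)
```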

RE: Confused by saxon:sort() on text{} nodes - Added by Anonymous almost 17 years ago

Legacy ID: #4755896 Legacy Poster: Michael Kay (mhkay)

The other thing you could do, of course, is to construct a tree containing a hierarchy of element nodes, and then rely on the fact that the text output method discards the markup. Don't know if that helps at all.
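A minimal sketch of that idea, with invented data and hypothetical element names lines and line: once the sorted values are children of a single constructed element, their document order is simply their order as children, so a later "/" step no longer scrambles them, and the text output method writes only the character content.

```xquery
declare namespace saxon = 'http://saxon.sf.net/';
declare option saxon:output 'method=text';

(: Hypothetical wrapper elements; they exist only to fix the order. :)
declare variable $lines as element() :=
  <lines>{
    for $v in distinct-values(('pstaff', 'office', 'pstaff', 'gstudent'))
    order by $v
    return <line>{ concat($v, '&#10;') }</line>
  }</lines>;

(: This "/" step is harmless: within one tree, document order is the
   order of the children. The text method discards the markup. :)
$lines/line
```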
