Not producing text correctly from cmd line
Added by Anonymous about 13 years ago
Legacy ID: #10637858 Legacy Poster: Matthew Halverson (mhalver)
I'm trying to process an xml file using xquery and print the result as a text file with each returned value on its own line. It seems that saxon is adding an extra space to each of my entries. Here is a minimal example xml file to illustrate this (Scores.xml):

[code] Bill Jones B James Smith A Sally Masters F [/code]

And the query that I am running (query.q):

[code]
declare option saxon:output "omit-xml-declaration=yes";
declare option saxon:output "method=text";
for $stud in doc("Scores.xml")/Students/Student
order by $stud/LastName, $stud/FirstName
return concat($stud/FirstName, " ", $stud/LastName, "&#10;")
[/code]

The output that I am getting is:

[code]
Bill Jones
 Sally Masters
 James Smith
(extra blank line here)
[/code]

I am using saxon 9.3he, java jdk 1.6.0_18 on Windows 7, and am running my query from the command line with java -cp saxon9he.jar net.sf.saxon.Query -q:query.q

Without the newline added, the results show up as one line separated by spaces. If I add the newline before the name (which causes a blank line), I can confirm that there is an extra space after the names. As far as I have been able to tell, saxon is printing item 1 followed by a space, then item 2 followed by a space, then item 3 followed by a space. Thus my weird indentation is caused by the extra space being added after the previous name, without any line feeds of its own. Wrapping the entire query in a string-join with the newline gives the correct result.

Is this intended behavior, and how do I get the result that I want? I understand that saxon uses the document function on the output, but my understanding of the spec is that the text nodes should be concatenated without any extra spaces, and it looks to me like extra space is being added, although there is a good chance that I am misunderstanding the spec. Any help would be appreciated.
Replies (7)
RE: Not producing text correctly from cmd line - Added by Anonymous about 13 years ago
Legacy ID: #10637870 Legacy Poster: David Lee (daldei)
That is correct. The standard serialization for sequences is to separate items by a space. http://www.w3.org/TR/xslt-xquery-serialization/ If you want to avoid that, then produce a single string as your output, not a sequence, like:

string-join(
  (for $stud in doc("Scores.xml")/Students/Student
   order by $stud/LastName, $stud/FirstName
   return concat($stud/FirstName, " ", $stud/LastName)),
  "&#10;")
RE: Not producing text correctly from cmd line - Added by Anonymous about 13 years ago
Legacy ID: #10637905 Legacy Poster: Michael Kay (mhkay)
As daldei says, this is correct according to the spec. Another workaround is to output text nodes:

[code]
declare option saxon:output "omit-xml-declaration=yes";
declare option saxon:output "method=text";
for $stud in doc("Scores.xml")/Students/Student
order by $stud/LastName, $stud/FirstName
return (text{$stud/FirstName, $stud/LastName}, text{"&#10;"})
[/code]
RE: Not producing text correctly from cmd line - Added by Anonymous about 13 years ago
Legacy ID: #10637907 Legacy Poster: Matthew Halverson (mhalver)
Ok, I see that in the spec now (section 2 of the page that you listed). I had tried the string-join approach and it did work; it just felt like that couldn't be the way this was intended to be done, but looking at the spec, it is. Trying to get around that extra space was driving me nuts. Thank you so much for helping.
RE: Not producing text correctly from cmd line - Added by Anonymous about 13 years ago
Legacy ID: #10637943 Legacy Poster: Matthew Halverson (mhalver)
Thank you, Mr. Kay. This does work as well (although at the moment, I'm not as clear why - I'm going to have to stare at it a bit). Thank you both for the quick reply.
RE: Not producing text correctly from cmd line - Added by Anonymous about 13 years ago
Legacy ID: #10637957 Legacy Poster: David Lee (daldei)
The reason text nodes work differently from strings is that, according to the XDM serialization specs, text nodes are concatenated whereas strings (atomic values, aka xs:string values) are separated by spaces. It's just the way it is. -David
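A minimal sketch of that difference, assuming the same saxon:output option used in the original query (the literal values here are made up purely for illustration). During serialization, adjacent atomic values get a single space between them, while adjacent text nodes are merged with nothing in between:

[code]
declare option saxon:output "method=text";
(: "a" and "b" are adjacent xs:string values, so a space is inserted between them. :)
(: The three text nodes that follow are simply concatenated.                       :)
(: Expected serialized result: a b|cd                                              :)
("a", "b", text{"|"}, text{"c"}, text{"d"})
[/code]

This is why returning a text node for the line break, rather than a string that ends in one, avoids the leading space on the following line.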
RE: Not producing text correctly from cmd line - Added by Anonymous about 13 years ago
Legacy ID: #10638601 Legacy Poster: Matthew Halverson (mhalver)
I think my confusion was in understanding the difference between actual text-nodes and a bunch of text values (strings) in the document. I think that I can see what is going on now. I prefer the string-join approach (because I don't get an extra blank line), but thank you very much for your suggestion Mr. Kay, that is the one that helped me see what is going on.
RE: Not producing text correctly from cmd line - Added by Anonymous about 13 years ago
Legacy ID: #10639522 Legacy Poster: Michael Kay (mhkay)
Yes, the distinction between text nodes and strings is a very subtle one, whether you are using XQuery or XSLT, and when it comes to controlling whitespace it's an important distinction to understand.