Stripping Whitespace in 8.9 appears broken

Added by Anonymous over 17 years ago

Legacy ID: #4304078 Legacy Poster: Andrew Waters (andrewwaters)

Hi, I'm an XQuery novice, and I have not read and fully digested the W3C XPath/XQuery specs, so this may not be a valid observation... but: in Saxon 8.9.0.02 (Saxon 8.9.0.2J from Saxonica), using the -sall option breaks Saxon for comparison functions/expressions. Taking the EXACT SAME XML input as $ina and $inb and calling a simple compare like $ina[1] eq $inb[2] results in "true" if -sall is not used, BUT if -sall is specified the result is "false". The exact same XQuery and XML input on Saxon 8.6 works as expected (i.e. the result is true in both cases).

As a novice I am also confused as to why Saxon treats whitespace between elements as significant, e.g. <a><b>data</b></a> is NOT EQUAL to <a> <b>data</b> </a> regardless of the whitespace settings. Am I missing something here? Many thanks. Andrew Waters.


Replies (6)

RE: Stripping Whitespace in 8.9 appears broke - Added by Anonymous over 17 years ago

Legacy ID: #4304369 Legacy Poster: Michael Kay (mhkay)

>$ina[1] eq $inb[2] results in "true" if -sall is not used BUT if -sall is specified the result is "false".

Without knowing what $ina and $inb are, I have no way of knowing whether the first item in $ina should be equal to the second item in $inb or not. Could you provide a reproducible test case, please?

>As a novice I am also confused as to why saxon treats whitespace between elements as significant e.g. <a><b>data</b></a> is NOT EQUAL to <a> <b>data</b> </a> regardless of the whitespace settings. Am I missing something here?

Well, it should certainly be not-equal if the whitespace is significant, which is the default if it comes from parsed unvalidated XML input, but not if it's constructed in the query itself. Again, I need to know exactly what you are doing, for example whether this data is parsed from an external file or constructed in the query, and for that matter, what exactly do you mean by "NOT EQUAL". And there are an awful lot of things that could be construed as "whitespace settings", come to that.

RE: Stripping Whitespace in 8.9 appears broke - Added by Anonymous over 17 years ago

Legacy ID: #4304421 Legacy Poster: Andrew Waters (andrewwaters)

Sorry, a bit economical with the "facts" there!

The XQuery is:

    xquery version "1.0";
    declare default element namespace "http://www.abc.com/abc";
    declare variable $a := doc('file:///D:/NP_XML_Mapping/ina.xml');
    declare variable $ina := $a//a;
    declare variable $inb := //a;
    <abc> {$ina[1] eq $inb[1]} </abc>

ina.xml and "Copy of ina.xml" both contain:

    <abc xmlns="http://www.abc.com/abc">
      <a>
        <insideIpAddress>1.1.1.1</insideIpAddress>
        <outsideIpAddress>11.11.11.11</outsideIpAddress>
      </a>
      <a>
        <insideIpAddress>2.2.2.2</insideIpAddress>
        <outsideIpAddress>22.22.22.22</outsideIpAddress>
      </a>
    </abc>

The Saxon command lines and their output are:

    java -cp saxon8.jar net.sf.saxon.Query -s "D:\NP_XML_Mapping\Copy of ina.xml" D:\NP_XML_Mapping\bug.xq
    <?xml version="1.0" encoding="UTF-8"?>
    <abc xmlns="http://www.abc.com/abc">true</abc>

    java -cp saxon8.jar net.sf.saxon.Query -s "D:\NP_XML_Mapping\Copy of ina.xml" -sall D:\NP_XML_Mapping\bug.xq
    <?xml version="1.0" encoding="UTF-8"?>
    <abc xmlns="http://www.abc.com/abc">false</abc>

RE: Stripping Whitespace in 8.9 appears broke - Added by Anonymous over 17 years ago

Legacy ID: #4304684 Legacy Poster: Michael Kay (mhkay)

You're right, there's a bug here. Thanks for reporting it. The -sall setting is affecting documents loaded using the doc() function, and it is not affecting documents loaded using the -s option on the command line, so the two documents are handled differently. The changes needed to fix this are small, but they are on a path used by many different functions, so they will need careful testing before I release them.
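To see why stripping only one of the two documents flips the result, it may help to recall that eq atomizes its operands, and for an untyped element that means comparing string values (the concatenation of all descendant text nodes). The following is a minimal sketch in Python's standard library, not Saxon itself; strip_whitespace_text is a hypothetical stand-in for what -sall does to one document, and the sample is a shortened copy of ina.xml.

```python
from xml.dom import minidom

# Shortened copy of ina.xml from the thread (first <a> element only).
XML = """<abc xmlns="http://www.abc.com/abc">
  <a>
    <insideIpAddress>1.1.1.1</insideIpAddress>
    <outsideIpAddress>11.11.11.11</outsideIpAddress>
  </a>
</abc>"""


def string_value(node):
    """Concatenate all descendant text nodes, like the XPath string value."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def strip_whitespace_text(node):
    """Remove whitespace-only text nodes in place.

    Hypothetical stand-in for -sall; not Saxon's implementation.
    """
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and child.data.strip(" \t\r\n") == "":
            node.removeChild(child)
        else:
            strip_whitespace_text(child)


# Same input parsed twice; only one copy is stripped, as in the bug report.
unstripped = minidom.parseString(XML)
stripped = minidom.parseString(XML)
strip_whitespace_text(stripped)

a_unstripped = unstripped.getElementsByTagName("a")[0]
a_stripped = stripped.getElementsByTagName("a")[0]

print(repr(string_value(a_unstripped)))  # includes newlines and indentation
print(repr(string_value(a_stripped)))    # prints '1.1.1.111.11.11.11'
print(string_value(a_unstripped) == string_value(a_stripped))  # prints False
```

When both copies are stripped, or neither is, the two string values match, which is consistent with the "true" result reported without -sall.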

RE: Stripping Whitespace in 8.9 appears broke - Added by Anonymous over 17 years ago

Legacy ID: #4305503 Legacy Poster: Andrew Waters (andrewwaters)

Michael, Thanks for the feedback. Not sure if I should start a new thread, but my follow-up point to the original post queried the handling of whitespace between elements. If you place a space BETWEEN any 2 elements (e.g. the 1st <a> and the 1st <insideIpAddress>) in one of the XML files above, then the XQuery will return false in all cases except when -sall is specified (once the bug is fixed). I struggle with this concept, as in my brain these //<a> nodes are identical in "XML terms". I've seen some comments before stating that Saxon is more compliant with the W3C specifications, and maybe this is why I see this behaviour (XMLSpy, for instance, does not treat this example as "not equal"), but again that would make me think that the specs are wrong :-). Could you put me straight please? - thanks. Andrew Waters.

RE: Stripping Whitespace in 8.9 appears broke - Added by Anonymous over 17 years ago

Legacy ID: #4305530 Legacy Poster: Michael Kay (mhkay)

Whitespace in XML is significant unless you say otherwise. Remember that XML was invented for documents, not for data. If you write <para>He was <emph>very</emph> <term>insouciant</term></para> it should be obvious that the whitespace between the two child elements is significant. The Microsoft XML parser (which XMLSpy uses) notoriously gets this wrong. The only case where it's acceptable to ignore whitespace without an explicit user say-so is when there is a DTD or schema that defines the content of the element as "element only", that is, not allowing any immediately contained PCDATA (text) other than whitespace.
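The paragraph example above (with the emph element properly closed) can be checked with any parser that preserves whitespace. This is a small sketch using Python's xml.etree.ElementTree rather than Saxon; drop_whitespace_only_text is a hypothetical helper that mimics a parser discarding whitespace-only text between elements.

```python
import xml.etree.ElementTree as ET

PARA = "<para>He was <emph>very</emph> <term>insouciant</term></para>"


def text_of(element):
    """Return all text inside the element, in document order."""
    return "".join(element.itertext())


def drop_whitespace_only_text(element):
    """Discard whitespace-only text and tails (hypothetical helper)."""
    for el in element.iter():
        if el.text is not None and el.text.strip(" \t\r\n") == "":
            el.text = None
        if el.tail is not None and el.tail.strip(" \t\r\n") == "":
            el.tail = None


preserved = ET.fromstring(PARA)
stripped = ET.fromstring(PARA)
drop_whitespace_only_text(stripped)

print(text_of(preserved))  # prints: He was very insouciant
print(text_of(stripped))   # prints: He was veryinsouciant
```

The single space between the two child elements is a whitespace-only text node, and removing it runs the two words together, which is why a processor should not drop such nodes unless told to.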

RE: Stripping Whitespace in 8.9 appears broke - Added by Anonymous over 17 years ago

Legacy ID: #4305576 Legacy Poster: Andrew Waters (andrewwaters)

Michael, Thanks for the explanation. When you put it like that it's obvious! Andrew Waters.
