Should I get two messages about building a tree from one call of saxon:parse-html and using the -t option?
Added by Martin Honnen over 4 years ago
I noticed an oddity when calling saxon:parse-html in XQuery: with both Saxon 10.1 EE and Saxon 9.9.1.7 EE, when the query is run from the command line with the -t option, I get two messages about building a tree:
PS C:\Users\marti\SomePath> java -cp 'C:\Program Files\TagSoup\tagsoup-1.2.1.jar;C:\Program Files\Saxonica\SaxonEE10-1J\saxon-ee-10.1.jar' net.sf.saxon.Query -t test2020051701.xq !method=text
Saxon-EE 10.1J from Saxonica
Java version 1.8.0_242
Using license serial number ...
Analyzing query from test2020051701.xq
Analysis time: 231.9674 milliseconds
Loading org.ccil.cowan.tagsoup.Parser
Building tree for file:/C:/Users/marti/SomePath/test2020051701.xq using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 173.5206ms
Tree size: 16446 nodes, 129135 characters, 17572 attributes
Loading org.ccil.cowan.tagsoup.Parser
Building tree for file:/C:/Users/marti/SomePath/test2020051701.xq using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 75.8662ms
Tree size: 16446 nodes, 129135 characters, 17572 attributes
SpainExecution time: 1.3182261s (1318.2261ms)
Memory used: 65Mb
The XQuery is:
declare namespace saxon = "http://saxon.sf.net/";
declare default element namespace "http://www.w3.org/1999/xhtml";
saxon:parse-html(unparsed-text('https://en.wikipedia.org/wiki/Barcelona'))//table[@class='infobox geography vcard']//tr[@class = 'mergedtoprow'][th = 'Country']/td//a//text()
When I parse XML with e.g. parse-xml(unparsed-text(...)), I only get one message about building a tree.
Why do I get two such messages with saxon:parse-html?
Replies (2)
RE: Should I get two messages about building a tree from one call of saxon:parse-html and using the -t option? - Added by Michael Kay over 4 years ago
Well spotted.
I haven't quite got to the bottom of this, especially why parse-xml and parse-html should be different. The expression is turned into a call on the key() function (because of the predicates) -- whether that's a good idea in this case is another question -- and it seems that the third argument of the key() function, which in this case is an expression that invokes saxon:parse-html(), is being evaluated twice: once (unnecessarily) when building the index to support the key, and once when doing the lookup.
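A minimal sketch of one way to probe this, not a confirmed workaround: assuming the second tree build comes from the saxon:parse-html() call sitting directly inside the expression that is rewritten to key(), binding the parsed page to a variable first might let the same document node be used for both the index build and the lookup. The variable name $page is hypothetical, and the optimizer may well inline it again, so the only reliable check is to rerun with -t and count the "Building tree" messages.

```xquery
declare namespace saxon = "http://saxon.sf.net/";
declare default element namespace "http://www.w3.org/1999/xhtml";

(: Assumption: binding the parsed document to a variable keeps the
   saxon:parse-html() call out of the path expression that the
   optimizer rewrites into key(). $page is an illustrative name. :)
let $page := saxon:parse-html(unparsed-text('https://en.wikipedia.org/wiki/Barcelona'))
return
  $page//table[@class = 'infobox geography vcard']
       //tr[@class = 'mergedtoprow'][th = 'Country']/td//a//text()
```

If this version still reports two tree builds under -t, that would suggest the variable is being inlined before the key() rewrite rather than evaluated once.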