Should I get two messages about building a tree from one call of saxon:parse-html and using the -t option?

Added by Martin Honnen almost 4 years ago

I noticed an oddity when calling saxon:parse-html in XQuery with both Saxon 10.1 EE and Saxon 9.9.1.7 EE: when the query is run from the command line with the -t option, I get two messages about building a tree:

PS C:\Users\marti\SomePath> java -cp 'C:\Program Files\TagSoup\tagsoup-1.2.1.jar;C:\Program Files\Saxonica\SaxonEE10-1J\saxon-ee-10.1.jar' net.sf.saxon.Query -t test2020051701.xq !method=text
Saxon-EE 10.1J from Saxonica
Java version 1.8.0_242
Using license serial number ...
Analyzing query from test2020051701.xq
Analysis time: 231.9674 milliseconds
Loading org.ccil.cowan.tagsoup.Parser
Building tree for file:/C:/Users/marti/SomePath/test2020051701.xq using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 173.5206ms
Tree size: 16446 nodes, 129135 characters, 17572 attributes
Loading org.ccil.cowan.tagsoup.Parser
Building tree for file:/C:/Users/marti/SomePath/test2020051701.xq using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 75.8662ms
Tree size: 16446 nodes, 129135 characters, 17572 attributes
Spain
Execution time: 1.3182261s (1318.2261ms)
Memory used: 65Mb

The XQuery is:

declare namespace saxon = "http://saxon.sf.net/";

declare default element namespace "http://www.w3.org/1999/xhtml";

saxon:parse-html(unparsed-text('https://en.wikipedia.org/wiki/Barcelona'))//table[@class='infobox geography vcard']//tr[@class = 'mergedtoprow'][th = 'Country']/td//a//text()

When I parse XML with e.g. parse-xml(unparsed-text(...)) I only get one message about building a tree.

Why do I get two such messages with saxon:parse-html?


Replies (2)

RE: Should I get two messages about building a tree from one call of saxon:parse-html and using the -t option? - Added by Michael Kay almost 4 years ago

Well spotted.

I haven't quite got to the bottom of this, especially why parse-xml and parse-html should be different. The expression is turned into a call on the key() function (because of the predicates) -- whether that's a good idea in this case is another question -- and it seems that the third argument of the key() function, which in this case is an expression that invokes saxon:parse-html(), is being evaluated twice: once (unnecessarily) when building the index to support the key, and once when doing the lookup.
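
One way to probe this, offered only as a sketch, is to bind the parsed document to a variable so that the path expression no longer contains the saxon:parse-html() call directly. The variable name $doc is an assumption introduced here for illustration, and the thread does not confirm whether Saxon's optimizer would still inline the variable and evaluate the call twice, so the -t output should be checked again after the change.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

declare default element namespace "http://www.w3.org/1999/xhtml";

(: $doc is a hypothetical variable name; the intent is to parse the page once
   and reuse the resulting tree, but the optimizer may still inline it :)
let $doc := saxon:parse-html(unparsed-text('https://en.wikipedia.org/wiki/Barcelona'))
return
  $doc//table[@class = 'infobox geography vcard']
      //tr[@class = 'mergedtoprow'][th = 'Country']/td//a//text()
```

If the second "Building tree" message disappears with this form, that would be consistent with the double evaluation of the third argument of key() described above; if it remains, the variable has presumably been inlined before the rewrite to key() takes place.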
