XQuery: Counting instances of distinct-values

Added by Anonymous over 16 years ago

Legacy ID: #4648050 Legacy Poster: James Cummings (jcummings)

Hi there. I am using the Saxon collection() function in XQuery to run a query over a number of documents in the filesystem (docs.xml contains /collection/doc/@href attributes that point to the documents). I want to look through these documents and produce a sorted list of all the distinct-values() of the tei:persName elements in them.

I have this working, but I thought it would be helpful to add a count of how many times each distinct value of tei:persName occurs. So I want to output something like:

<li>[personFoo] (5/150)</li>

meaning that 'personFoo' occurs as a tei:persName 5 times out of the 150 tei:persName elements in the collection.

So my XQuery looks like this (wrapped in a bit of XHTML):

=====
declare namespace tei="http://www.tei-c.org/ns/1.0";
declare option saxon:output "method=xhtml";
declare option saxon:output "doctype-system=http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd";
declare option saxon:output "doctype-public=-//W3C//DTD XHTML 1.1//EN";
<html xml:lang="en">
<head><title>Foo</title></head>
<body><div><ol>
{
  let $collection := collection('docs.xml')
  let $persNames := $collection//tei:text//tei:persName
  let $countName := count($persNames)
  for $distinctPersNames in distinct-values($persNames)
  let $distinctCountName := count($distinctPersNames)
  order by lower-case(normalize-space($distinctPersNames))
  return <li>[{$distinctPersNames}] ({$distinctCountName} / {$countName})</li>
}
</ol></div></body></html>
=====

Any suggestions on how I can get $distinctCountName to count the number of occurrences in $persNames of the value bound to $distinctPersNames in that iteration?

Thanks,
-James
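A minimal sketch of why $distinctCountName always comes out as 1 in the query above: a for clause binds its range variable to a single item per iteration, so count() of that variable can only be 1. The sample values here are hypothetical stand-ins for the persName strings, and the <counts>/<count> wrapper elements are invented for the illustration.

```xquery
xquery version "1.0";

(: Hypothetical sample values standing in for the persName strings. :)
let $values := ("personFoo", "personBar", "personFoo")
return
  <counts>{
    (: Each iteration binds $v to ONE atomic value, so count($v) is always 1;
       it never looks back at how often $v occurs in $values. :)
    for $v in distinct-values($values)
    return <count value="{$v}">{count($v)}</count>
  }</counts>
```

The order of the two <count> elements is implementation-dependent, because distinct-values() does not guarantee any particular order.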


Replies (4)

RE: XQuery: Counting instances of distinct-values - Added by Anonymous over 16 years ago

Legacy ID: #4648105 Legacy Poster: Michael Kay (mhkay)

This one is much easier in XSLT using xsl:for-each-group! In XQuery you have to repeat the retrieval, which can be very expensive (though it will benefit from indexing in Saxon-SA). Something like:

let $distinctCountName := count($collection//tei:text//tei:persName[. = $distinctPersNames])

Michael Kay
http://www.saxonica.com/
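A sketch of the fix dropped into the original FLWOR, under these assumptions: a small inline $collection of two hypothetical TEI documents stands in for collection('docs.xml'), and the XHTML wrapper and saxon:output options are left out. One small variation from the reply: the predicate filters the already-bound $persNames rather than re-evaluating the path over $collection. The counts are the same, but the query still rescans every persName once per distinct value, which is the cost described above.

```xquery
xquery version "1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical stand-in for collection('docs.xml'): two tiny TEI documents. :)
declare variable $collection := (
  document { <tei:TEI><tei:text><tei:p>
    <tei:persName>John Smith</tei:persName>
    <tei:persName>Mary Jones</tei:persName>
  </tei:p></tei:text></tei:TEI> },
  document { <tei:TEI><tei:text><tei:p>
    <tei:persName>John Smith</tei:persName>
    <tei:persName>john  smith</tei:persName>
    <tei:persName>Anne Lee</tei:persName>
  </tei:p></tei:text></tei:TEI> }
);

<ol>{
  let $persNames := $collection//tei:text//tei:persName
  let $countName := count($persNames)
  for $distinctPersNames in distinct-values($persNames)
  (: Count the persName nodes whose string value equals the current
     distinct value; this rescans $persNames once per distinct value. :)
  let $distinctCountName := count($persNames[. = $distinctPersNames])
  order by lower-case(normalize-space($distinctPersNames))
  return <li>[{$distinctPersNames}] ({$distinctCountName} / {$countName})</li>
}</ol>
```

With this data, "John Smith" (2 / 5) and "john  smith" (1 / 5) still come out as separate entries: distinct-values() compares the raw string values, so case and whitespace differences count as different names.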

RE: XQuery: Counting instances of distinct-values - Added by Anonymous over 16 years ago

Legacy ID: #4649847 Legacy Poster: James Cummings (jcummings)

I thought it would be easier in XSLT. Maybe I should just re-implement it in that. It is significantly more expensive... like from 3 seconds to 3 minutes (not in Saxon-SA). There are currently about 40,000 persNames with about 9,000 distinct-values(). In this case it is just a one-off list, so I thought I'd try it in XQuery (I can wait the 3 minutes between attempts at generating the right thing ;-) ). If I have to do something similar again, I'd probably do it either in XSLT or pre-make an index of distinct-values(persNames).

But, for my edification in XQuery... The problem with this way of doing distinct-values() is that any differences in whitespace, carriage returns, and case produce separate distinct values. When I try to apply normalize-space() to them, it (quite rightly) complains about this being a nodeset rather than a single string. What is the right way around this? Should I do a for with another nested for?

-James

RE: XQuery: Counting instances of distinct-values - Added by Anonymous over 16 years ago

Legacy ID: #4649901 Legacy Poster: Michael Kay (mhkay)

>It is significantly more expensive... like from 3 seconds to 3 minutes.

You're saying the XQuery solution under Saxon-B (without join optimization) is more expensive, yes?

>The problem with this way of doing distinct-values() is that any whitespace, carriage returns, and case count as distinct-values().

Well, you can apply distinct-values to any function of the value, for example:

distinct-values($input/normalize-space(upper-case(.)))

>When I try to normalize-space() on them it (quite rightly) complains about this being a nodeset rather than a single string.

Sounds like you tried to apply normalize-space() to the whole set of values rather than applying it to each value individually, as in the above example.
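Applying that suggestion to the counting query might look like the sketch below, again with hypothetical inline data in place of collection('docs.xml'). It uses lower-case() rather than upper-case() to match the original order by. As a design choice of this sketch (not something stated in the thread), the normalized keys are computed once into $keys, so the per-key count compares plain strings instead of re-normalizing every node on every iteration.

```xquery
xquery version "1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical sample data: the same name with different case and whitespace. :)
declare variable $collection := document { <tei:TEI><tei:text><tei:p>
  <tei:persName>John Smith</tei:persName>
  <tei:persName>john  smith</tei:persName>
  <tei:persName>JOHN
    SMITH</tei:persName>
  <tei:persName>Mary Jones</tei:persName>
</tei:p></tei:text></tei:TEI> };

<ol>{
  let $persNames := $collection//tei:text//tei:persName
  (: The function is applied to each node individually (the last path step),
     not to the whole sequence at once; the result is one string per node. :)
  let $keys := $persNames/normalize-space(lower-case(.))
  for $key in distinct-values($keys)
  let $n := count($keys[. eq $key])
  order by $key
  return <li>[{$key}] ({$n} / {count($keys)})</li>
}</ol>
```

All three spellings of John Smith now collapse into a single "[john smith] (3 / 4)" entry. Calling lower-case($persNames) directly would instead raise a type error whenever $persNames holds more than one node.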

RE: XQuery: Counting instances of distinct-values - Added by Anonymous over 16 years ago

Legacy ID: #4651710 Legacy Poster: James Cummings (jcummings)

>>It is significantly more expensive... like from 3 seconds to 3 minutes.
>You're saying the XQuery solution under Saxon-B (without join optimization) is more expensive, yes?

Well, I mean using Saxon-B without the counting comparison versus using Saxon-B with the counting comparison. And now that I normalize-space(lower-case(.)) the value being compared, it is even slower. (But as I said, this is a one-off where all I needed was a result. I'll do the real thing in XSLT.)

>>The problem with this way of doing distinct-values() is that any whitespace, carriage returns, and
>>case count as distinct-values().
>Well, you can apply distinct-values to any function of the value, for example
>distinct-values($input/normalize-space(upper-case(.)))

That's exactly what I was doing wrong. I was doing distinct-values(normalize-space(lower-case($input))) ... doh

>>When I try to normalize-space() on them it (quite rightly) complains about this being a nodeset rather
>>than a single string.
>Sounds like you tried to apply normalize-space() to the whole set of values rather than applying it to
>each value individually, as in the above example.

Yup. Sorted now; works perfectly, if slowly.
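For reference: XQuery 3.0, a later version of the language than the one discussed in this thread, added a group by clause that plays roughly the role of xsl:for-each-group. A sketch with hypothetical inline data in place of collection('docs.xml'): each persName is read once and placed into a group by its normalized key, so there is no per-value rescan. $persNames and $total are declared in the prolog deliberately; a let placed earlier in the same FLWOR would be rebound by group by to a sequence holding one copy per group member.

```xquery
xquery version "3.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical sample data standing in for collection('docs.xml'). :)
declare variable $collection := document { <tei:TEI><tei:text><tei:p>
  <tei:persName>John Smith</tei:persName>
  <tei:persName>john  smith</tei:persName>
  <tei:persName>Mary Jones</tei:persName>
  <tei:persName>Anne Lee</tei:persName>
  <tei:persName>MARY JONES</tei:persName>
</tei:p></tei:text></tei:TEI> };

(: Prolog variables are outside the FLWOR tuple stream, so group by
   leaves $total as a single integer. :)
declare variable $persNames := $collection//tei:text//tei:persName;
declare variable $total := count($persNames);

<ol>{
  for $p in $persNames
  let $key := normalize-space(lower-case($p))
  (: After grouping, $p is bound to all persName nodes sharing $key. :)
  group by $key
  order by $key
  return <li>[{$key}] ({count($p)} / {$total})</li>
}</ol>
```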
