Feature #6507: TraceListener enhancements for code coverage - Saxon - Saxonica Developer Community

Actions

Send by e-mail Copy link

Feature #6507

open

TraceListener enhancements for code coverage

Added by Michael Kay 6 months ago. Updated 5 months ago.

Status:

New

Priority:

Normal

Assignee:

Category:

Diagnostics

Sprint/Milestone:

-

Start date:

2024-08-13

Due date:

% Done:

0%

Estimated time:

Legacy ID:

Applies to branch:

Fix Committed on Branch:

Fixed in Maintenance Release:

Platforms:

Description

Raised following zoom call with Amanda Galtman and Adrian Bird.

XSpec is currently analysing the XML output by the TraceListener and using this to produce a coverage report.

There are two major problems with this.

Firstly, the basic assumption of the TraceListener producing XML output is that we can make it well-formed. This assumption is flawed; it's difficult or impossible to handle try/catch, and it seems fundamentally impossible to handle multithreaded execution. Given these two major obstacles, fixing the relatively minor problems that have been reported (e.g. accumulator evaluation not being traced) seems a poor use of resources.

For coverage analysis, it would be much better

(a) to have a TraceListener that accumulates counts of how often each trace point is encountered (actually, we only need a binary yes/no for current purposes, but a count costs no more and might be useful for other applications.)

(b) to have knowledge of trace points that exist but were not encountered. Currently there's no easily accessible API that provides this information.

Files

Download all files

CoverageTraceListener.java (10 KB) CoverageTraceListener.java		Michael Kay, 2024-08-15 11:29
coverage.xml (16.9 KB) coverage.xml		Michael Kay, 2024-08-15 11:51
sample-copy-of-column.xsl (1.99 KB) sample-copy-of-column.xsl		A Galtman, 2024-08-19 23:24
sample-copy-of.xml (163 Bytes) sample-copy-of.xml		A Galtman, 2024-08-19 23:25

Actions

#1

Updated by Michael Kay 6 months ago

My proposal would be that

(a) TraceListener.open() is currently provided with a reference to the Controller object. We should add to the Controller a method getAllTracePoints() which returns a map whose keys are all the Traceable objects in the stylesheet, and whose corresponding values are initialized to zero.

(b) We provide a new implementation of the TraceListener interface in which

(i) the open() method gets this initial map from the Controller,

(ii) the enter() method increments the count for the corresponding Traceable

(iii) the close() method serialises the histogram as XML to the output destination

Processing this XML then becomes the responsibility of XSpec, as it is at the moment.

Actions

#2

Updated by Michael Kay 6 months ago

A further suggestion was made that we should provide better customisation of exactly where trace points are injected into the code. Currently this decision is made by the specific CodeInjector; the one normally used for XSLT is the XSLTTraceCodeInjector, which makes decisions based on the TraceLevel (none, low, normal, high). A custom CodeInjector can be written, and registered with the XSLT compiler at the level of the CompilerInfo object. Writing a custom CodeInjector gives very low-level detailed control, while setting the TraceLevel is pretty coarse. Might there be a middle ground that is easier to configure?

Actions

#3

Updated by Michael Kay 6 months ago

Some questions from Amanda, with my answers:

We talked about categories A, B, and C as follows. Does every node in a stylesheet fall into one of those categories?

A. The node is a trace point and it was encountered N times, for some positive integer N. B. The node is a trace point and it was encountered zero times. C. The node is not a trace point.

I think the answer is yes. With perhaps a little bit of a caveat as to what exactly a "trace point" is. A trace point is actually a node injected into the compiled expression tree, not into the source stylesheet. The correspondence between nodes in the compiled expression tree and nodes in the source stylesheet isn't exact.

Will there be a documented list of elements or other patterns that are not trace points? Norm mentioned the idea of elements whose use-when attribute evaluates to false, which is why I wrote "other patterns".

Documented may be a bit ambitious! Yes, Norm is right, the first stage of stylesheet compilation (filtering out use-when sections) causes chunks of the stylesheet to disappear, these chunks will contain no trace points. There may be other things that we drop before we do trace code injection, such as unreferenced global variables. And then, in a part of the stylesheet that IS visited by the trace code injector, it makes decisions about where to put the trace points. In principle that's configurable by tweaking the trace code injector.

Are the following elements special as far as being trace points or being counted goes?

In some cases, it's not that Saxon is treating the instruction differently, it's that it's general rules affect the treatment. For example, there's a general rule that an "instruction" isn't traced if it is the child of a "non-instruction". That's very likely to affect, for example, the handling of sort keys, because the instructions to compute sort keys are compiled into children of a sort expression which isn't an actual instruction in its own right.

Also, the decisions here are primarily being made by the XSLTTraceCodeInjector which is pluggable. You could use a different CodeInjector with different algorithms if you chose.

xsl:accumulator and descendants

The main difference here is when the code gets executed, and that shouldn't make any difference if we're only counting executions.

xsl:assert and descendants

xsl:assert is dropped from the tree very early on if assertions are disabled, which is the default.

xsl:attribute as a child of xsl:attribute-set, and deeper descendants

In principle that shouldn't make a difference. It's possible that the current code isn't injecting trace points for xsl:attribute set components, but that would be an oversight. We don't inline attribute set definitions, for reasons that I don't currently recall (I think there are edge cases that make it difficult).

xsl:context-item

That's purely a static declaration, it's not executable, so it isnt' traced.

xsl:evaluate

I think that dynamically executed code using xsl:evaluate isn't going to be traced. The instruction itself should be.

xsl:fallback

This is a bit like xsl:use-when - we make an early static decision whether to include or exclude the code in the executable.

xsl:map, xsl:map-entry

These compile down into XPath expressions, so they may be considered too fine-grained to be traced at the default tracing level for XSLT.

xsl:matching-substring, xsl:non-matching-substring

Again, it's possible that these are considered too fine-grained.

xsl:merge-action, xsl:merge-key, xsl:merge-source

The merge-action should be traced normally. The source and key are essentially declarations rather than executable instructions.

xsl:on-completion

That's going to be treated as an operand of xsl:iterate; may be too fine-grained

xsl:on-empty, xsl:on-non-empty

Like xsl:sequence, the XSLT instruction at source level is simply a syntactic wrapper for the XPath expression (assuming @select is used), so the instruction as such might disappear from the expression tree. There are some complications here discussed in bug #6428.

xsl:param as a child of xsl:iterate

Could come under the rule that the generated code isn't under an instruction node.

xsl:param as a child of xsl:template -- and does counting xsl:param mean the default value was used during execution?

Similar.

xsl:perform-sort with select attribute or with xsl:sort as the only child element

xsl:perform-sort gets compiled into a sort expression.

xsl:sort

This is primarily declarative. There may be instructions involved in computing a sort key, but they're likely to be buried inside the sort expression (which is a non-instruction).

xsl:try, xsl:catch

Can't immediately think of any special considerations here. But we might be getting it wrong, because we're generating the same code as for a try/catch expression in XQuery.

xsl:when, xsl:otherwise

All forms of conditional (xsl:if, xsl:choose, XPath if-then-else, XQuery typeswitch) generate a choose expression that's modelled on xsl:choose: it's basically a sequence of (condition, action) pairs.

xsl:where-populated

Like try/catch, I think we might have implemented that as an expression rather than an instruction.

xsl:with-param

Not sure.

XSLT elements having a select attribute and no sequence constructor (e.g., xsl:sequence, xsl:break, xsl:copy)

Basically the default tracing rules for XSLT are to trace at the "instruction" level and that means some of the fine-grained events will be lost.

xsl:variable is tricky, as we discussed a little when Adrian brought up optimizations. What does the eventual design say about tracing variables, and is it different for global vs. nonglobal variables?

Variables are particularly prone to being inlined (that is, the variable reference is replaced by the expression bound to the variable). That's equally true for local and global variables. We talked about optimization: I think that in general the right approach is that the optimization strategy shouldn't be deliberately changed just because you're tracing, though sometimes the existence of trace code will stop the optimizer recognising code patterns that it would otherwise optimize. But the user has the option of disabling optimizations selectively, for example opt:-v disables variable inlining.

Is anything different for stylesheet nodes that are encountered via call to transform()?

Yes. What happens here is that the dynamic stylesheet is compiled and executed with the same trace code injector and trace listener as the master stylesheet. The trace listener, on fn:transform(), sees a nested call to open(). The timing trace listener handles this by creating one output table for each distinct transformation. I think the coverage listener should probably produce a single amalgamated report.

Beyond XSLT elements: Is there anything we need to know about treatment of text nodes? How about literal result elements?

It all works at the level of the compiled expression tree, where many of these distinctions no longer exist. For example an xsl:choose at the XSLT level compiles into exactly the same expression tree as an if - then - else at the XPath level, and an AVT compiles into exactly the same code as an xsl:attribute instruction. A literal result element generates the same code as an xsl:element instruction with a statically-known name and namespace.

Would you expect the tracing behavior to vary among Saxon editions (HE, PE, EE)? In a way that's straightforward to document for end users, or not really?

Apart from the fact that EE optimizes more aggressively, I wouldn't expect any differences.

The issue you created on saxonica.plan.io uses the word "encountered". Is that the same thing as reached, or hit? I am just wondering if there is some subtle difference that I might be missing.

No, I think they're synonymous.

Actions

#4

Updated by Michael Kay 6 months ago

File CoverageTraceListener.java CoverageTraceListener.java added
File coverage.xml coverage.xml added

I've produced a first prototype for you to start playing with, to generate feedback. It's one Java file, class net.sf.saxon.trace.CoverageTraceListener, attached. You should be able to compile this to a class file, put it on your classpath, and run a command such as

java -cp .... net.sf.saxon.Transform -s:data/books.xml
-xsl:styles/books.xsl
--traceListenerClass:net.sf.saxon.trace.CoverageTraceListener
-opt:-j
-Tout:/Users/mike/Desktop/temp/coverage.xml

You can of course rename the class if you need to.

The -opt:-j is necessary at the moment because if you don't suppress just-in-time template rule compilation, the trace listener will be opened before all the trace expressions have been injected, so the discovery phase won't find them (and the prototype code outputs lots of messages saying that a trace point was found at execution time that hadn't been discovered during the static search).

There's some dead code here which I retained from the timing trace listener for putting the results through a stylesheet for formatting, I kept that in just in case we decide to do something beyond delivering the raw XML.

I used the -Tout option above to direct the XML output to a file, but it's not working for some reason: the XML is currently output to standard error. If that's a problem you could work around it by picking up the output destination from an environment variable.

I also attach a sample of the output so you know what to look for if it's working.

There's no attempt to handle fn:transform() yet.

As I say, this is very much a first cut so I can get your feedback.

Actions

#5

Updated by Michael Kay 6 months ago

You can also use the command line options

-s:data/books.xml -xsl:styles/books.xsl  -T:net.sf.saxon.trace.CoverageTraceListener -opt:-j -Tout:/Users/mike/Desktop/temp/coverage.xml

and in this case -Tout works as expected.

Actions

#6

Updated by Michael Kay 6 months ago

It occurs to me that we could take advantage of the fact that template rules are JIT-compiled by default: at the end of the run, we could look at all the template rules and report those that haven't been compiled, which means that they have never been matched by any node in the source document. That might be a more useful indicator than listing all the trace points within that template that weren't encountered at execution time.

To avoid the need to suppress JIT compilation, instead of scanning the stylesheet at the start of processing to find all the trace points, we could scan it at the end.

It may be worth pointing out here that a source xsl:template is compiled into multiple compiled templates - one for the named template if it has a name attribute, and one for each mode that it applies to, if it has a match attribute. This will result in multiple trace point reports for the same source code location. It would be helpful if the <T> output for a template rule indicated the relevant mode; this is harder to do for instructions within the template rule.

Actions

#7

Updated by Michael Kay 6 months ago

For the public release, it will be possible to run this using the command line option

-TC:coverage.xml

and JIT compilation will be suppressed automatically.

Actions

#8

Updated by A Galtman 6 months ago

Thanks, Michael, for your replies and the prototype.

I am trying to run your prototype and having a problem. I placed your .java file in a directory structure net\sf\saxon\trace\CoverageTraceListener.java, navigated to the parent directory in a Windows cmd window, and executed javac -cp C:\saxon12.5\saxon-he-12.5.jar net\sf\saxon\trace\CoverageTraceListener.java. The command generated a .class file alongside the .java file. However, when I try running the Saxon commands you mentioned, it can't find the listener.

java -cp .;C:\saxon12.5\saxon-he-12.5.jar net.sf.saxon.Transform -s:sample.xml -xsl:sample.xsl --traceListenerClass:net.sf.saxon.trace.CoverageTraceListener -opt:-j -Tout:C:\temp\coverage.xml
Transformation failed: Incorrect value for --traceListenerClass: net.sf.saxon.trace.CoverageTraceListener: Failed to load net.sf.saxon.trace.CoverageTraceListener

java -cp .;C:\saxon12.5\saxon-he-12.5.jar net.sf.saxon.Transform -s:sample.xml -xsl:sample.xsl -T:net.sf.saxon.trace.CoverageTraceListener -opt:-j -Tout:C:\temp\coverage.xml
Transformation failed: Failed to load net.sf.saxon.trace.CoverageTraceListener

I also tried putting Saxon 12.5 in my current directory, alongside that net directory (and then adjusting the -cp part of my commands), but that didn't help.

Is there an issue with my net\sf\saxon directory structure being separate from the one in saxon-he-12.5.jar? Or are you able to spot something I've done wrong? When I run a Saxon command without the trace listener, the XSLT transformation runs fine. In case it's relevant, my Java version information is

openjdk version "11.0.16" 2022-07-19 LTS
OpenJDK Runtime Environment Microsoft-38106 (build 11.0.16+8-LTS)
OpenJDK 64-Bit Server VM Microsoft-38106 (build 11.0.16+8-LTS, mixed mode)

Actions

#9

Updated by Adrian Bird 6 months ago

I had the same problem as Amanda and assumed it was related to the .jar file.

So I changed the package name in the file, put the java file in the correct folder structure for my package name and added a couple of imports that were needed from the net.sf.saxon.trace package:

import net.sf.saxon.trace.Traceable;
import net.sf.saxon.trace.XSLTTraceListener;

That worked ok for me.

Actions

#10

Updated by A Galtman 6 months ago

Thanks, Adrian. I changed the innermost package name only, to make the fully qualified name net.sf.saxon.coverage.CoverageTraceListener. Then I did the other things you mentioned, and now I am able to generate a trace! (Changing the entire package hierarchy didn't work.)

Next steps for me: generated traces for more stylesheets, study the results, and write back with questions/comments.

Actions

#11

Updated by A Galtman 6 months ago

I modified the XSpec code coverage reporting locally so that it calls your prototype listener and transforms the resulting data into the format that XSpec is looking for, so that the rest of the code coverage reporting works. (I changed the XSpec data format slightly, by adding elements with count=”0” where the prototype listener reports count=”0” and making other modifications to account for meaning the absence of a hit, not a hit.) I generated XSpec coverage reports for most of the XSpec repo's coverage reporting tests using (a) this modified code base I just described; (b) a further-modified one where I undid some of the logic in XSpec that augments what the raw Saxon data is saying; and (c) the original XSpec code base with the existing listener.

I am in the process of comparing the three sets of reports, and I will write back here.

Overall initial impression: I am pleased that it didn’t require too much work to connect the code and obtain coverage reports with your prototype. If the new kind of listener is a better foundation for Saxon’s side of things, it seems like that will work out fine for XSpec. There are some already-reported trace issues that we’ll still like to see fixed; for example, I think https://saxonica.plan.io/issues/6457 is still an issue but I will double-check.

Actions

#12

Updated by Michael Kay 6 months ago

Great.

for example, I think https://saxonica.plan.io/issues/6457 is still an issue but I will double-check.

Yes, any issue to do with things not being reported will still be present. But hopefully we can now focus on those issues, rather than well-formedness problems,

Actions

#13

Updated by A Galtman 6 months ago

File sample-copy-of-column.xsl sample-copy-of-column.xsl added
File sample-copy-of.xml sample-copy-of.xml added

When I compared the XSpec code coverage reports, between the existing trace listener and your new prototype (using Saxon-HE 12.5 in both cases), one thing that puzzled me in several test files was that the line/column coordinates listed in the trace with the new prototype didn't match the XSLT element. Sometimes, the trace said column="0" and sometimes the column seemed to represent the end of an XPath expression stored in an attribute of the element, not the end of the element itself. I've seen this phenomenon before (e.g., it showed up as part of https://saxonica.plan.io/issues/6448 and https://saxonica.plan.io/issues/6457) and never understood it.

Here's an example, reproducible outside XSpec (also attached as XSLT file and XML file).

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!--
      Adapted from xsl:perform-sort Coverage Test Case
  -->
  <xsl:template match="xsl-perform-sort">
    <xsl:for-each select="(1, 2, 3)">
      <xsl:copy-of select="current()"/><!-- Reports line 9 column 0 -->
    </xsl:for-each>
    <xsl:variable name="sortedSet2" as="xs:string*">
      <xsl:perform-sort>
        <xsl:sort select="." />
        <xsl:copy-of
          select="node"
        /><!-- Reports line 15 column 4 -->
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:value-of select="$sortedSet2" separator=" " />
  </xsl:template>
  <!--
    Command:    
    java -cp .;saxon-he-12.5.jar net.sf.saxon.Transform -s:sample-copy-of.xml -xsl:sample-copy-of-column.xsl -T:net.sf.saxon.coverage.CoverageTraceListener -opt:-j 1>result.txt 2>sample-copy-of-trace.xml

    SOURCE XML FILE (sample-copy-of.xml)
    <xsl-perform-sort>
      <node>400</node>
      <node>200</node>
      <node>100</node>
      <node>300</node>
    </xsl-perform-sort>

    TRACE
    <?xml version="1.0" encoding="UTF-8"?>
    <coverage>
       <T type="dot"
          module="..."
          line="9"
          column="0"
          count="3"/>
       <T type="sequence"
          module="..."
          line="8"
          column="38"
          count="1"/>
       <T type="xsl:template"
          module="..."
          line="7"
          column="42"
          count="1"
          match="xsl-perform-sort"/>
       <T type="convert"
          module="..."
          line="15"
          column="4"
          count="1"/>
       <T type="forEach"
          module="..."
          line="8"
          column="38"
          count="1"/>
       <T type="xsl:value-of"
          module="..."
          line="18"
          column="56"
          count="1"/>
    </coverage>

    -->
</xsl:stylesheet>

Is it helpful if I share repro steps for other instances of the problem, or would that be a redundant waste of time?

Actions

#14

Updated by A Galtman 6 months ago

One thing I forgot to mention in my comment 13 above: unlike the prototype histogram listener, the built-in trace listener produces the line/column coordinates I expect. For instance, when I execute

java -cp .;saxon-he-12.5.jar net.sf.saxon.Transform -s:sample-copy-of.xml -xsl:sample-copy-of-column.xsl -T -opt:0 -Tlevel:high

the trace elements for the two <xsl:copy-of> elements refer to line="9" column="40" and line="16" column="11" instead of line="9" column="0" and line="15" column="4", respectively.

Actions

#15

Updated by Adrian Bird 5 months ago

I noticed that the new (in Saxon 12.5) GetLocation() method in the XSLTTraceListener isn't being used. Changing the line in the close() method from

Location loc = traceable.getLocation();

to

Location loc = XSLTTraceListener.getLocation(traceable);

gives the better column numbers in the 2 cases: Line 9 column 40 and Line 16 column 11.

However, I am confused about the line numbers 15 and 18 in the trace above, because I get 16 and 19 using the original version.

Please register to edit this issue

Actions

Send by e-mail Copy link

Also available in: Atom PDF