Project

Profile

Help

Bug #4342

Sporadic spikes in XSL processing time due to internal looping

Added by Stefan Fritz 12 months ago. Updated 10 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Diagnostics
Sprint/Milestone:
-
Start date:
2019-10-16
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
9.9, trunk
Fix Committed on Branch:
9.9, trunk
Fixed in Maintenance Release:

Description

Stylesheet sporadically takes many minutes vs some seconds to run. Inside the debugger this can be enforced (see steps below).

Given the XSL output warnings like "Too many nested template or function calls. The stylesheet may be looping." I assume that Saxon internally runs into a StackOverflowError. Which was also explained here: https://saxonica.plan.io/boards/3/topics/7287 The problem is that in above discussion the user got an exception and could react. Here the behavior is inconsistent and is causing a business impact if it happens.

The workaround/solution is to increase stack size e.g. -Xss4m

The reason I log this is that I think Saxon should behave consistently and not start looping internally. Such issues are hard to catch in a production environment.

Steps to reproduce (e.g run inside Eclipse):

class: com.aurea.support.SaxonXSLCmdLine

arguments: perftests\ebixml-CUBE-DC_Dev_200.xml perftests\EBI\XSLs\Relationship_EBI.xsl out.xml param1=123|param2=444

JVM arguments: -Xss1m -Dxsl.useTracing=false

Running the XSL takes e.g. 4000ms

Now run again but set a breakpoint:

net.sf.saxon.expr.insturct.Instruction /* / public static ParameterSet assembleParams(XPathContext context, WithParam[] actualParams) / / throws XPathException / / { / 207 / if ((actualParams == null) || (actualParams.length == 0)) { / 208 / return null; / / } / 210 / ParameterSet params = new ParameterSet(actualParams.length); / 211 / for (WithParam actualParam : actualParams) { / 212 / params.put(actualParam.getVariableQName(), actualParam / 213 / .getSelectValue(context), actualParam / 214 / .isTypeChecked()); / / } / 216 / return params; //<<--- BREAKPOINT / */ }

  • advance ca. 10 times in Debugger to hit it that often
  • then disable the breakpoint and let it continue.
  • It seems to loop.

Sometimes it gets in the loop also without the debugger. Processing then took 3116604 ms instead of 4000 ms.

Without optimizer it takes ca. 12 sec. but disabling this is not an option as many other XSLs would be impacted.

Attached is a video that shows the issue. The reproducible can be found in the ZIP of https://saxonica.plan.io/issues/4340 (please do not attached the customer data here)

zoom_0.mp4 (8.46 MB) zoom_0.mp4 Stefan Fritz, 2019-10-16 12:25

History

#1 Updated by Michael Kay 11 months ago

My first instinct would be to put this down to unpredictability of garbage collection activity. Certainly I would want to see some indication that GC activity (and memory pressure generally) is being monitored. But I'll get something running myself before coming to a conclusion.

Running with debugging enabled in my experience often causes performance to become unpredictable. I've always assumed that's because it increases the memory requirement, though I have no positive evidence for this.

I use IntelliJ rather than Eclipse, but I should think the effects are very similar.

#2 Updated by Stefan Fritz 11 months ago

We could reproduce this independent of the debugger. Further if it is GC and I only pause the processing thread, then I would assume the GC works on the background threads and once I continue the thread it should finish in 4s. Which it does not.

#3 Updated by Michael Kay 11 months ago

I have not been able to reproduce this; it runs fine for me with or without debugging enabled.

However, I have been running under IntelliJ rather than Eclipse. Our normal support policy is that we don't create custom software environments to investigate problems.

It feels very much like a problem with the environment in which you are running Saxon, rather than a problem with Saxon itself. I would certainly suspect that it is memory related, both (a) because running under a debugger tends to increase the memory requirement, and (b) because you report that allocating more memory solves the problem.

#4 Updated by Michael Kay 11 months ago

You mention the warning message "Too many nested template or function calls. The stylesheet may be looping." I haven't seen that message either in my own runs, or in your video, so I'm not sure of its relevance.

What I do see in the video, and it seems to be only in the failing runs, is a message suggesting that an error occurred while matching a pattern. Now of course, with a StackOverflowError it's going to be a bit random when you hit the error, because memory allocation is going to be a bit unpredictable. It seems possible that you're hitting a StackOverflow error while doing pattern matching. The XSLT spec requires that errors that occur during pattern matching are ignored (the pattern is treated as not matching), so it's possible that what's happing here is that Saxon tries to recover from the error, but because an error during pattern matching is treated as a non-match, this could take your stylesheet code into a path where it fires different template rules, which could cause execution to follow a completely different path from then on.

The intent of the rule about ignoring errors in pattern matching is to handle cases where you write a pattern like `match="*[@id eq 3]" and it fails because the id attribute of that particular element isn't an integer. Treating that as a non-match makes sense. Ignoring a problem that arose through not having enough memory doesn't make much sense, so we should probably handle that one differently.

#5 Updated by Michael Kay 11 months ago

  • Category set to Diagnostics
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
  • Applies to branch 9.9, trunk added
  • Fix Committed on Branch 9.9, trunk added

I have committed a patch which hopefully fixes this issue by making the exception that results from StackOverflow non-recoverable when it occurs during pattern matching.

#6 Updated by Michael Kay 11 months ago

  • Status changed from In Progress to Resolved

Marking as resolved; without a repro we can't be certain that my hypothesis for the cause was correct, but it seems a very plausible guess. In particular, it's probably a little random whether the stack overflow is going to occur during pattern matching or at some other time, and the likely effect in the two cases is very different because of the rule that errors in pattern matching are suppressed.

#7 Updated by O'Neil Delpratt 10 months ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.9.1.6 added

Patch committed to the Saxon 9.9.1.6 maintenance release.

Please register to edit this issue

Also available in: Atom PDF