Bug #1859

closed

SXLM0001: Too many nested apply-templates calls. The stylesheet may be looping.

Added by d d over 10 years ago. Updated about 10 years ago.

Status: Closed
Priority: Normal
Category: Performance
Sprint/Milestone: -
Start date: 2013-08-06
% Done: 0%

Description

Good afternoon,

Michael Kay suggested I post this issue here for further work.

I have some delimited text that needs to be converted to XML. Fields are separated by a double pipe, and a colon separates the node name from its data.

I thought I could do it with XSLT, and it worked for a while, but now management wants people's background and experience included, so there's a lot more data to process.

And that's what brings us here to the looping error.

Attached is the template and sample data to convert. Host names and email addresses have been redacted. If I missed one please let me know or delete it if you can.

The data looks like this:

||phone:9999||email:xxxx|| ... etc

And the final result should be:

<phone>9999</phone>

<email>xxxx</email>

...


Files

joedata.xml (137 KB), d d, 2013-08-06 20:23
dbl-pipe.xsl (935 Bytes), d d, 2013-08-06 20:23

Related issues

Is duplicate of Saxon - Bug #1991: recursion error, or stack overflow, on match() or analyze-string (Closed, Michael Kay, 2014-01-28)

Actions #1

Updated by Michael Kay over 10 years ago

  • Category set to Performance
  • Assignee set to Michael Kay

Thanks for the sample. It runs to completion for me (in about 500ms) producing output that starts:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
         <str name="wt">xml</str>
         <str name="q">joe</str>
      </lst>
  </lst>
  <result name="response" numFound="3" start="0">
      <doc>
         <arr name="pubdateiso">
            <date>2013-08-02T00:00:00Z</date>
         </arr>
         <str name="source">
      https://llsearchdev:9443/C3ProfileFeedDEV/</str>
         <str name="category">profile</str>
         <str name="id">

But I'll investigate a bit further to see if anything obviously untoward is going on.

Actions #2

Updated by d d over 10 years ago

  • Assignee deleted (Michael Kay)

It doesn't work for me, either at the command line or from a Java app.

I'm using saxon9he v5.0.2J

Command line: java -jar saxon9he.jar -s:joedata.xml -xsl:dbl-pipe.xsl -o:newoutput1.xml

How did you get it to work?

Actions #3

Updated by d d over 10 years ago

  • Assignee set to Michael Kay

Fixing the assignee

Actions #4

Updated by Michael Kay over 10 years ago

It works for me both from the IDE (IntelliJ) and from the command line, both with 9.5.0.2 and 9.5.1.1, using exactly the command line you showed. What JVM are you using?

Actions #5

Updated by Michael Kay over 10 years ago

It fails if I change the XSLT code from match="str/@type" to match="str". The template rule as written wasn't matching anything in your input.
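
The difference between the two match patterns can be sketched with a minimal stylesheet. This is an illustration only: dbl-pipe.xsl is not reproduced in this thread, so the identity template and both template bodies below are hypothetical placeholders, and the sketch assumes that the str elements in the input carry a name attribute but no type attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies anything not matched by a more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Matches only an attribute named "type" whose parent is a str element.
       If no str element has such an attribute, this rule never fires,
       so whatever it contains (including any regex) is never evaluated. -->
  <xsl:template match="str/@type">
    <xsl:attribute name="type">hypothetical-placeholder</xsl:attribute>
  </xsl:template>

  <!-- Matches the str elements themselves, so the body runs once per str
       and any regex processing placed here sees the element's text. -->
  <xsl:template match="str">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:comment>str rule fired</xsl:comment>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With only the first rule present, the stylesheet behaves like a plain copy on such input, which would be consistent with the run completing without error.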

Actions #6

Updated by Michael Kay over 10 years ago

I can confirm the excessively-deep recursion is within the regex engine. As a first step, I'll try and improve the diagnostics.

Actions #7

Updated by d d over 10 years ago

I think my JVM is 1.7 b3

Actions #8

Updated by Michael Kay over 10 years ago

The basic problem here is that with the regular expression

\|((\|[^|]+\|)+)\|

on encountering a vertical bar immediately after another vertical bar, there are two paths that the regex can take (it can loop round the outer loop, or it can take the exit path). It therefore has to be prepared to backtrack, and in order to backtrack, it maintains its state on the stack. So each character that is consumed ends up using another stack frame.

I may not have your logic exactly right, but I believe you can achieve the required effect something like this:


<xsl:template match="str">
  <xsl:copy>
    <xsl:variable name="tokens" select="tokenize(., '\|\|')"/>
    <xsl:value-of select="$tokens[1]"/>
    <xsl:for-each select="subsequence($tokens, 2)">
      <xsl:analyze-string select="." regex="(\w+):(.+)">
        <xsl:matching-substring>
          <xsl:element name="{regex-group(1)}">
            <xsl:value-of select="regex-group(2)"/>
          </xsl:element>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>

There are of course optimizations possible in the regex engine that could improve the situation here (the main one being to move to a non-backtracking algorithm); but most regex engines will have problems with this case.

Actions #9

Updated by d d over 10 years ago

That mostly works. Unfortunately, it doesn't handle the background info correctly.

It only takes the first line of the background.

<p dir="ltr"><span

... rest of the background data
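
One possible explanation, assuming the background value contains line breaks: in XPath regular expressions, `.` does not match a newline unless the `s` flag is set, so `(.+)` stops at the end of the first line of the value. The following is a sketch of the same template with `flags="s"` added to `xsl:analyze-string`. It has not been run against joedata.xml, and the surrounding stylesheet (including the identity template) is an assumption, since dbl-pipe.xsl is not shown in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (assumed): copies everything outside str unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="str">
    <xsl:copy>
      <!-- Split the text on the double-pipe delimiter. -->
      <xsl:variable name="tokens" select="tokenize(., '\|\|')"/>
      <!-- Any text before the first delimiter is kept as-is. -->
      <xsl:value-of select="$tokens[1]"/>
      <xsl:for-each select="subsequence($tokens, 2)">
        <!-- flags="s" puts the regex in dot-all mode: "." also matches
             newlines, so (.+) runs to the end of the token instead of
             stopping at the first line break. -->
        <xsl:analyze-string select="." regex="(\w+):(.+)" flags="s">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="regex-group(2)"/>
            </xsl:element>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because each token is already a single name:value pair, the greedy `(.+)` in dot-all mode should take the whole remainder of the token as the value. Markup inside the value, such as `<p dir="ltr">`, would still be written out as escaped text rather than as elements.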

Actions #10

Updated by Michael Kay about 10 years ago

  • Status changed from New to Closed
  • Priority changed from Low to Normal

I'm closing this as a duplicate of bug 1991, which discusses the general problem of excessive stack use for backtracking in the 9.5 regex engine.
