Bug #1859
closed
SXLM0001: Too many nested apply-templates calls. The stylesheet may be looping.
Fixed in Maintenance Release:
Description
Good afternoon,
Michael Kay suggested I post this issue here for further work.
I have some delimited text (a double pipe separates the fields, and a colon separates each node name from its data) that needs to be converted to XML.
I thought I could do it with XSLT, and it worked for a while, but now management wants people's background and experience included, so there's a lot more data to process.
And that's what brings us here to the looping error.
Attached is the template and sample data to convert. Host names and email addresses have been redacted. If I missed one please let me know or delete it if you can.
The data looks like this:
||phone:9999||email:xxxx|| ... etc
And the final result should be
<phone>9999</phone>
<email>xxxx</email>
...
- Category set to Performance
- Assignee set to Michael Kay
Thanks for the sample. It runs to completion for me (in about 500ms) producing output that starts:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="wt">xml</str>
<str name="q">joe</str>
</lst>
</lst>
<result name="response" numFound="3" start="0">
<doc>
<arr name="pubdateiso">
<date>2013-08-02T00:00:00Z</date>
</arr>
<str name="source">
https://llsearchdev:9443/C3ProfileFeedDEV/</str>
<str name="category">profile</str>
<str name="id">
But I'll investigate a bit further to see if anything obviously untoward is going on.
- Assignee deleted (Michael Kay)
It doesn't work for me, either at the command line or from a Java app.
I'm using saxon9he v5.0.2J
Command line: java -jar saxon9he.jar -s:joedata.xml -xsl:dbl-pipe.xsl -o:newoutput1.xml
How did you get it to work?
- Assignee set to Michael Kay
It works for me both from the IDE (IntelliJ) and from the command line, both with 9.5.0.2 and 9.5.1.1, using exactly the command line you showed. What JVM are you using?
It fails if I change the XSLT code from match="str/@type" to match="str". The template rule as written wasn't matching anything in your input.
I can confirm the excessively-deep recursion is within the regex engine. As a first step, I'll try and improve the diagnostics.
The basic problem here is that with the regular expression
\|((\|[^|]+\|)+)\|
on encountering a vertical bar immediately after another vertical bar, there are two paths that the regex can take (it can loop round the outer loop, or it can take the exit path). It therefore has to be prepared to backtrack, and in order to backtrack, it maintains its state on the stack. So each character that is consumed ends up using another stack frame.
I may not have your logic exactly right, but I believe you can achieve the required effect with something like this:
<xsl:template match="str">
  <xsl:copy>
    <!-- Split the text on the double-pipe delimiter. -->
    <xsl:variable name="tokens" select="tokenize(., '\|\|')"/>
    <!-- Anything before the first delimiter is output unchanged. -->
    <xsl:value-of select="$tokens[1]"/>
    <xsl:for-each select="subsequence($tokens,2)">
      <!-- Each remaining token should have the form name:value. -->
      <xsl:analyze-string select="." regex="(\w+):(.+)">
        <xsl:matching-substring>
          <xsl:element name="{regex-group(1)}">
            <xsl:value-of select="regex-group(2)"/>
          </xsl:element>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>
There are of course optimizations possible in the regex engine that could improve the situation here (the main one being to move to a non-backtracking algorithm); but most regex engines will have problems with this case.
That mostly works. Unfortunately, it doesn't handle the background info correctly.
It only takes the first line of the background.
@<p dir="ltr"><span
... rest of the background data
@
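A possible explanation for the truncated background, not confirmed anywhere in this thread: in XPath regular expressions, "." does not match a newline unless the "s" (dot-all) flag is set, so (.+) stops at the end of the first line. The sketch below is the suggested template with flags="s" added to xsl:analyze-string. The stylesheet wrapper and the identity rule are assumptions added only to make the example self-contained; the original dbl-pipe.xsl is not shown here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed identity rule: copies everything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="str">
    <xsl:copy>
      <!-- Split the text on the double-pipe delimiter. -->
      <xsl:variable name="tokens" select="tokenize(., '\|\|')"/>
      <xsl:value-of select="$tokens[1]"/>
      <xsl:for-each select="subsequence($tokens, 2)">
        <!-- flags="s" lets "." match newlines, so group 2 can span
             several lines of a multi-line value. -->
        <xsl:analyze-string select="." regex="(\w+):(.+)" flags="s">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="regex-group(2)"/>
            </xsl:element>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This assumes the background text never contains a double pipe itself, since tokenize() would split the value at that point.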
- Status changed from New to Closed
- Priority changed from Low to Normal
I'm closing this as a duplicate of bug 1991, which discusses the general problem of excessive stack use for backtracking in the 9.5 regex engine.