Problem with TINY_TREE_CONDENSED

Added by Anonymous over 14 years ago

Legacy ID: #8007610 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

Hello Mr. Kay!

Whenever I try to use the TINY_TREE_CONDENSED tree model, I get exceptions like this:

[code]
java.lang.ArrayIndexOutOfBoundsException: -1
    at net.sf.saxon.tinytree.SiblingEnumeration.next(SiblingEnumeration.java:122)
    at net.sf.saxon.instruct.ApplyTemplates.applyTemplates(ApplyTemplates.java:322)
    at net.sf.saxon.instruct.ApplyTemplates.apply(ApplyTemplates.java:210)
    at net.sf.saxon.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:174)
    at net.sf.saxon.instruct.Block.processLeavingTail(Block.java:619)
    at net.sf.saxon.instruct.Instruction.process(Instruction.java:93)
    at net.sf.saxon.expr.LetExpression.process(LetExpression.java:453)
    at net.sf.saxon.instruct.ForEach.processLeavingTail(ForEach.java:331)
    at net.sf.saxon.instruct.Instruction.process(Instruction.java:93)
    at net.sf.saxon.instruct.UserFunction.process(UserFunction.java:345)
    ...
[/code]

When I fall back to the default tree model, everything works properly. I'll try to prepare a simple test case.

By the way, this technique would probably work best if it were applied only to rather short strings (with a threshold value?). That way you would cache "enum"-like values and pass plain text through as is (it's usually unique anyway).


Replies (8)

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8007622 Legacy Poster: Michael Kay (mhkay)

Thanks for reporting it, but I won't be able to do much with it without a repro. The suggestion to have a threshold length, and only to share storage for strings below this length, looks like a very sensible one.
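A minimal Java sketch of the threshold idea follows. The class name `ThresholdTextPool`, the method `share`, and the threshold values are hypothetical and chosen only for illustration; this is not Saxon's actual TinyTree code. Short values are looked up in a pool and shared, while longer values bypass the pool entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Saxon. */
final class ThresholdTextPool {
    private final int maxSharedLength;
    private final Map<String, String> pool = new HashMap<>();

    ThresholdTextPool(int maxSharedLength) {
        this.maxSharedLength = maxSharedLength;
    }

    /** Returns a shared instance for short values, and the value itself otherwise. */
    String share(CharSequence text) {
        String value = text.toString();
        if (value.length() > maxSharedLength) {
            return value; // long text is usually unique, so store it as is
        }
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    /** Number of distinct short values currently held in the pool. */
    int pooledCount() {
        return pool.size();
    }

    public static void main(String[] args) {
        ThresholdTextPool pool = new ThresholdTextPool(16);
        String a = pool.share(new StringBuilder("yes"));
        String b = pool.share(new StringBuilder("yes"));
        System.out.println("short values shared: " + (a == b));

        String longText = "a long paragraph of running text that is unlikely to repeat";
        String c = pool.share(new StringBuilder(longText));
        String d = pool.share(new StringBuilder(longText));
        System.out.println("long values shared: " + (c == d));
        System.out.println("pooled entries: " + pool.pooledCount());
    }
}
```

Converting to `String` before the lookup sidesteps any question of how different `CharSequence` implementations define `equals()` and `hashCode()`.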

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8008149 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

[code]
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="this"
    exclude-result-prefixes="xs t">

  <xsl:template match="/">
    <xsl:variable name="unit" as="element()*">
      <xsl:variable name="d" as="element()"> t t </xsl:variable>
      <xsl:sequence select="t:f($d)"/>
    </xsl:variable>
    <xsl:apply-templates select="$unit/c/c"/>
  </xsl:template>

  <xsl:function name="t:f" as="element()">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="$e"/>
  </xsl:function>

</xsl:stylesheet>
[/code]

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8008383 Legacy Poster: Michael Kay (mhkay)

I can't reproduce this problem using this stylesheet. Could you tell me how you are running it? (Saxon version, command line / API, etc)

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8008523 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

1. I'm using Java 6, running against a build of https://saxon.svn.sourceforge.net/svnroot/saxon/latest9.2, revision 476.

2. The following program throws ArrayIndexOutOfBoundsException for me:

[code]
import java.io.File;

import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import net.sf.saxon.Configuration;
import net.sf.saxon.TransformerFactoryImpl;

public class XsltTransformer {
    public static void main(String[] args) throws Exception {
        TransformerFactoryImpl transformerFactory = new TransformerFactoryImpl();
        Configuration configuration = transformerFactory.getConfiguration();

        // Select the condensed tiny tree model; the default model does not fail.
        configuration.setTreeModel(
            net.sf.saxon.om.TreeModel.TINY_TREE_CONDENSED.getSymbolicValue());

        Source source = new StreamSource(new File("C:/temp/test.xslt"));
        Templates templates = transformerFactory.newTemplates(source);
        Transformer transformer = templates.newTransformer();

        transformer.transform(
            new StreamSource(new File("C:/temp/in.xml")),
            new StreamResult(new File("C:/temp/out.xml")));
    }
}
[/code]

3. test.xslt is exactly the one quoted in this thread.

4. in.xml is:

[code]
[/code]

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8008669 Legacy Poster: Michael Kay (mhkay)

Thanks. Now reproduced. Strong suspicions must fall on the patch for this bug: https://sourceforge.net/tracker/?func=detail&aid=2925771&group_id=29872&atid=397617 but I haven't yet identified what's wrong with it.

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8008769 Legacy Poster: Michael Kay (mhkay)

Oh dear, a complicated story, and I still haven't quite got to the bottom of it! The patch for 2925771 isn't wrong, but it seems to have opened the door to another bug - or more strictly two other bugs - which were always there, but which seemed to cancel each other out in all previous test cases.

The first bug is benign: when the code for the condensed tiny tree compares a new text node to see whether it equals any previous text node, it uses the equals() method, which notoriously is not guaranteed to return true when comparing a String to a CharSequence that is not a String. The effect is that the text value isn't shared when it should be, but this has no adverse consequences. This bug is influenced by the patch for 2925771, which causes a different kind of CharSequence to be used.

The second bug is nastier: when a text node is found to be equal to a previous text node, the "next" pointer for that node in the TinyTree is left set to -1 instead of pointing to the parent. This will cause any navigation of the children of the parent of the text node to crash. (However, getting the string value of this parent will not hit this condition.) This bug has been in the code all along, and it's hard to see why it wasn't caught before, other than that the first bug prevented the reuse of text nodes from always working.

I'll produce patches for both, probably lumping them together in a single bug entry.
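Both bugs can be illustrated with a small, self-contained Java sketch. It is a toy model, not Saxon's code: the class `CondensedTreeBugsDemo`, its methods, and the four-node array layout are assumptions made for illustration. The first part shows that `String.equals()` returns false for a `StringBuilder` with the same characters, while a character-by-character comparison succeeds. The second part models sibling links as an `int[]` in which the last child points back to its parent; leaving that entry at -1 makes the walk index the array with -1.

```java
final class CondensedTreeBugsDemo {

    /** Content comparison that works for any pair of CharSequence values. */
    static boolean sameText(CharSequence a, CharSequence b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Toy sibling walk: next[i] is the next sibling of node i, or the
     * parent's index when node i is the last child (assumed layout).
     */
    static int countChildren(int[] next, int firstChild, int parent) {
        int count = 0;
        int node = firstChild;
        while (node != parent) {
            count++;
            node = next[node]; // fails if a previous link was left at -1
        }
        return count;
    }

    public static void main(String[] args) {
        // Bug 1: equals() across different CharSequence implementations.
        String stored = "abc";
        CharSequence incoming = new StringBuilder("abc");
        System.out.println(stored.equals(incoming));        // false
        System.out.println(stored.contentEquals(incoming)); // true
        System.out.println(sameText(stored, incoming));     // true

        // Bug 2: node 0 is the parent, nodes 1..3 are its children.
        int[] correct = {-1, 2, 3, 0};  // last child links back to parent
        int[] broken  = {-1, 2, 3, -1}; // last child's link never set
        System.out.println(countChildren(correct, 1, 0));   // 3
        try {
            countChildren(broken, 1, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("crash while walking children");
        }
    }
}
```

The toy model also suggests why the two bugs masked each other: if the comparison never reports a match, the code path that leaves the link unset is never taken.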

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8008850 Legacy Poster: Michael Kay (mhkay)

Please see https://sourceforge.net/tracker/?func=detail&aid=2934589&group_id=29872&atid=397617

RE: Problem with TINY_TREE_CONDENSED - Added by Anonymous over 14 years ago

Legacy ID: #8009317 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)

It's good to know that our code base helps to make other code better! Could you consider caching attribute values, but without using the intern() function? The reason is our runtime load: we run transformations in a loop, and our documents contain many ID attributes, which often happen to be unique across a set wider than a single document. As a result, over time, memory is wasted on interned strings that are never used again.
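One way to read this request is a pool whose lifetime is tied to a single document rather than to the JVM. The Java sketch below assumes that reading; the class `DocumentValuePool` and its method `canonical` are hypothetical names, not Saxon API. Each document gets its own pool, so values are shared within a document and the whole pool becomes unreachable once the document is discarded.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-document pool; not Saxon's implementation. */
final class DocumentValuePool {
    private final Map<String, String> values = new HashMap<>();

    /** Returns the first instance seen in this document with the same content. */
    String canonical(String value) {
        String existing = values.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    public static void main(String[] args) {
        // One pool per document being built.
        DocumentValuePool doc1 = new DocumentValuePool();
        String a = doc1.canonical(new String("id-1"));
        String b = doc1.canonical(new String("id-1"));
        System.out.println(a == b); // true: shared within the document

        // A second document starts with an empty pool of its own.
        DocumentValuePool doc2 = new DocumentValuePool();
        String c = doc2.canonical(new String("id-1"));
        System.out.println(a == c); // false: nothing is shared JVM-wide
    }
}
```

The trade-off is that identical values in different documents are stored more than once, in exchange for nothing outliving the document that introduced it.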
