Suggest of optimization
Added by Anonymous over 16 years ago
Legacy ID: #5193024 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)
Hello Mr Kay, I'd like to suggest an optimization for a specific case. I have no statistics on how often this is used. My humble experience shows that I used it several times. The optimization aims to "compensate" a lack of maps in data model. A formal case There are two stages: 1. Untrivial logic collecting nodes/values satisfying some criteria. 2. Process data, and take a special action whenever a node/value is collected on the previous stage. A specific example I have defined jxom (xml schema describing java syntax). Among other api I have defined functions to remove unreachable code. This is done in two stages, as it cannot be easily implemented in one pass. 1. Collect unreachable statements. 2. If there are such then generate tree without these statements. Just in case java statement reachability is defined at http://java.sun.com/docs/books/jls/second_edition/html/statements.doc.html#236365 To make such logic more efficient an optimization of exists($nodes intersect $node) and (or) exists(index-of($values, $value)) could be done (through an index lookup). I want to believe my description is clear enough. If it's not then never mind... Thanks
Replies (9)
Please register to reply
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5193056 Legacy Poster: Michael Kay (mhkay)
Sorry, I wonder if you could give a more specific example of some XSLT or XQuery code that could be optimized better than is currently the case?
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5193225 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)
As the following code is oversimplified it looks stupid and it seems that it's inefficient. But please remember that original code is more complex. xml: <unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.test"> <class access="public" name="MyClass"> <class-method access="protected" name="run"> <block> <switch> <test> <var name="x"> <meta> <type name="int"/> </meta> </var> </test> <case> <value> <int>0</int> </value> <block> <expression> <assign> <var name="y"> <meta> <type name="int"/> </meta> </var> <int>1</int> </assign> </expression> <return/> </block> </case> <case> <block> <return/> <scope> <comment> <para>NOTE: this is unreachable code.</para> <scope> <meta> <bookmark> <warning message="Unreachable code."/> </bookmark> </meta> <expression remove-me="true"> <inc> <var name="y"> <meta> <type name="int"/> </meta> </var> </inc> </expression> </scope> </comment> </scope> </block> </case> </switch> </block> </class-method> </class> </unit> xslt: <!-- This stylesheet optimizes (removes) dead code. $Id: java-optimize-unreacable-code.xslt 3996 2008-07-20 16:18:29Z vladimirn $ --> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:t="http://www.bphx.com/jxom" xmlns:p="http://www.bphx.com/jxom/private/optimizer" xmlns:j="http://www.bphx.com/java-1.5/2008-02-07" xmlns="http://www.bphx.com/java-1.5/2008-02-07" xpath-default-namespace="http://www.bphx.com/java-1.5/2008-02-07" exclude-result-prefixes="xs t j p"> <xsl:output indent="yes"/> <!-- The reachability rules, repeated here, are defined at: http://java.sun.com/docs/books/jls/second_edition/html/statements.doc.html#236365 All smartness is removed for brievity. In short: statement should be reachable; next statement is reachable if previous statement is reachable and completes normally. --> <!-- Entry point. --> <xsl:template match="/unit"> <xsl:variable name="unit" as="element()" select=" t:optimize-unreachable-code ( ., false(), 'NOTE: this is unreachable code.' )"/> <!-- Print warnings: --> <xsl:variable name="bookmarks" as="element()" select="$unit//meta/bookmark"/> <xsl:if test="exists($bookmarks)"> <xsl:message> <xsl:for-each select="$bookmarks"> <message path="..."> <xsl:sequence select="@ | node()"/> </message> </xsl:for-each> </xsl:message> </xsl:if> <xsl:sequence select="$unit"/> </xsl:template> <!-- Collects unreachable statements for a statement. $statement - a scope statement. $statements-not-completing-normally - a statements that do not complete normally. Returns an optional sequence of unreachable statements. --> <xsl:template name="t:collect-unreachable-statements" as="element()"> <xsl:param name="statement" as="element()"/> <xsl:param name="statements-not-completing-normally" tunnel="yes" as="element()"> <xsl:for-each select="$statement//j:"> <xsl:if test="xs:boolean(@remove-me)"> <xsl:sequence select="."/> </xsl:if> </xsl:for-each> </xsl:param> <xsl:variable name="statements" as="element()" select="$statement/j:"/> <xsl:variable name="completes-normally" as="xs:boolean" select=" for $statement in $statements return empty($statement intersect $statements-not-completing-normally)"/> <xsl:variable name="index" as="xs:integer?" select="index-of($completes-normally, false())[1]"/> <xsl:for-each select=" ( if (exists($index)) then subsequence($statements, 1, $index) else $statements )/ ( self::block, self::if/then, self::if/else, self::for/block, self::for-each/block, self::while/block, self::do-while/block, self::try/block, self::try/catch/block, self::try/finally, self::switch/case/block, self::synchronized/block )"> <xsl:call-template name="t:collect-unreachable-statements"> <xsl:with-param name="statement" select="."/> </xsl:call-template> </xsl:for-each> <xsl:if test="exists($index)"> <xsl:sequence select="subsequence($statements, $index)"/> </xsl:if> </xsl:template> <!-- Optimizes unreachable code. $scope - an scope element to check. $remove - true to remove unreachable code, and false to comment it out. $comment - an optional comment text to put near unreachable code. Returns optimized scope element. --> <xsl:function name="t:optimize-unreachable-code" as="element()"> <xsl:param name="scope" as="element()"/> <xsl:param name="remove" as="xs:boolean"/> <xsl:param name="comment" as="xs:string?"/> <xsl:variable name="unreachable-statements" as="element()"> <xsl:for-each select=" $scope//(static/block, (method, class-method, enum-method)/block)"> <xsl:call-template name="t:collect-unreachable-statements"> <xsl:with-param name="statement" select="."/> </xsl:call-template> </xsl:for-each> </xsl:variable> <xsl:choose> <xsl:when test="empty($unreachable-statements)"> <xsl:sequence select="$scope"/> </xsl:when> <xsl:otherwise> <xsl:apply-templates mode="p:optimize-unreachable-code" select="$scope"> <xsl:with-param name="unreachable-statements" tunnel="yes" select="$unreachable-statements"/> <xsl:with-param name="remove" tunnel="yes" select="$remove"/> <xsl:with-param name="comment" tunnel="yes" select="$comment"/> </xsl:apply-templates> </xsl:otherwise> </xsl:choose> </xsl:function> <!-- Mode "p:optimize-unreachable-code". Optimizes unreachable code. --> <xsl:template mode="p:optimize-unreachable-code" match="@ | text() | statement"> <xsl:sequence select="."/> </xsl:template> <!-- Mode "p:optimize-unreachable-code". Optimizes unreachable code. $unreachable-statements - a list of unreachable statements. $remove - true to remove unreachable code, and false to comment it out. $comment - an optional comment text to put near unreachable code. Returns optimized element. --> <xsl:template mode="p:optimize-unreachable-code" match=""> <xsl:param name="unreachable-statements" tunnel="yes" as="element()+"/> <xsl:param name="remove" tunnel="yes" as="xs:boolean"/> <xsl:param name="comment" tunnel="yes" as="xs:string?"/> <xsl:variable name="statement" as="element()" select="."/> <xsl:choose> <xsl:when test="exists(. except $unreachable-statements)"> <xsl:copy> <xsl:sequence select="@"/> <xsl:apply-templates mode="p:optimize-unreachable-code" select="node()"/> </xsl:copy> </xsl:when> <xsl:when test=" not($remove) and empty((self::break, self::continue, self::return))"> <scope> <comment> <xsl:if test="$comment"> <para> <xsl:sequence select="$comment"/> </para> </xsl:if> <scope> <meta> <bookmark> <warning message="Unreachable code."/> </bookmark> </meta> <xsl:sequence select="."/> </scope> </comment> </scope> </xsl:when> </xsl:choose> </xsl:template> </xsl:stylesheet>
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5193230 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)
Correction: xml: <unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.test"> <class access="public" name="MyClass"> <class-method access="protected" name="run"> <block> <switch> <test> <var name="x"> <meta> <type name="int"/> </meta> </var> </test> <case> <value> <int>0</int> </value> <block> <expression> <assign> <var name="y"> <meta> <type name="int"/> </meta> </var> <int>1</int> </assign> </expression> <return/> <break remove-me="true"/> </block> </case> <case> <block> <return/> <expression remove-me="true"> <inc> <var name="y"> <meta> <type name="int"/> </meta> </var> </inc> </expression> <break remove-me="true"/> </block> </case> </switch> </block> </class-method> </class> </unit>
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5193296 Legacy Poster: Michael Kay (mhkay)
I think that Saxon's lazy evaluation strategy will handle this case quite efficiently. If I'm wrong, could you please try to explain how it could be improved? Michael Kay Saxonica
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5193309 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)
Another thing. If in the preceding xslt I define $statements-not-completing-normally in template t:collect-unreachable-statements like: <xsl:param name="statements-not-completing-normally" tunnel="yes" as="element()"> <xsl:variable name="v" as="element()"> <xsl:for-each select="$statement//j:*"> <xsl:if test="xs:boolean(@remove-me)"> <xsl:sequence select="."/> </xsl:if> </xsl:for-each> </xsl:variable> <xsl:message select="$v"/> <xsl:sequence select="$v"/> </xsl:param> I was expecting this parameter to be evaluated once and be passed further (tunnel="yes"). This way, I was expecting recursive call to t:collect-unreachable-statements to recieve that parameter without evaluation. My expectation was that xsl:message to be reported once. That is not what I see. Am I wrong?
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5193330 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)
> I think that Saxon's lazy evaluation strategy will > handle this case quite efficiently. > If I'm wrong, could you please try to explain > how it could be improved? I can see that ". except $unreachable-statements" is compiled as VennExpression, which uses IntersectionEnumeration. In my case, due to size of sequence $unreachable-statements (10-30 on avarage), I guess that index lookup is more efficient than implementation, which I see in IntersectionEnumeration: while (nextNode1 != null && nextNode2 != null) { int c = comparer.compare(nextNode1, nextNode2); if (c<0) { nextNode1 = next(e1); } else if (c>0) { nextNode2 = next(e2); } else { // keys are equal current = nextNode2; // which is the same as nextNode1 nextNode2 = next(e2); nextNode1 = next(e1); position++; return current; } } Thanks.
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5193409 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)
Correction. I was talking about expression: $statement intersect $statements-not-completing-normally
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5197197 Legacy Poster: Michael Kay (mhkay)
So do you think the following optimization would address your use case: Where the expression $a intersect $b is used in a boolean context, and the expression is evaluated in a loop relative to the initialization of one of $a or $b (call this the outer operand), then store a HashSet index for the outer operand and use this for a fast lookup when evaluating the expression? Currently if $b is the outer operand and $b has size N, while $a has size 1, this operation will typically be O(N). This optimization would change it to be O(log(N)) - plus a setup cost for creating the index. It's a fairly complex optimization to do because it involves new run-time data structures - unless I can contrive to use the existing code for value-based indexing, that is, rewrite it to exists($b[generate-id(.) = generate-id($a)]) and use a value-based index on the generate-id() string. Which would be not quite as efficient, but possibly less special code. Incidentally, the answer to your other question is that if the default value for an xsl:param can be evaluated statically, then this is done, but if it has any dependency on the context then it is evaluated afresh on each call. (If you want to discuss this further, I would suggest using a separate thread).
RE: Suggest of optimization - Added by Anonymous over 16 years ago
Legacy ID: #5199346 Legacy Poster: Vladimir Nesterovsky (vnesterovsky)
My reasoning is like the following: for expressions $item except $collection and $item intersect $collection where expressions are used repeatedly for the same $collection and different $item, a "HASH" technique, storing $collection in some index, might be more efficient than, a "MERGE" technique, deriving results from two ordered sequences. If for the implementation of such an index a hash set is used, then the test time approaches to O(1), provided that hash code is "good" enough (plus setup time, which might also exist in "MERGE" techique to order input). The similar case is with expression index-of($collection, $item) > Incidentally, the answer to your other question is ... I'm satisfied with the answer received at xsl-list, however the decision as for default tunnel parameters seems nonintuitive to me. Thanks
Please register to reply